Martin: Good morning. So, if I introduce myself first then introduce Neil... So, I'm Martin Geddes. I run a consulting company focused on business model innovation and disruptive technologies in telecom. And we're not only breaking the paradigm of networking this morning, we're breaking the paradigm of keynotes, by doing it as an interview.
I met Neil around four years ago, and at first I thought Neil had a really neat new form of quality of service management, a very clever little algorithm. And over time I've realized that what he's got is something much bigger and much more profound. It's actually a paradigm shift in how we understand networks as systems.
And it's more subtle and quite hard to get one's head around; it's taken me a few years. So we're going to attempt to communicate the essence of that in the next twenty, twenty-five minutes and leave some space for Q&A. It's a little bit like the leap from understanding the world through Newtonian mechanics to quantum mechanics. You have to let go of some assumptions about what the world looks like, and that's quite difficult.
So we're going to challenge some of your thinking. Let's kick off, and let Neil introduce himself in a moment. Neil, if you'd just start with the punch line, okay? A really provocative title for this talk is "The Internet's not a Pipe and Bandwidth is Bad."
Neil: It was not quite as provocative as that, but Lee decided to leave a few words out of the title to make it even more provocative. Originally it was "The Internet Is Not Pipes and Bandwidth Thinking Is Bad". Why is it not pipes? If you're a telco, if you've been brought up in this space, there's 150 years of thinking about pipes and circuits and all the rest of it. And every time an innovation has come along in the whole telco industry, everybody comes back and tries to turn everything into pipes.
And that's fundamentally wrong. It's the wrong way to deal with what is an underlying statistically multiplexed medium, which has different properties from just delivering quantity.
So why is bandwidth bad? Well, bandwidth-centric thinking creates engineers who think "I'll allocate enough bandwidth to make this work, and then double it." In most of the telcos we've seen and worked with, the infrastructures are between two and ten times the size they actually need to be for the traffic they're designed for.
So this imposes a horrendous cost on the whole industry, and it's weighing things down. It's squeezing the life out of the industry's growth.
Martin: So you've developed an alternative model of understanding that in some ways complements, or supersedes, the bandwidth model of thinking?
Martin: Give the audience a quick introduction to who you are and your background?
Neil: Okay, so who am I? I'm a founder of Predictable Network Solutions, and I worked in a university for twenty years. I became fundamentally interested in why, every time quality of service was mentioned, it was always "for future study." I looked at why the existing approaches just didn't seem to work under what I knew networks were experiencing, which is overload.
So I started looking at the issues, what it means to reason about applications, quality of experience, network performance - all in some sort of rigorous framework, so that you can reason about performance before you actually build the systems.
Martin: You're somewhere between a mathematician, computer scientist, and engineer who's been studying systems and saturation for a number of decades?
Martin: OK. What's the essence of what you've discovered about systems and saturation that's special, or interesting, that people need to know about?
Neil: The really important thing is that there is no such thing as quality. I cannot take some quality and stick it into a network, just as I couldn't bring in a box of dark and open up the dark in this room. Delivering quality is about not having quality attenuation: it's about having no more loss and delay in the end-to-end streams than your applications can tolerate and still work.
Martin: You used the phrase "quality attenuation." That's a fundamental concept, maybe the thing we start working with beyond bandwidth. What is quality attenuation?
Neil: Well, the best analogy is to imagine two things talking to each other: as they get further and further apart, the delay gets worse and the probability of loss gets higher. That's quality attenuation. And what's really interesting -- this is the mathematician in me -- is that quality attenuation is a conserved quantity. You can't un-delay things. You can't un-lose them. You can retransmit them, so you don't lose the information, but those packets are lost.
Then you start thinking "wait a minute, what does it mean for my application to work?" It doesn't care whether it's wireless, wired, super fast carrier pigeons, as long as the loss and delay characteristics of the data transport stream it gets are sufficiently good for the application to work. Notice, I didn't say bandwidth.
Martin: Today people think about things like bandwidth and jitter. Those are in some ways second-order measures. The fundamental measures are the capacity of a transmission system, the loss and delay occurring within it, and the load put on top of it. Yes?
Neil: Yes. The idea is that quality attenuation occurs, you can measure it, and you can understand it on a per-application basis. You can also look at a point of congestion, a multiplexing point or something similar: the total amount of loss and delay that occurs at that point must also be conserved.
So suddenly you realize the fundamental limits of what you can do. You can also rob Peter to pay Paul: I can trade, and give one person better delay (lower quality attenuation) by stealing "quality" from another, or by losing somebody else's packets. But you can't actually destroy that quality attenuation.
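Neil's conservation claim can be made concrete with a toy example. This is just an illustration, not the authors' mathematics: at a single multiplexing point, scheduling order trades delay between two flows, but the total delay is unchanged.

```python
# Toy illustration of conserved quality attenuation at a multiplexing point:
# scheduling can move delay between flows, but cannot destroy it.

def completion_delays(order, service_time):
    """Two equal-size packets arrive together at a link that serves one
    at a time; return each packet's delay under the given service order."""
    delays = {}
    clock = 0.0
    for pkt in order:
        clock += service_time      # link is busy for one service time
        delays[pkt] = clock        # delay = time from arrival (t=0) to exit
    return delays

s = 1.0  # milliseconds to serialize one packet (illustrative number)
a_first = completion_delays(["voip", "bulk"], s)
b_first = completion_delays(["bulk", "voip"], s)

# Prioritizing VoIP halves its delay...
print(a_first["voip"], b_first["voip"])              # prints 1.0 2.0
# ...but the total delay at the contention point is unchanged.
print(sum(a_first.values()), sum(b_first.values()))  # prints 3.0 3.0
```

The trade Neil describes is visible here: serving the VoIP packet first gives it the better delay at the bulk packet's expense, yet the sum is the same either way.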
Martin: So it's a bit like a conservation of energy law. The way networks work today, every time we try to prioritize something we're actually creating a negative sum overall: we're reducing the capacity of the network. What you've been working on is how to do the trading well. So how does that work?
Neil: The theory is stochastic process algebras and things like that, but at a practical level you can build very simple programmable things (automata) that work on a per-packet basis, whose emergent (operational) properties are such that quality is assured -- quality attenuation stays within its required bounds -- even under overload. No control loops necessary.
Martin: So the alternative way of thinking about networks, instead of about bandwidth and jitter, is thinking about loss and delay, and how we manage loss and delay across the network.
Neil: Yes, so then you end up with budgets, so you can now turn around and say, "My VoIP will work if I deliver 97.5% of the packets within this much delay within a certain jitter tolerance." And then you can start saying, "Well, if I use this particular transmission medium, like 802.11, this is the cost of transmitting VoIP packets in terms of delay and loss characteristics (the amount of quality attenuation)", similarly for wired, and all the rest of it (the whole end-to-end path).
And you can start to make rational choices (by rational choices, I mean you can sit down, put it in a spreadsheet, do the analysis, and compare the numbers at the end) as to implementation strategies and cost of delivery. You can start removing a lot of that hand waving and risk out of the business process.
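The "spreadsheet analysis" the two are describing can be sketched as follows. The hop names, per-hop figures, and budget numbers below are all invented for illustration; the only figure taken from the talk is the VoIP-style target of delivering a high percentage of packets within a delay bound. Under an independence assumption, per-hop loss probabilities combine multiplicatively and worst-case delay bounds add.

```python
# Hypothetical per-hop quality-attenuation figures: (delay bound in ms,
# loss probability). Hop names and numbers are invented for illustration.
path = {
    "home_wifi": (5.0, 0.005),
    "dsl_line":  (15.0, 0.001),
    "core":      (10.0, 0.0001),
}

# Assuming independent hops: delay bounds add, delivery probabilities multiply.
total_delay = sum(d for d, _ in path.values())
delivered = 1.0
for _, p in path.values():
    delivered *= (1.0 - p)
total_loss = 1.0 - delivered

# Application budget in the spirit of Neil's VoIP example
# ("97.5% of the packets within this much delay"):
DELAY_BUDGET_MS = 50.0
LOSS_BUDGET = 0.025

viable = total_delay <= DELAY_BUDGET_MS and total_loss <= LOSS_BUDGET
print(f"delay {total_delay:.1f} ms, loss {total_loss:.4%}, budget met: {viable}")
```

This is the "rational choice" step: swap in the measured quality attenuation of a different transmission medium (802.11, wired, and so on) and re-run the comparison.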
Martin: So the starting point is that you can take the loss and delay on a network, allocate it across the different streams going over the network, and do it in a way that is what we might call efficient?
Neil: To the extent that it's not unreasonable for your key links to be running at 99.99% utilization continuously.
Martin: Which contrasts today with how networks are run in practice, which is whenever you put any latency sensitive traffic in there, you have to leave them mostly idle.
Neil: Yes, extremely idle.
Martin: You talked a little bit earlier about pushing systems into saturation. The problems only occur when you actually load up a network; if it's not fully loaded, loss and delay don't become such an issue. But haven't we solved these problems with things like TCP? Doesn't TCP already solve this for us? It backs off when the network gets full.
Neil: TCP is a universally greedy algorithm whose sole purpose in life is to push the network into saturation. TCP itself will find the weakest point and load it. Typically you might see 50,000 or 100,000 TCP sessions at a contention point inside the network if you've got BitTorrent. I know the buffer bloat people will say they can solve it and all the rest, but it doesn't have to be that way.
Martin: In some ways we've layered protocols on top of the existing IP infrastructure that take us to the most chaotic and least predictable behavior of the network.
Neil: Yes, it's a self-constructed problem.
Martin: Isn't there a second issue with TCP? Ultimately you're trying to manage a control loop with TCP. What are the problems with that?
Neil: Basic control theory: the response time of the control loop is way, way too slow for the timescale over which effects occur. You can get into trouble exponentially fast -- in 10 milliseconds -- and it might take five round-trip times, perhaps 250-300 milliseconds, for TCP to respond.
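Neil's timescale argument is simple arithmetic. A sketch using the numbers from the talk (the RTT value is illustrative):

```python
# Back-of-envelope version of the control-loop mismatch, using the
# transcript's numbers: queues can fill in ~10 ms, while TCP needs
# several round trips to notice and react.

onset_ms = 10.0      # time for a bottleneck to get into trouble
rtt_ms = 55.0        # illustrative round-trip time
rtts_to_react = 5    # Neil's "five round trip times"

reaction_ms = rtts_to_react * rtt_ms
print(f"trouble develops in {onset_ms} ms; TCP reacts in ~{reaction_ms} ms")
```

The controller is more than an order of magnitude slower than the effect it is meant to control, which is why Neil argues no TCP-like feedback scheme can fix this.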
Martin: So in some ways we can't solve the problems of networking by building better TCP-like algorithms.
Neil: Correct, it's fundamentally not possible.
Martin: So we have to approach the problem from some different direction?
Martin: When we think of networks as pipes, what does that really mean?
Neil: Okay, so when people think of pipes, they think of water; they think, "I can take half this water away and still have water left over." But that's a mistake: it's not really like water. The best analogy I've come up with so far is cow juice. You might know cow juice as milk, but we're going to call it cow juice for a minute.
When you take this cow juice, you've got a choice. You can take a certain amount of it and turn it into cheese, but if you do that you've got whey left over. The point is that people want double cream, low-fat milk, butter out of this cow juice (those are the good-quality services), but those, by their very nature, leave low-quality capacity around: the whey in the process.
Currently the way people give you quality is to make sure the cream is all you see and the rest is thrown away, i.e. by keeping the load low, that capacity is never used. Whereas in the dairy industry (I think I looked on Wikipedia) there are 375 ways in which the juice of a cow can be turned into a product.
Martin: On telecom networks, we're trying to produce just cream all the time, and the cost of producing cream is that we end up throwing away most of the rest of what the network could carry.
Neil: There's a historical reason for this because basically telcos always produced things that had the best possible jitter and the lowest possible loss rates, so they're always trying to produce double cream. And they couldn't see why they couldn't have the double cream forever (like they had been able to with physical circuits). Even the people in the industry don't quite understand that when they went statistically multiplexed, they let this genie out of the bottle, and they still think they've got the double cream everywhere.
Martin: Isn't there another issue with thinking about pipes? We tend to think of pipes as carrying physical objects, but physical objects don't behave the same way as packets.
Neil: Right. Water pipes have feedback mechanisms in them naturally, through the fluid; that doesn't exist in networks. Water is homogeneous -- it doesn't matter which water molecule comes out at the end of the pipe -- but networks are nothing like that. If you start delivering packets randomly to endpoints, they're useless.
So there're lots of problems with the analogy. But you have to start with where people are when thinking of this stuff.
Martin: We talked about the real cost of the current approach being that we end up throwing away most of the potential carrying capacity of our networks. You've actually been doing some measurements in this very hotel, as to how the wireless network of the hotel works. What did you observe?
Neil: Yesterday morning at half-past five -- you know that feeling, waking up because of jet-lag -- I measured what a 64Kbit stream would get from here to the UK. I've actually got traces of packets taking 30 seconds to go from here to London, but only if they were of a particular size. So there's some artifact in the end-to-end path, in the network, that's causing an issue, and you would perceive it as VoIP not working.
Even in this room, where we have 20Mbit/s with a microwave dish backhaul, you can actually see 300-millisecond spikes occurring because of some other artifact in the way it's delivered. We've helped several people understand how to improve their networks just by viewing them this way.
Martin: The way the network works today is it's highly unpredictable.
Neil: Yes. Basically it's (the industry viewpoint) "You get what you get. If you have a problem the answer is to put more (capacity) in (the network)."
Martin: Talk a little about the techniques, the math you've developed that lets you run networks much, much hotter. Have any of these techniques been used in anger, and where?
Neil: Yes, they have. We eat our own dog food; we're a completely virtual organization. So we've implemented this at the edges of other people's networks that we use, like BT in the U.K. and Comcast in the U.S. We can take a standard retail service -- once we've characterized it, once we've worked out where the edges of predictable operation are (there are certain edge cases you mustn't drive into) -- and run video conferencing, along with a femtocell, along with remote terminal sessions, at no additional cost on the retail service itself.
Martin: Let's elaborate on that a little and develop it. There are a number of different flows of traffic going over the network: voice, video, maybe BitTorrent, maybe scavenger traffic like a backup of a PC. What you're doing is allocating the loss and delay predictably across the different streams, whilst also loading the network up to 100% of its capacity.
Neil: Yes, and predictability says "everybody can't win", but you can reason in the math about how things fail. If you've got to drive something to saturation you will have to have large variability; you cannot put a quart into a pint pot. But that doesn't matter, as long as you understand, pre-plan, and manage how it will "fail".
Martin: Isn't trying to get the network to control which kinds of traffic get which loss and delay characteristics -- isn't this an expensive control layer you're having to build on top of the internet? Why not just throw more capacity at it?
Neil: It turns out, once you understand the math... OK, a bit of history: we originally built this so that we could do British Sign Language over video. Our friends from Skype, our platinum sponsors, were talking about video calling and all the rest of it. But I would say that's video-enhanced voice: the video doesn't carry the information content, there's no (guaranteed) lip sync, and you can't really sign over it because its rate (and hence its visual quality) changes.
So what we actually did, back in 2006, was build all these quality-control mechanisms, along with some forward error correction, and wrap the U.K.'s national network so that over a 256K/512K line you could hold perfectly good sign-language conversations, with a (residual) error rate of a visual glitch about once every twenty minutes.
And that was one service amongst all the other things you could be doing. There might be times when your back-ups were trickling through at 1 bit per second or 100 bits per second, but that was enough to keep them alive, so that when you put the call down, everything started back up again.
Martin: You can get the kind of quality the PSTN might have delivered but without the cost of having to exclude all applications apart from telephony?
Neil: Where I live it's better than the PSTN.
Martin: Just talk technically for a moment. Without using the words "stochastic process algebras", what does your technology actually physically do, and where does it do it in the network?
Neil: What does it physically do? It physically shapes the traffic that's coming through at a point. Let's take an example of the last mile, or the head end for a cable operator, or the point at which you hand wholesale data off to an ADSL carrier. The data arrives at that point, and we know that we have some basic topological information about the final line speeds, the transmission link capacity. And we can actually shape the traffic so that as it leaves the box -- I'll tell you what the box is in a minute -- the packets are spaced such that they will never contend with each other for resources downstream. So what you've done is effectively eliminated the need for buffering in the whole of the access network. That's your first point.
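The shaping idea Neil describes can be sketched as a simple pacer. This is a minimal illustration under stated assumptions, not PNSol's actual implementation: knowing the downstream line rate, hold each packet until the downstream link will be free, so nothing ever queues downstream.

```python
# Minimal sketch of contention-free pacing: given the downstream line rate,
# space packet departures so no packet ever finds the downstream link still
# busy serializing the previous one (so no downstream buffering is needed).

def pace(arrivals, sizes_bytes, line_rate_bps):
    """Return departure times for packets (arrival time, size) such that
    consecutive packets never contend for the downstream link."""
    departures = []
    link_free_at = 0.0
    for t, size in zip(arrivals, sizes_bytes):
        depart = max(t, link_free_at)   # hold the packet if the link is busy
        departures.append(depart)
        # serialization time on the downstream link, in seconds
        link_free_at = depart + size * 8 / line_rate_bps
    return departures

# A burst of three 1500-byte packets arriving together at a 1 Mbit/s line:
# departures come out 12 ms apart (1500 * 8 / 1e6 = 12 ms serialization).
out = pace([0.0, 0.0, 0.0], [1500, 1500, 1500], 1_000_000)
print(out)
```

The burst that would otherwise pile up in a downstream buffer is instead smoothed at the single shaping point, which is the "one true buffer" point raised later in the Q&A.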
Martin: That's an answer to buffer bloat.
Neil: That's an answer to buffer bloat. It's an answer with simpler edge equipment. It's a massive cost reduction in those things in the long term. On the upstream, you have a similar problem because you've got a large amount of data -- a gigabit home network going into a megabit up, if you're lucky -- we've done the same type of approach but this is just a firmware upgrade on a (retail) router. On top of that you layer the management, but the management approach is such that it is tailor-able, it's bespoke. The individual customer can define what they'd like to be their priorities, the order in which things "fail", which services they require (to be assured), how the "quality" is managed. How their finite quality attenuation is spread amongst their applications.
Martin: Many of the people in this audience are in the voice industry so what're the consequences for voice? Let's maybe take the home user first.
Neil: You can take a very, very mediocre DSL line and run four or five PSTN-voice-quality calls over it. I live at the edge of the Somerset Levels, 5.6 kilometers from my exchange, and my plain old telephone service is pretty awful. We basically run everything over VoIP and nobody knows -- it's actually better than our PSTN. It does that, but you've also got an infrastructure to build on: say, for people who are on mobiles, this network now has the right characteristics for you to have a femtocell. So I've actually got full 3G coverage, as have my neighbors for about 200 meters around me, all from a single ADSL link.
Martin: You can provide quality assured backhaul for femtocell...?
Neil: ...as well as voice...
Martin: ...that doesn't fall over when the kid upstairs turns their Xbox on?
Martin: What about enterprise, why would enterprise care about this technology?
Neil: Home working. A report came out yesterday: it's now worth three thousand pounds per person per year in the U.K. to keep them at home a day a week, because of the things they don't do. I know of U.K. government authorities trying to meet their carbon-footprint requirements by making people work from home more. Currently the only route they've got, and that they're thinking about, is putting in second telephone lines and then running ADSL over them as well.
That ceases to be a problem. You can take the same individual ADSL line and run separate quality assured services, so that you know the basic requirement for the business is going to be met, yet still have flexibility between them (the business and domestic traffic).
Martin: Unlike today, where you could run prioritization on the network and have the home worker doing their call-center application or whatever from home, it doesn't come at the cost of having to basically eliminate everything else that's going on on the network.
Martin: What about someone in an enterprise where I'm running SIP trunking? What would your technology do for me?
Neil: We do it ourselves. Basically we have SIP trunks going wherever you want them to, and then we terminate them. Once terminated, the sessions are delivered to you. The (media) quality's assured, and the signaling is assured.
Martin: Rather than to buy separate circuits to run your voice over...?
Neil: Absolutely, yes, basically you have a single telephone line. It can actually carry five concurrent phone calls and it just works.
Martin: And a mobile operator, you've been working with one mobile operator. What's the impact you've had?
Neil: When we started doing some of this analysis, we helped one mobile operator in the U.K. increase -- the day that it launched the iPhone (by working all the week before that) -- its headline throughput speed by 47%. Then some subsequent recommendations we made increased it by another 100%. We hadn't even got to the stage yet where we needed to start using the technology. What we did was understand how the loss and delay accrued, and rationalized it with them.
Martin: You've used your mathematical techniques to more than double the effective capacity of one of the U.K.'s mobile network's data?
Martin: And if you're a carrier and you're thinking about things like the PSTN, what's the impact?
Neil: There's this growth (of data); everybody talks about the 40% increase per year -- this power-law growth in demand. Because of the current over-engineering by high factors, that power-law growth means that if you keep the same approach (having ten times more equipment than is used), costs will spiral. Of course, you don't realize it's ten times more equipment than is truly required, because these are your planning rules and all you are doing is following them.
But actually the planning rules have created a very suboptimal solution. You're having to invest vast amounts of money to try to track this growth in demand, but at ten times the scale. There are costs even just in the electricity and the interfaces. Even if you say bandwidth is free and the fiber is free, there's still the electricity you have to put in to run the boxes, and that becomes a massive cost.
Martin: There's the cost of a parallel voice infrastructure, and even if you just want to plug someone across -- even if bandwidth is free -- there's a cost in plugging them in.
Neil: Some of this stuff, the cost of moving one of these people (physical rewiring, etc.) -- I'd better be a bit careful here -- we're talking hundreds of millions of pounds across a network just to be able to use the infrastructure created to "keep up with demand", which generates absolutely no more revenue. It's almost like the explosion of requirements for telephone systems when free local calls and dial-up internet first happened. It's going to happen again unless you start solving the problem.
Martin: So you essentially collapse all the PSTN infrastructure into a single infrastructure that carries voice, video, and data all together.
Neil: Yes, and this is interesting, because if you look at the way people do this, they tend to build separate networks for separate services, which is exactly the wrong way round. If you take the bull by the horns and realize that you can split the qualities apart, you can start thinking in terms of just three basic quality blocks: one is an assured set of services, one is your standard internet (HTML, etc.), and one is an economy service.
You can actually start constructing business models where your economy traffic is the stuff that has really high jitter and potentially highly variable throughput, but is excellent for pulling down that video you want to watch tomorrow, because its fundamental cost of delivery is a tenth or a twentieth of delivering the assured services. As a telco, as an internet provider, you can start to raise the average utilization of your infrastructure, and that is the first-order driver of your value and your RoI.
Martin: Let's summarize: you can run your networks very hot, at nearly 100% capacity; you can mix together assured voice, video, and data traffic; and you can enable new revenue models by offering assurances to third parties, like mobile operators who want to run femtocells.
Neil: Yes, and of course I've got the two tail business models which give you flexibility in who pays for it.
Martin: We'll stop there, there's a few minutes left for questions.
Lee: We'll begin with Brough here.
Brough: Hello. The typical thing with any quality of service, any assurances for particular streams, is how you signal to the box that's providing the intervention. I see two things that you haven't touched on. One is: how does your box determine the loss and delay on a particular path -- by talking to other boxes, or whatever? You didn't describe that. And the second is: how does it get information about which flows? Is there a bunch of signaling and so forth?
Neil: You can actually measure the baseline loss and delay that's already there. In terms of knowing what it will be when load is applied, the mathematics is good enough to predict it to two and a half decimal places. You can use the math to configure the box and always know how the application will behave under all load conditions, which is an interesting property.
The other point was how you signal it. Things like assured data streams need to be constrained -- you can't give out an infinite amount of high quality -- so you can tie this into SIP signaling or something equivalent. For the other things, you've got complete flexibility to classify on the usual sort of 5-tuple (using the packet header), and that's configurable on a per-end-user basis.
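A 5-tuple classifier of the kind Neil mentions might look like this sketch. The rule set, port numbers, and matching logic are invented for illustration; the class names echo the three quality blocks ("assured", "standard", "economy") described earlier in the talk.

```python
# Sketch of per-end-user 5-tuple classification: map a packet's
# (src, dst, sport, dport, proto) onto a quality class.
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str

# Hypothetical per-end-user policy; None acts as a wildcard field.
RULES = [
    ((None, None, None, 5060, "udp"), "assured"),   # e.g. SIP/VoIP
    ((None, None, None, 443, "tcp"), "standard"),   # ordinary web
    ((None, None, None, 6881, "tcp"), "economy"),   # bulk transfer
]

def classify(pkt: FiveTuple) -> str:
    """Return the quality class of the first matching rule."""
    for pattern, klass in RULES:
        if all(p is None or p == v for p, v in zip(pattern, pkt)):
            return klass
    return "standard"  # default class for unmatched traffic

print(classify(FiveTuple("10.0.0.2", "192.0.2.1", 40000, 5060, "udp")))
# prints assured
```

Because the rules are per end-user, each customer can express the "order in which things fail" that Neil mentioned earlier simply by editing their own table.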
Martin: I can add something to Brough's questions. These three layers are a new way of implementing policy, but there are all kinds of existing techniques for signaling policy. On top of that there is social and public policy, so the whole network-neutrality debate is impacted by this technology too.
Neil: Not necessarily by the technology -- by the thinking process, which says there is no other transport system we know of that makes reasonable money in which there is only a single class of service. FedEx doesn't ship everything overnight. If you carry tomatoes in a refrigerated vehicle, you don't put aggregate in the same vehicle and have chilled aggregate delivered overnight, because they have different requirements. Once you start taking that on board, you can start making much higher gains -- the economies of scale start to cut in.
Lee: Is it Dean I saw?
Dean: I've a single class of service of water piped to my house. I disagree with quite a bit of this, I have to say. In particular, I'm worried that we are talking about latency-sensitive and delay-sensitive applications as if they are the be-all and end-all, when in fact I think they're a diminishingly important part of the overall traffic mix -- or those delays and latencies can be mitigated in software. Why should we reorganize our entire network around what's essentially a minority sport?
Neil: Two things. You have a pipe to your house, but it's not water. Physics says it's not water; it has these properties. You can claim it's water, but I'm afraid what you get is a variable amount of cream in your system.
Secondly, it's interesting: you'd need to travel faster than light to get round delay. If you can do that, I'm interested. Software can't get around latencies. There are a whole bunch of latency-sensitive applications. You're saying gaming is a completely irrelevant part of the universe, yes? I can't agree with that. I'm sorry, I can't agree with your premises even in that sense.
I'm not saying you have to change the whole world. The really good stuff about this is vast portions of the network have such good quality characteristics you don't need to touch them at all. You only need to put this in certain locations having identified them.
Lee: I don't know the next gentleman in front of Rich.
Kevin: A couple of things confuse me about this. You say you space out the packets to avoid buffering. That seems like a contradiction in terms -- you become the one true buffer, in a sense. Is that what you mean?
Neil: Yes, but actually I move all of the downstream contention into the single point. I haven't made the system any worse, but I have put it under a single point of control.
Kevin: How does that compare with the uTP protocol, which is designed to work from the edge, sense the places where congestion is occurring, and back off rapidly?
Neil: That's the same TCP issue -- an elastic stream, whether it's TCP, the Rx protocol from AFS, or even some of the ones the BitTorrent people have done.
Kevin: The BitTorrent one is the one I mean.
Neil: They are all appealing to control-loop theory, and what I'm saying is that the time constant over which the effect occurs is too fast for the time constant of the controller to manage it. It fundamentally can't work; that's control theory 101.
Kevin: But you can somehow do this from a single point?
Neil: What you're doing is delivering a consistent quality -- a consistent quality attenuation -- to the application, even as the load increases. One of the things we've actually done, for streaming iPlayer video, is construct just the right characteristics to make TCP work perfectly: it never loses anything, the window sizes work perfectly, and the customer gets a perfect stream all the time.
Kevin: So you solved the early drop problem effectively, is that what you're saying?
Neil: For specific applications and specific places, because we're topologically aware -- we know how the network behaves. TCP is great because it works without knowing anything about the system; what I'm saying is that we're using knowledge of what is there, along with the maths, to give a better service.
Lee: We have to do two more questions quickly, and because we're beginning to run over time, can we make them very quick? Richard, please.
Richard: I didn't catch the very first part of your talk, but from our conversations and some of the questions, it seems one of the really key points that some people aren't getting -- and this is not unique in the network-policy conversations we've had in the U.S. for the last five years -- is this: if you want to build a single network that supports multiple applications, doesn't it necessarily follow that the network has to provide a level of service to each of those applications that's consistent with its needs? That's number one. Number two: where does the idea come from that real-time communications are diminishing? Everything I see -- video conferencing, video Skype -- you're telling me that real-time communications are diminishing?
Neil: I'm not.
Richard: In quantity of packets on a network.
Lee: Let's deal with that first question as quickly as possible.
Neil: The interesting thing is that this provides you with a mathematical framework. You can actually turn around and say to the supplier, "I want a contract with you to supply the traffic to me in this way," and it's an instantaneous quality measure. There was a famous moment many years ago when I asked one of the U.K. telcos, "How do you know that you deliver 99.9% of the packets?" "Oh, we measure it once a year."
What we're saying is: when I say you're being delivered 99% of the packets, what I'm actually saying is that the probability of loss of any one packet is less than 1%, and that's a much stronger measure.
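The distinction Neil draws can be made concrete: an average taken over a long period can meet a 1% loss target while hiding bursts during which the per-packet probability of loss is far higher. A small sketch with made-up numbers:

```python
# Illustration of why "1% loss measured once a year" is weaker than
# "probability of loss of any one packet is under 1%": a long average
# can hide loss bursts. Trace values: 1 = delivered, 0 = lost.

def loss_rate(window):
    return window.count(0) / len(window)

# A period where exactly 1% of packets are lost overall, all in one burst:
trace = [1] * 900 + [0] * 10 + [1] * 90

overall = loss_rate(trace)
worst_window = max(loss_rate(trace[i:i + 100])
                   for i in range(0, len(trace), 100))

print(f"overall loss {overall:.1%}, worst 100-packet window {worst_window:.0%}")
# The long-run average meets a 1% target, but during the burst the
# per-packet probability of loss was ten times higher.
```

The per-packet statement is the stronger contract precisely because it must hold through the burst, not just on average.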
Lee: Last question, but let's make it very quick, please.
Audience: Can you talk about how the upstream traffic's going to work, what's required at the leaf-node when something needs to be sent back into the network?
Neil: It's the same set of algorithms, configured in a similar way by the same mathematics. What we've actually done so far, as proofs of concept demonstrated to people, is embed this in the existing firmware of ADSL modems and off-the-shelf routers.
Lee: I'd like you to give a round of applause to Neil for that breakthrough.