The following interview is with Jonathan Christensen who is the general manager of audio and video at Skype. It took place last Thursday (31st January).
An audio version is likely to be made available within the next ten days or so courtesy of www.itconversations.com. I'll keep you posted.
Lee: Estonia of all places. I want to divert to travel but if I do I am scared I am going to get shifted on because I will be told in half an hour, "Okay, your time is currently done asking any questions". So, I will begin here just in case that happens. Now, can you tell me why you think eComm, the Emerging Communications conference is so important?
Jonathan: Sure, so I have been working in this space for something like ten years and, instead of eComm, I have been calling it rich communications for all that time. It all started for me when I was probably at a Pulver Show [circa 1998], maybe one of the very first VON Europes or something like that, and there were a bunch of people talking about SIP and, prior to that, we had all been talking about H.323 and MGCP and there was this whole revolution going on in telecom about how you break the switching architecture apart and how you use IP networks for transport.
I was working for Microsoft at that time and Microsoft had some play in that space because, I guess, people were prototyping on Microsoft operating systems and so we were getting volumes, some server licensing volume out of it or something. That is why I was tracking it and I was involved in the group that was kind of targeting that industry segment and, when the SIP thing came along, suddenly a bunch of lights went off in my head, or on, I guess, in my head.
And, SIP's initial vision, and I think still, that people who are really active in that community have this shared vision of a very generic session management protocol that could manage any kind of session. So, video, text, presence, audio, all of those things and merge them from an application perspective so that the user could choose any of the modalities that they saw fit, data sharing for example, all within the context of a very lightweight and web-based programming model.
So, as opposed to the old world, suddenly this was about text-based protocol that was human readable and extensible to any format and, I mean, that just seemed so incredibly powerful to me. I ran off and started making a lot of talks and recruiting people inside of Microsoft to take a look at this and eventually it culminated with the formation of a new group and we started building a client and a server and over time that evolved into what is now a very rich enterprise offering.
But nothing for the average consumer to see, and it wasn't until Skype came along and kind of closed the loop on the consumer side, in terms of functionality, that we finally saw this multi-modal experience in the wild and in consumers' hands in a way that was super usable.
I think it really represents the first major innovation in this space since the introduction of the telephone system. And right now, we are just at the beginning of where the possibilities are. And so, I think eComm is focused on exactly the right spot, where there is going to be an explosion of new rich communication services on the Internet platform.
Lee: Thank you very much for that. Could you briefly outline the subject area that you will be keynoting at eComm?
Jonathan: Sure, so what I am planning to do is talk about kind of a brief history of VoIP: what its applications were, what was driving those applications, and why people were interested in VoIP from the telecom side, from the consumer side, from all the different segments.
[I will] Talk about the emergence of Skype as the first practical mass-market VoIP application and the first consumer business that got scale in that space and then bridge that to this area of rich communications and talk about where we're headed in terms of rich communications for the mass market.
Lee: Okay, that leads me on to ask you - you are helping to lead initiatives for voice quality at Skype - can you elaborate on what these initiatives are?
Jonathan: Sure, this is an area where I could talk for hours and hours.
There is a whole range of opportunities and also problems that come when you move from the circuit-switched world which was very deterministic but very narrow in terms of the application - like you can get voice across that channel and that's about it - and you can get it only in the standard format that was adopted a hundred years ago - so 8 kilohertz of sampling and roughly 4 kilohertz of signal and tin can sounding kind of voice but it always got there.
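[Editor's illustration, not part of the interview: the pairing of 8 kilohertz sampling with roughly 4 kilohertz of signal follows from the Nyquist sampling limit. The symbols f_s (sample rate) and f_max (highest representable frequency) are introduced here for illustration only. The same relation applied to a 16 kilohertz sample rate gives the ceiling for wideband audio.]

```latex
% Nyquist limit: a sampled signal can represent frequencies up to half the sample rate.
\[
  f_{\max} = \frac{f_s}{2}, \qquad
  f_s = 8\,\text{kHz} \;\Rightarrow\; f_{\max} = 4\,\text{kHz}, \qquad
  f_s = 16\,\text{kHz} \;\Rightarrow\; f_{\max} = 8\,\text{kHz}.
\]
```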
Suddenly with the Internet, you have the possibility to do so much more but you also lose that deterministic quality. There are so many things that we have to do to get the whole thing right and it's like a daisy chain. On the send side, you have to think about the microphone and then the sound card and the sampling and the coding. Then you have to send it correctly and you have to know something about the network when you're sending it: Is there packet loss on the network? Is there jitter? Is there delay? What is the bit rate that is supported on the network? All those things that you have to optimize for - all of those situations. Is there jitter in the system that you're using because other applications are taking too much CPU? All those things.
On the flip side, we have to do exactly the same thing so we have to make sure that the incoming signal is being handled correctly and optimized and played out at the right rate to avoid the effects of packet loss and jitter and delay in the network.
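[Editor's illustration, not part of the interview: the receive-side handling described above is commonly done with a playout (jitter) buffer. The following is a minimal sketch under stated assumptions, not Skype's implementation. The class name `JitterBuffer`, the fixed pre-buffer `depth`, and the `"CONCEAL"` placeholder for a concealed frame are all hypothetical. Real systems adapt the depth to measured jitter and synthesize concealment audio.]

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JitterBuffer:
    """Toy playout buffer: reorders packets by sequence number and
    marks gaps for concealment. Hypothetical design for illustration."""
    depth: int = 3          # packets to collect before playout starts
    next_seq: int = 0       # sequence number due at the next playout tick
    started: bool = False   # True once pre-buffering has finished
    packets: Dict[int, str] = field(default_factory=dict)

    def push(self, seq: int, payload: str) -> None:
        """Store an arriving packet; drop it if its playout slot has passed."""
        if seq >= self.next_seq:
            self.packets[seq] = payload

    def pop(self) -> Optional[str]:
        """Called once per frame interval by the audio output."""
        if not self.started:
            if len(self.packets) < self.depth:
                return None  # still pre-buffering, nothing to play yet
            self.started = True
        # A missing packet yields a concealment marker instead of a stall.
        payload = self.packets.pop(self.next_seq, "CONCEAL")
        self.next_seq += 1
        return payload


if __name__ == "__main__":
    buf = JitterBuffer(depth=3)
    for seq in (0, 2, 1):               # packets arrive out of order
        buf.push(seq, f"frame{seq}")
    print([buf.pop() for _ in range(4)])
```

The trade-off this sketch makes visible is the one raised in the answer: a deeper buffer absorbs more jitter and reordering, but every buffered packet adds delay to the conversation.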
We've got a big team of researchers and developers who are constantly working on solutions to the problems that you see in the vagueness of the network [interfaces] and [on] all of the devices that we want to support. They are doing fundamental work in all of these areas - in speech coding, in packet loss resilience and echo cancellation, all of these areas.
For example, in 2007, we introduced an entirely new echo cancellation scheme and, for the first time, you can have echo-free conversations hands-free on a regular laptop. It also works on the Mac; that is definitely a first for the Mac in terms of any useful echo cancellation. Today with a MacBook Pro, you can have an extremely reliable conference call scenario, I mean hands-free scenario, with your laptop just running Skype.
That is the kind of stuff that we are optimizing for and the kind of scenarios that we're trying to enable and doing all of this very low level work to make this stuff work. And, so far, things are going pretty well. We have a lot more work to do and there are so many more opportunities as well in the video space but this is what kind of gets us excited in the morning and working on the space.
Lee: Okay, that leads me on to ask a question about high definition voice: wideband audio, 8 kilohertz to 16 kilohertz. I see it in the medium to long term being a growth market. Can you tell us what the state of play is at Skype over codecs? More specifically, is Skype going to continue using Global IP Sound's iSAC codec? And can you just generally comment on evolution of wide band voice codecs going forwards?
Jonathan: Sure, only the old installed base clients are still using iSAC; so in 3.2 we introduced a new in-house developed wide band codec called SVOPC. If you look at the call details and the advanced options, you can see actually which codec is being used for any call; in the newer versions of the client it typically says SVOPC. SVOPC stands for Sinusoidal Voice Over Packet Codec. As I said it was in-house developed; it's a wideband 16 kilohertz codec. It is resilient to packet loss and it does an excellent job with background noise and things like that; as well, it's fully integrated with the other components on our stack, our echo canceller, our noise reduction and all those things.
And, we continue to iterate it. Since 3.5 actually, iSAC has been completely removed from the client and we are now optimizing 100% on our own internal codecs. I think, in terms of things that I see as important for voice quality in the codec space, there are a couple of different things.
One is moving yet to the next level of sample rates so going to ultra wide band. Probably not doubling the sample rate because there's sort of a diminishing return but adding a few extra kilohertz on top makes a noticeable difference and we are doing a lot of experimentation with moving up the chain. We also need some cooperation from the device manufacturers because the device needs to support that [the additional bandwidth] on the sample and on the play out. But, all the good quality USB headsets, for example, support the input and the output that we need to have a noticeable difference for the users.
We also see that bandwidth extension of narrow-band calls, sort of artificially recreating part of the wide-band experience for the Skype user who might be talking to a PSTN end point, is showing a lot of promise in our labs. So we think that there's a lot of really interesting work to do in both of those areas, both expanding to wider band and recreating some of the lost signal in PSTN and mobile calls.
Lee: Okay, that nicely leads me onto another question I have, which is maybe a bit of an awkward question, so I do apologize in advance. Skype is four years old and the good point is, it was, in my opinion, the only real innovation in the telecom industry over the past ten years, aside from the switch to mobility, to mobiles. It is because of that stagnancy in telecom innovation that the desire is there to hold an eComm conference.
But, four years is a long time in Internet years and many of us feel disappointed that the changes over the past four years have been incremental. So, what I would like to know is do you see something less incremental coming along, out of Skype?
Jonathan: Yes, I can't pre-announce stuff and I definitely don't want to shout about vaporware and those kinds of things, but what I would say is that I tend to agree with you or share your frustration a little bit. There's been a lot of, like, feature optimization and not so much of the big items that would grab everybody's attention but, looking into our 12 to 24 month road map, I'm really, really excited about the things that we're doing.
And, I think that there are going to be some things in that timeframe that, while they make perfect sense for us and they're linear on our trajectory, are totally non-linear for the industry. I think that, when you get to see these things, you'll agree. I think that the kinds of things that we'll be working on, and then the tweaks that will be needed, the optimizations that will be needed to make them really, really work well, will carry us for the next five to ten years; it's that kind of really powerful new scenarios and innovation that we're planning.
Lee: Okay, I'm finding myself smiling with happiness here, because I have a fair belief that we're going to see something. I've got a million questions here and I would just like to ask a more techie type of question. Can you just pass a comment on the future of voice processing technologies? Where do you see them going?
Jonathan: Sure, so I think that the main goal that we share with others in the industry and what drives our team is this idea that we want to make distance irrelevant. You know you're sitting in Vienna, Chaim's in New York and I'm here in San Francisco and we're having a very natural discussion with wideband audio - and it's free by the way!
We want to continue to make the whole experience as seamless as possible, as natural and as life-like as possible. And I think, as I mentioned before, there'll be a trend towards higher fidelity and better performance in the devices as well. So we need the help of the device manufacturers at this stage to realize that voice is not just about this old-fashioned PSTN-style voice. It's really about high quality stuff. Video is a major initiative for us and making life-like video available in the mass market is a big goal for us as well.
And, we think - we hope anyway - that we're at the front of the pack. We're certainly investing very, very heavily in these areas. And we're hoping to make this stuff as good as it can be.
Lee: Okay, can I just jump in and ask, when you say devices, do you mean laptops or mobile phones or both?
Jonathan: Yes, all of the above. It's an interesting space. In the laptop space they're so cost optimized sometimes that they really sacrifice on audio quality, even at the sound card level. So we run into sound cards that, for example, don't support bandwidth above 16 kilohertz. We run into devices that just randomly introduce artifacts and noise into the channel. So, we really need the market to speak about that and to say, "We want better quality sound coming from the device and are willing to pay a few extra dollars for it." And on the mobile side, it's even trickier. It's a very, very complicated ecosystem.
Lee: Okay, we will leave on complicated ecosystem [laughter] - I will not ask you to expand there. Now Skype still remains an application which is tethered to the personal computer, that is, aside from the partnership with Hutchison 3. Now, when can we expect the average person to have Skype running on their mobile phone instead of needing to be tethered to a personal computer?
Jonathan: Alright, so, now you've got me back into the complicated ecosystem bit [laughter]. The difficulties here are political, religious and technical. So, I think the good news is that we have something like 200 Skype certified devices. We have devices in a whole range of different classes, from Wi-Fi phones to Skype applications running on gaming hardware and tablets like the Nokia N800 and Sony mylo; we have PC-Free phones that run on a broadband connection. All of that stuff is really interesting and starting to kind of put a wedge into the space but what we're really up against is that the mobile space is one that is particularly tricky. There's both good and bad news here; on the bad news first, it's really about the closed mindsets of the spectrum licensees.
They [mobile operators] are focused on getting their ROI [return on investment] for the spectrum and the networks that they have been building out. And, what they don't get is that, at the same time, they are doing this enormous disservice to their customers and to themselves because they're not going to participate in the next round if they continue to think like this and it goes all the way through the ecosystem to the device.
So, for example, even devices that have rich operating systems on them typically don't open that API for the audio part. So, getting full duplex audio on a Windows Mobile device, or a Nokia, or any of these devices, and having access to the right speaker and the microphone with the right amount of latency, and [having] the IP stack to be able to make a VoIP call, just isn't happening. So, we've looked and looked and looked and we're willing to do the hard work but we found that in many, many cases, or most cases, it's just not something that's available there.
On the good news side, the reserves [for the FCC spectrum auction], I guess, have finally been met. The early news of today [Jan. 31] is that this C block of open spectrum has pushed past the reserve mark. This means that the spectrum will be heated up and running and will provide open network access. With that there's no way that the first crack in the dam hasn't been exposed. We'll start to see some very interesting innovation at the edges.
Lee: Okay, that again puts a smile back in my face, so, finally I would like to ask you, where do you see the future of communications going?
Jonathan: Well, a big question I guess and, having worked on the space for quite a while, I think that it's only going to get more interesting over the coming years since, well, like this open spectrum for example. You know, I just have to reiterate, I think that anybody who has not figured out that the Internet is the platform and that there isn't any such thing as walled gardens that will survive, or sub-networks [such as AOL tried] that are going to survive, those people are doomed. The intersection of these worlds is going to be chaotic. It's going to be violent. It's going to be messy for a while but it is going to happen, and the Internet will survive as the one open platform. You are going to see a trend towards extreme innovation at the edges - on the devices, in the PC platform, in software, all around the edge of the Internet.
I think that you are only going to see further disruption of the telecom industry and the emergence of totally new businesses that we can't imagine today. I think that [the] net result, that drives me every day, is that we're going to have this very rich, open, cheap and accessible communications. This is going to be not just a game changer for the telecom industry, but will be a change agent for all of humanity. So, a platform that allows us all to see each other and hear each other more clearly maybe makes us a little bit less crazy, less polarized and more open as a world society.
Lee: So, I am on bonus time here and if I could just grab a couple extra minutes, I would like to ask just a few questions on the fly. Now, you seem excited about the prospect of open spectrum. Can you elaborate on these hopes?
Jonathan: Sure, so breaking through the kind of closed world of the existing ecosystem is going to, I think, for the first time allow the scenarios that we've been driving, on the PC network and in the open Internet, to move to the device world. I think that's important because, as you noted early on in the call, the only real segment of innovation in the last 15 to 20 years, whatever, in telecom, has been in mobility. Increasingly, people are not tied to their home phone or their desktop but they're mobile; they're moving around when they're communicating; they're taking their devices, and their scenarios, and their communications capabilities with them. To the extent that Skype and other applications are going to have an impact there, it's extremely exciting.
Lee: So, do you see Skype being a possible tenderer in the open spectrum space?
Jonathan: I don't know that I can comment. I don't know of any specific plans for us in that space. I think that it's probably more likely that we would benefit from the open networks and from the providers who are better at that kind of thing - the building out of the facilities and the networks and maintaining them and managing the subscribers and so on and so forth. But, with that open access, we would expect that there's the opportunity to put the application into that world.
Lee: I just have one last question for you and that is, at the beginning you spoke of SIP and how it changed things - your media gateway controllers and your media gateways and soft-switches and so on. But, this was kind of like the 1996 to 1999 period and Skype did not use SIP. So, do you see fragmentation when it comes to signaling going forward? Or, do you think that, just like SS7, you'll end up with this monolithic global signaling network? Or, do you just see things becoming fragmented into different signaling systems according to the applications?
Jonathan: Yes, so just one clarification - we use SIP. By comparison to the other operators, we are one of the largest SIP users in the world. All of our SkypeOut minutes and SkypeIn minutes traverse the PSTN via SIP interfaces, basically. So, we use it as an interop protocol where we need to.
I think that the vision of the early SIP founders has been largely unrealized in the SIP world. SIP is typically just used for these very mundane trunking applications, like the one that we have, or sending calls between two networks and it's just calls. The vision of multi-modal communications and rich end points has largely failed within the same. I think that a big part of this is that they didn't pragmatically just solve basic problems like NAT traversal, for example. They also evolved the specification to the point that it no longer had its lightweight appeal. So, we'll see, SIP will continue to be [the] dominant protocol in terms of these sorts of narrowly defined scenarios but I think that, when it comes to rich communications, you are going to see more of this fragmentation. You're going to see some islands of providers who are just solving the problems. Just making it work for the user and not being religious about the protocol, for example.
Lee: Okay, I very much appreciate the time you have given us and your perspectives as well. I have found them inspirational. So, I am very much looking forward to you opening the eComm conference with your keynote.
Jonathan: I thank you for the invitation.
Lee: Thank you very much for the call. Much appreciated.