When carrying out a network assessment for Lync or Skype for Business, it’s important to understand the behavior and attributes of the underlying RTP traffic for each profile. TechNet and the Networking guide have some excellent reference information, but neither mentions the Skype audio codec, SILK. For the latest client updates, we are left to guess or assume how it behaves, so I thought I should share the results of my research.
SILK is the audio codec used by Skype since about 2009. It’s released under an open license and was used as the basis for the more recent Opus codec (which is awesome). SILK is an advanced codec, capable of adapting to the environmental conditions it’s working in. It can include redundant data if the network suffers from packet loss, to try to ensure the data gets through. It can also reduce its bandwidth requirements to alleviate any congestion it comes across, although this does have some impact on the audio quality.
SILK is used by an application in one of four modes:
- Narrowband
- Mediumband
- Wideband
- Super wideband
These allow sampling frequencies of up to 8, 12, 16 or 24 kHz respectively; the higher rates are only available in the higher modes. The mode depends on the client’s ability to capture at these rates, so a Lync/S4B client should always be in Super wideband mode, unless it is running on some kind of embedded device. Human speech generally extends to around 8 kHz, and 8 kHz has historically been used as the sampling rate. However, according to the Nyquist theorem, capturing 8 kHz of audio requires sampling at double that frequency, so 16 kHz. Wideband audio is therefore necessary to accurately reproduce the fundamental and harmonic frequencies that make up speech.
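To make the Nyquist relationship concrete, here is a minimal Python sketch that maps each mode's maximum sampling rate to the highest audio frequency it can represent. The dictionary and function names are my own for illustration, and "Mediumband" is assumed as the name of the 12 kHz mode.

```python
# SILK mode -> maximum sampling rate in kHz, as listed in the text above.
# "Mediumband" is an assumed label for the 12 kHz mode.
SILK_MAX_SAMPLE_RATE_KHZ = {
    "Narrowband": 8,
    "Mediumband": 12,
    "Wideband": 16,
    "Super wideband": 24,
}


def nyquist_limit_khz(sample_rate_khz: float) -> float:
    """Highest audio frequency (kHz) a given sampling rate can represent."""
    return sample_rate_khz / 2


def min_sample_rate_khz(highest_audio_khz: float) -> float:
    """Lowest sampling rate (kHz) needed to capture a given audio frequency."""
    return 2 * highest_audio_khz


if __name__ == "__main__":
    for mode, rate in SILK_MAX_SAMPLE_RATE_KHZ.items():
        limit = nyquist_limit_khz(rate)
        print(f"{mode}: sampled at {rate} kHz, captures audio up to {limit:g} kHz")
```

By this arithmetic, Narrowband only reaches 4 kHz of audio, while Wideband is the first mode that covers the full 8 kHz.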
The payload bitrate is between 6 and 20 kilobits per second (Kbps) at Narrowband, 7 and 25 Kbps at Mediumband, 8 and 30 Kbps at Wideband, and 12 and 40 Kbps at Super wideband. As an adaptive codec, SILK has a variable bit rate that changes with the payload: if we only need to send silence, it sends at the minimum rate. On top of that, SILK can use different sampling rates, based on the available bandwidth, which means a call could be using anywhere between 6 and 40 Kbps. On a stable, managed network, a call will typically stay in a given band for most of its duration, and we would probably expect average payload rates of around 30-32 Kbps for a Super wideband call.
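As a rough illustration of what these rates mean on the wire, the sketch below converts a payload bitrate into the average number of encoded bytes per frame. It assumes 20 ms frames and that 1 Kbps means 1000 bits per second; the names are hypothetical.

```python
# Payload bitrate ranges in Kbps per mode, taken from the figures above.
SILK_BITRATE_KBPS = {
    "Narrowband": (6, 20),
    "Mediumband": (7, 25),
    "Wideband": (8, 30),
    "Super wideband": (12, 40),
}

FRAME_MS = 20  # assumed frame duration in milliseconds


def payload_bytes_per_frame(bitrate_kbps: float, frame_ms: int = FRAME_MS) -> float:
    """Average encoded bytes in one frame at the given payload bitrate."""
    bits = bitrate_kbps * 1000 * frame_ms / 1000  # Kbps -> bits per frame
    return bits / 8


if __name__ == "__main__":
    for mode, (low, high) in SILK_BITRATE_KBPS.items():
        print(
            f"{mode}: {payload_bytes_per_frame(low):g} to "
            f"{payload_bytes_per_frame(high):g} bytes per {FRAME_MS} ms frame"
        )
```

At the 32 Kbps average mentioned above, that works out to about 80 bytes of audio payload per 20 ms frame.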
Packetization is the process of ‘chopping up’ the audio signal into packets. SILK does this in 20 ms frames, and up to 5 frames can be concatenated into one packet to reduce packet overhead, at the expense of latency. An individual packet can therefore contain up to 100 ms of audio. The larger packets are only likely when the network has low latency but a bandwidth constraint has been detected.
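The trade-off is easy to tabulate. This short sketch (function and constant names are my own) shows how much audio each packet carries and how many packets per second result for each of the five packing options.

```python
FRAME_MS = 20              # duration of one SILK frame
MAX_FRAMES_PER_PACKET = 5  # up to five frames can share a packet


def packet_profile(frames_per_packet: int) -> tuple[int, float]:
    """Return (audio ms per packet, packets per second) for a packing choice."""
    if not 1 <= frames_per_packet <= MAX_FRAMES_PER_PACKET:
        raise ValueError("frames_per_packet must be between 1 and 5")
    audio_ms = frames_per_packet * FRAME_MS
    packets_per_second = 1000 / audio_ms
    return audio_ms, packets_per_second


if __name__ == "__main__":
    for n in range(1, MAX_FRAMES_PER_PACKET + 1):
        audio_ms, pps = packet_profile(n)
        print(f"{n} frame(s): {audio_ms} ms per packet, {pps:.1f} packets/s")
```

Going from one frame to five frames per packet cuts the packet rate from 50 to 10 per second, which is where the overhead saving comes from.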
Algorithmic delay (i.e. the time it takes to encode/decode) is extremely low in SILK. There is a 5 ms look-ahead delay, but otherwise the delay is effectively the size of the packet, so a 20 ms packet (the default) has a 25 ms client delay and a 100 ms packet has a 105 ms delay. RTA (the audio codec traditionally used in Lync) had a 10 ms look-ahead and a measurable amount of algorithmic delay; a 20 ms packet would take around 40 ms to encode/decode, so SILK appears to be around 40% faster from USB to wire. While the actual capture of an audio signal does have some overhead, it is usually measured in microseconds, so it is comparatively negligible unless the computer has a DPC, ISR or similar kernel problem.
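The delay figures above follow from simple arithmetic, sketched here in Python. The constants come from the paragraph above; the function names are hypothetical, and the RTA value is the approximate 40 ms figure quoted, not a measurement.

```python
SILK_LOOKAHEAD_MS = 5        # SILK look-ahead delay quoted above
RTA_DELAY_20MS_PACKET = 40   # approximate RTA delay for a 20 ms packet


def silk_client_delay_ms(packet_ms: int) -> int:
    """Client-side delay: the packet duration plus the look-ahead."""
    return packet_ms + SILK_LOOKAHEAD_MS


def relative_saving(silk_ms: float, rta_ms: float) -> float:
    """Fraction by which the SILK delay undercuts the RTA delay."""
    return (rta_ms - silk_ms) / rta_ms


if __name__ == "__main__":
    for packet_ms in (20, 100):
        print(f"{packet_ms} ms packet: {silk_client_delay_ms(packet_ms)} ms delay")
    saving = relative_saving(silk_client_delay_ms(20), RTA_DELAY_20MS_PACKET)
    print(f"SILK vs RTA at 20 ms: {saving:.1%} lower delay")
```

The exact ratio for a 20 ms packet is 37.5%, which is the basis for the "around 40%" figure.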
The encoding complexity in SILK can be adjusted: if the device is low on CPU, a less complex encoding scheme can be used, at the expense of bandwidth. This could be helpful on lower-spec mobile handsets.
SILK can operate in a discontinuous transmission (‘DTX’) mode. This effectively makes the call half duplex, slightly reducing the call quality, but it also reduces the bandwidth used when a stream is inactive. By default, SILK doesn’t do this in Lync, but it is possible in the specification.
There is currently no information on Forward Error Correction (FEC) with SILK in the RFC (it just reads TBD), but its presence in SILK, within Lync, was mentioned at TechEd 2014. For now, I am making the same planning assumption we do for RTA: that it can duplicate the RTP payload by up to 100% (2x) to cope with packet loss on networks with plenty of bandwidth (e.g. Wi-Fi).
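As a planning aid only, that assumption can be written as a one-line calculation. This is a sketch of the assumption above, not of how SILK FEC actually works; the function name and the `redundancy` parameter are hypothetical.

```python
def payload_with_fec_kbps(payload_kbps: float, redundancy: float = 1.0) -> float:
    """Payload bitrate after adding FEC redundancy.

    redundancy is the duplicated fraction of the payload: 0.0 means no FEC,
    1.0 means the worst-case 100% duplication (2x) assumed for planning.
    """
    if not 0.0 <= redundancy <= 1.0:
        raise ValueError("redundancy must be between 0.0 and 1.0")
    return payload_kbps * (1 + redundancy)


if __name__ == "__main__":
    for rate in (12, 32, 40):
        print(f"{rate} Kbps payload -> up to {payload_with_fec_kbps(rate):g} Kbps with FEC")
```

Only the payload is doubled under this assumption; the packet headers are counted once.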
SILK features congestion control: the various packetization formats can be combined on the fly and the payload can vary frame by frame. Using a larger format reduces the packet rate, but at the expense of latency and error sensitivity. SILK generally starts with 20 ms and works up if it has to.
SILK is carried in RTP, so the standard header rates apply and we can deduce those from the Lync planning/modelling guidelines.
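For a feel of the header cost, the sketch below estimates the overhead per stream from standard header sizes. These sizes are my assumptions (IPv4 without options, a basic RTP header with no extensions, and an 80-bit SRTP authentication tag) and may not match how the Lync planning figures are broken down.

```python
IPV4_HEADER_BYTES = 20     # assumes IPv4 with no options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12      # assumes no CSRC list or header extensions
SRTP_AUTH_TAG_BYTES = 10   # assumes an 80-bit authentication tag


def header_overhead_kbps(packet_ms: int, secure: bool = True) -> float:
    """Header bandwidth in Kbps for one stream at a given packet duration."""
    per_packet = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    if secure:
        per_packet += SRTP_AUTH_TAG_BYTES
    packets_per_second = 1000 / packet_ms
    return per_packet * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for packet_ms in (20, 40, 60, 80, 100):
        plain = header_overhead_kbps(packet_ms, secure=False)
        srtp = header_overhead_kbps(packet_ms)
        print(f"{packet_ms} ms packets: {plain:.1f} Kbps (RTP), {srtp:.1f} Kbps (SRTP)")
```

Under these assumptions, headers cost 16 Kbps at 20 ms packets before the SRTP tag, which is a large share of a 32 Kbps payload and explains why longer packets help on constrained links.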
Table 1: Audio Codec Bandwidth
Columns: Audio payload bitrate (Kbps); Bandwidth, audio payload and IP header only (Kbps); Bandwidth, audio payload, IP header, UDP, RTP and SRTP (Kbps); Bandwidth, audio payload, IP header, UDP, RTP, SRTP and forward error correction (Kbps)
Rows: SILK Super Wideband; RT Audio Wideband; RT Audio Narrowband
Scenarios: Peer to Peer, Conferencing
On top of the values above, remember that RTCP for audio will add another 5Kbps to the requirement for each call.
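Pulling the pieces together, here is a hedged per-call estimate for a single audio stream. The 40-byte default header size (IPv4 + UDP + RTP) is my assumption, the FEC flag applies the 2x planning assumption described earlier, and the result is a rough planning number, not a reproduction of the table values.

```python
RTCP_KBPS = 5  # per-call RTCP allowance quoted above


def call_bandwidth_kbps(
    payload_kbps: float,
    packet_ms: int = 20,
    header_bytes: int = 40,  # assumption: IPv4 (20) + UDP (8) + RTP (12)
    fec: bool = False,
) -> float:
    """Estimated bandwidth for one audio stream, including RTCP."""
    media_payload = payload_kbps * (2 if fec else 1)  # FEC doubles payload only
    header_kbps = header_bytes * 8 * (1000 / packet_ms) / 1000
    return media_payload + header_kbps + RTCP_KBPS


if __name__ == "__main__":
    print(f"32 Kbps payload, no FEC: {call_bandwidth_kbps(32):.1f} Kbps")
    print(f"32 Kbps payload, with FEC: {call_bandwidth_kbps(32, fec=True):.1f} Kbps")
```

Passing a larger `header_bytes` value is one way to account for SRTP or other per-packet additions.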
Comparing RTA and SILK is interesting. RTA is a pretty good codec, and in 2007 it was amazing, but Skype is definitely in wider use, so SILK offers the benefit of direct integration between Skype clients and Lync. On top of that, we have a codec that can provide higher quality, more options for call resilience and lower bandwidth at the same quality levels. I would say using SILK is a resounding win on all fronts for Lync and Skype for Business.