
Understanding bufferbloat and the network buffer arms race

Routers, switches, and cable modems have buffers to temporarily store packets …

If a little salt makes food taste better, then a lot must make it taste great, right? This logic is often applied in the digital domain, too. (My pet peeve is that TV shows and DVDs keep getting darker and darker.) In a similar vein, networks used to buffer a little data, but these buffers have been getting larger and larger and are now getting so big they are actually reducing performance. Long-time technology pundit Bob Cringely even deemed the issue worthy of three of his ten predictions for the new year.

Networks need buffers to function well. Think of a network as a road system where everyone drives at the maximum speed. When the road gets full, there are only two choices: crash into other cars, or get off the road and wait until things get better. The former isn't as disastrous on a network as it would be in real life: losing packets in the middle of a communication session isn't a big deal. (Losing them at the beginning or the end of a session can lead to some user-visible delays.) But making a packet wait for a short time is usually better than "dropping" it and having to wait for a retransmission.

For this reason, routers—but also switches and even cable or ADSL modems—have buffers that hold on to packets for a short time when they can't be transmitted immediately. Network traffic is inherently bursty, so buffers are necessary to smooth out the flow of traffic—without any buffering, it wouldn't be possible to use the available bandwidth fully. Network stacks and/or device drivers also use some buffering, so the software can generate multiple packets at once, which are then transmitted one at a time by the network hardware. Incoming packets are also buffered until the CPU has time to look at them.
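
To see what that smoothing buys, here's a toy model (my own illustration, not any router's actual logic) of a link that transmits one packet per tick while a ten-packet burst arrives all at once: with no buffer, most of the burst is lost, while a few slots of buffering deliver everything and keep the link busy.

```python
from collections import deque

def burst_through_link(burst_size, buffer_slots, ticks):
    """Toy model: the link sends one packet per tick; a burst arrives at
    tick 0. Packets that find the buffer full are dropped (drop-tail)."""
    q = deque()
    delivered = dropped = 0
    for t in range(ticks):
        arrivals = burst_size if t == 0 else 0
        for _ in range(arrivals):
            if len(q) < buffer_slots + 1:  # +1 for the packet on the wire
                q.append(t)
            else:
                dropped += 1
        if q:                              # transmit one packet this tick
            q.popleft()
            delivered += 1
    return delivered, dropped

print(burst_through_link(10, 0, 12))  # no buffer: (1, 9), most of the burst lost
print(burst_through_link(10, 9, 12))  # 9 slots:  (10, 0), full bandwidth used
```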

So far, so good. But there's another type of buffering in the network, used in protocols such as TCP. For instance, it takes about 150 milliseconds for a packet to travel from Europe to the US west coast and back. My ADSL line can handle about a megabyte per second, which means that at any given time, 150K of data is in transit when transferring data between, say, Madrid and Los Angeles. The sending TCP needs to buffer the data that is in transit in case some of it gets lost and must be retransmitted, and the receiving TCP must have enough buffer space to receive all the data that's in transit even if the application doesn't get around to reading any of it.
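
The quantity at work here is the bandwidth-delay product: link speed multiplied by round-trip time. A quick check of the numbers (the rates and RTT are taken from the paragraph above; the snippet is just the arithmetic):

```python
def bandwidth_delay_product(bytes_per_second, rtt_seconds):
    """Data in flight on a path = link rate x round-trip time."""
    return bytes_per_second * rtt_seconds

home = bandwidth_delay_product(1_000_000, 0.150)   # ~1 MB/s ADSL, 150 ms Europe-US RTT
uni = bandwidth_delay_product(10_000_000, 0.150)   # ten times faster link, same RTT
print(f"home: {home / 1_000:.0f} KB in transit")          # -> 150 KB
print(f"university: {uni / 1_000_000:.1f} MB in transit") # -> 1.5 MB
```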

In the old days (which mostly live on in Windows XP), the TCP buffers were limited to 64K, but more modern OSes can support pretty large TCP buffers. Some of them, like Mac OS X 10.5 and later, even try to automatically size their TCP buffers to accommodate the time it takes for packets to flow through the network. So when I send data from Madrid to Los Angeles, my buffer might be 150K at home, but at the university, my network connection is ten times faster so the buffer can grow as large as 1.5MB.
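
Where auto-tuning isn't available (or has been switched off), applications can inspect and request TCP buffer sizes themselves through the standard socket API. A minimal Python sketch; note that on Linux, setting a size by hand disables auto-tuning for that socket, and the kernel reports back double the requested value to account for bookkeeping overhead:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the OS for the current default buffer sizes.
snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"default send buffer: {snd} bytes, receive buffer: {rcv} bytes")

# Request a buffer sized for a 150 ms path at 10 MB/s, i.e. the 1.5 MB
# from the university example above.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1_500_000)
print("receive buffer now:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```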

The trouble starts when the buffers in the network start to fill up. Suppose there's a 64-packet buffer on the network card—although it would be hard to fill it entirely—and another 64 packets are buffered by the router. With 1500-byte Ethernet packets, that's 192K of data being buffered. So TCP simply increases its buffer by 192K, assuming that the big quake happened and LA is now a bit further away than it used to be.
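
Spelling out the arithmetic (the 8 Mbps bottleneck rate below is my assumption, matching the roughly one-megabyte-per-second ADSL line from earlier):

```python
PACKET_SIZE = 1500                  # bytes in a full-size Ethernet packet
buffered = (64 + 64) * PACKET_SIZE  # network card buffer + router buffer
print(f"{buffered} bytes queued")   # -> 192000, the 192K TCP adds to its buffer

# Those queued bytes also inflate the round-trip time that TCP measures:
bottleneck = 8_000_000 / 8          # assumed ~8 Mbps (1 MB/s) bottleneck, in bytes/s
print(f"{buffered / bottleneck * 1000:.0f} ms of extra 'distance'")  # -> 192 ms
```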

The waiting is the hardest part

Of course with all the router buffers filled up with packets from a single session, there's no longer much room to accommodate the bursts that the router buffers were designed to smooth out, so more packets get lost. To add insult to injury, all this waiting in buffers can take a noticeable amount of time, especially on relatively low bandwidth networks.

I personally got bitten by this when I was visiting a university in the UK where there was an open WiFi network for visitors. This WiFi network was hooked up to a fairly pathetic 128kbps ADSL line. This worked OK as long as I did some light Web browsing, but as soon as I started downloading a file, my browser became completely unworkable: every click took 10 seconds to register. It turned out that the ADSL router had a buffer that accommodated some 80 packets, so 10 seconds' worth of packets belonging to my download would be occupying the buffers at any given time. Web packets had to join the conga line at the end and were delayed by 10 seconds. Not good.
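
The delay follows directly from the queue size divided by the link speed. Plugging in the numbers from this anecdote (the 1500-byte packet size is my assumption):

```python
packets = 80              # what the ADSL router's buffer held
packet_bytes = 1500       # assumed full-size packets
line_rate = 128_000 / 8   # 128 kbps line, in bytes per second

delay = packets * packet_bytes / line_rate
print(f"{delay:.1f} s")   # -> 7.5 s of queueing; with ADSL framing overhead,
                          #    close to the ~10 seconds observed
```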

Cringely got wind of the problem through the blog of Bell Labs' Jim Gettys, which reads like a cross between a detective novel and an exercise in higher Linuxery. Gettys suggests some experiments to do at home to observe the issue ("your home network can't walk and chew gum at the same time"), which seems to be exacerbated by the Linux network stack. He gets delays of up to 200ms when transferring data locally over 100Mbps Ethernet. I tried this experiment, but my network between two Macs, using a 100Mbps wired connection through an Airport Extreme base station, was only slowed down by 6ms (Mac OS X 10.5 to 10.6) or 12ms (10.6 to 10.5).
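
A rough version of this experiment can be reproduced without root access by timing TCP handshakes as a latency probe while a bulk transfer runs in the background. A sketch (the probe host and download URL are placeholders to fill in):

```python
import socket
import threading
import time
import urllib.request

def tcp_rtt_ms(host, port=80):
    """Time a TCP handshake as a rough RTT probe (no root needed, unlike ping)."""
    start = time.monotonic()
    socket.create_connection((host, port), timeout=5).close()
    return (time.monotonic() - start) * 1000

def saturate(url):
    """Download a large file to fill the buffers at the bottleneck link."""
    with urllib.request.urlopen(url) as response:
        while response.read(65536):
            pass

HOST = "example.com"                          # placeholder probe target
BIG_FILE = "http://example.com/big-file.iso"  # placeholder large download

print(f"idle RTT: {tcp_rtt_ms(HOST):.0f} ms")
threading.Thread(target=saturate, args=(BIG_FILE,), daemon=True).start()
time.sleep(2)  # give the download time to fill the buffers
for _ in range(5):
    print(f"loaded RTT: {tcp_rtt_ms(HOST):.0f} ms")
    time.sleep(1)
```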

Cringely gets many of the details wrong. To name a few: he posits that modems and routers pre-fetch and buffer data in case it's needed later. Those simple devices—including the big routers in the core of the Internet—simply aren't smart enough to do any of that. They just buffer data that flows through them for a fraction of a second to reduce the burstiness of network traffic and then immediately forget about it. Having more devices, each with its own buffers, doesn't make the problem worse: there will be one network link that's the bottleneck and fills up, and packets will be buffered there. The other links will run below capacity, so the packets drain from those buffers faster than they arrive.

He mentions that TCP congestion control—not flow control, that's something else—requires dropped packets to function, but that's not entirely true. TCP's transmission speed can be limited by the send and/or receive buffers and the round-trip time, or it can slow down because packets get lost. Both excessive buffering and excessive packet loss are unpleasant, so it's good to find some middle ground.

Unfortunately, it looks like the router vendors and the network stack makers got into something of an arms race, pushing up buffer space at both ends. Or maybe, as Gettys suggests, it's just that memory is so cheap these days. The network stacks need large buffers for sessions to high-bandwidth, far-away destinations. (I really like being able to transfer files from Amsterdam to Madrid at 7Mbps!) So it's mostly up to the (home) router vendors to show restraint and limit the amount of buffering they put in their products. Ideally, they should also use a good active queue management mechanism that avoids most of these problems in the first place.
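
One classic example of such a mechanism is Random Early Detection (RED), which starts dropping a small, random fraction of packets as the average queue grows, so that TCP senders slow down before the buffer fills completely. A minimal sketch of the idea (the parameters are illustrative, not taken from any real router):

```python
import random
from collections import deque

class RedQueue:
    """Minimal Random Early Detection sketch: the drop probability rises
    from 0 to max_p as the average queue length moves from min_th to max_th."""
    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002, limit=30):
        self.q = deque()
        self.avg = 0.0  # exponentially weighted average queue length
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.weight, self.limit = max_p, weight, limit

    def enqueue(self, packet):
        self.avg += self.weight * (len(self.q) - self.avg)
        if len(self.q) >= self.limit:
            return False              # buffer completely full: forced drop
        if self.avg >= self.max_th:
            return False              # persistent congestion: drop
        if self.avg >= self.min_th:   # early random drop to signal senders
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() < p:
                return False
        self.q.append(packet)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None
```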

Cringely may have a point when he suggests that ISPs are in no big hurry to solve this, because having a high-latency open Internet just means that their own VoIP and video services, which usually operate under a separate buffering regime, look that much better. But the IETF LEDBAT working group is looking at ways to avoid having background file transfers get in the way of interactive traffic, which includes avoiding filling up all those router buffers. This may also provide relief in the future.
