A while ago, I wrote about various TCP congestion control algorithms and how each of them improves on its predecessor in modelling the network state. There, I oversold the promise of BBR, the widely deployed congestion control protocol developed by Google (2016), in terms of how fast and congestion-free it makes internet traffic. Indeed, BBR is a practical approach to addressing the issue of shallow and deep buffers. But it is in no way a perfect representation of goodput, especially when competing with loss-based congestion control protocols (like CUBIC). BBR has its own problems, one of which I mentioned here but left without elaborating. It concerns cellular networks, where random burst losses are common. BBR, being indifferent to the type of packet loss, treats random losses as a result of congestion. It probes the bandwidth by adjusting the sending rate to 1.25x or 0.75x the current estimate. This reaction is a consequence of RTT measurements over the last 8 RTTs, not of any direct measurement of the loss rate. As a result, the original BBR (v1) suffered from unfairness and burst losses.
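To make the 1.25x/0.75x probing concrete, here is a minimal sketch of how a sender might derive its pacing rate across one probing cycle. The eight-entry gain list follows the published BBR design as I understand it; the function name pacing_rates and the 10 Mbit/s estimate are hypothetical, and this is an illustration, not kernel code.

```python
# Illustrative sketch of BBRv1-style bandwidth probing (not the real implementation).
# Assumption: one cycle has eight phases, probing up once, draining once,
# then holding the estimated rate for the remaining six phases.
GAIN_CYCLE = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def pacing_rates(bw_estimate_mbps, gains=GAIN_CYCLE):
    """Return the pacing rate (Mbit/s) used in each phase of one cycle."""
    return [gain * bw_estimate_mbps for gain in gains]


if __name__ == "__main__":
    # Hypothetical bandwidth estimate of 10 Mbit/s.
    for phase, rate in enumerate(pacing_rates(10.0)):
        print(f"phase {phase}: pace at {rate} Mbit/s")
```

Note that nothing in this loop looks at how many packets were lost, which is exactly the gap BBRv2 tries to close.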
BBRv2, introduced in 2019, incorporates the packet loss rate and smooths the probing mechanism by adding more phases to control abrupt adjustments to the congestion window. The underlying algorithm (Kleinrock's power) stays the same. Everything else is optimisation.
By keeping track of the ACKs/NACKs it receives from the receiver and of timeouts, the sender maintains an estimate of the packets in flight and of the loss rate:
loss_rate ≈ packets_lost_in_window / packets_sent_in_window.
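As a rough sketch, the loss rate is the fraction of packets sent in a window that were reported lost. The function name and the sample counts below are hypothetical; a real sender would derive these counters from ACK/NACK and timeout bookkeeping.

```python
def loss_rate(packets_sent_in_window, packets_lost_in_window):
    """Fraction of the packets sent in a window that were lost (0.0 to 1.0)."""
    if packets_sent_in_window == 0:
        # Nothing was sent, so there is nothing to have lost.
        return 0.0
    return packets_lost_in_window / packets_sent_in_window


if __name__ == "__main__":
    # Hypothetical window: 200 packets sent, 4 reported lost.
    print(loss_rate(200, 4))
```

The estimate always falls between 0 and 1, since a sender cannot lose more packets than it sent in the window.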
Using this loss_rate, BBRv2 maintains a lower bound that is adjusted quickly in response to burst losses. It also maintains an upper bound that is adjusted gradually in response to loss due to pacing/probing for higher bandwidth. The number of packets in flight always lies between these two bounds (typically 15% less than the upper bound). This careful responsiveness makes BBRv2 fairer than BBRv1, allowing other flows to catch up and utilise the available bandwidth. Probing takes place in four phases: PROBE_DOWN, PROBE_CRUISE, PROBE_REFILL, and PROBE_UP. Together they form one complete cycle. We can think of them roughly as a smoother version of 1.25/0.75 probing. That’s all. That is BBRv2.
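The bounds and the phase cycle can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual BBRv2 state machine: the helper names cruise_target and phase_sequence are hypothetical, the 15% headroom comes from the description above, and real BBRv2 decides when to change phase based on timers and measurements that are omitted here.

```python
from itertools import cycle

# Phase names as listed in the text; one pass through the list is one cycle.
PHASES = ["PROBE_DOWN", "PROBE_CRUISE", "PROBE_REFILL", "PROBE_UP"]

# Assumption from the text: cruise about 15% below the upper bound.
HEADROOM = 0.15


def cruise_target(lower_bound, upper_bound, headroom=HEADROOM):
    """Inflight target: headroom below the upper bound, clamped to the bounds."""
    target = upper_bound * (1.0 - headroom)
    return max(lower_bound, min(target, upper_bound))


def phase_sequence(count):
    """Return the first `count` phases, wrapping around after PROBE_UP."""
    phases = cycle(PHASES)
    return [next(phases) for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical bounds, in packets.
    print(cruise_target(lower_bound=50, upper_bound=100))
    print(phase_sequence(5))
```

The clamp is a design choice for the sketch: if the lower bound ever sits above the headroom target, the target follows the lower bound, so inflight always stays within the two bounds.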
In short, BBRv2 improves fairness by not operating at the full BDP (it stays at least 15% below it), and it reacts to random burst losses by lowering the lower bound further.