A lot of people still use DSL to connect to the Internet, and most of these connections have less upstream than downstream bandwidth. When you upload large amounts of data, this often causes high latency for interactive low-bandwidth applications such as SSH and VoIP (SIP, Skype and Lync), because of a phenomenon called "bufferbloat". There are many solutions to this problem. Below I'll describe how I've worked around it.
A long time ago, in Internet time, a script for Linux machines called the LARTC Wondershaper was developed. It served many people well for years, but as Internet services blossomed and user behavior changed, so did our needs for controlling packet latency. I wrote the first version of SuperShaper-SOHO, an improved Wondershaper, back in September 2004. It used the improved HTB scheduler and life was good for many years.
But then, a few days ago, I found, by accident, a GitHub gist about HFSC, Linux traffic shaping's best kept secret, which turned out to be a variant of Daniel S. Sterling's hfsc-shape script. I was intrigued, because I had been feeling lately that something didn't quite work right with the older version of my setup, which I've used for years. Latency wasn't controlled as well as it had been before. I started reading up on HFSC (Hierarchical Fair Service Curve). I found the man pages tc-hfsc(8) and tc-hfsc(7), but those just confused me even more, especially where they talk about the real-time ("rt") service curves. HFSC is really hard to understand! Eventually I found the kernel documentation called sch_hfsc.txt, which is the best explanation I've found so far. It showed some fairly simple math for calculating the values I needed.
Refactoring the existing script
I started looking at my old version of the script and saw that refactoring it would be no easy feat. I also figured out that I'd gotten something mixed up back in the day when I wrote the initial version. When you define filters for a classful scheduler, you need to make sure that narrower matches are evaluated before broader ones. The pre-1.6 versions of my script get this woefully wrong, which is why they don't work properly. Once I had all of this fixed (and my flow and filter definitions refactored into proper Bash functions) I was good to go.
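To illustrate the ordering rule, here is a minimal dry-run sketch. It is not taken from the script: the function name add_filter, the device eth0, the class IDs and the matched port are all hypothetical, and the commands are only printed, not executed. It relies on the fact that tc consults filters with a lower prio number first, so the narrow SSH match gets a low number and the catch-all gets a high one.

```bash
# Dry-run sketch: prints tc commands instead of running them.
# DEV, the class IDs and the matches are illustrative assumptions.
DEV="eth0"

# add_filter PRIO FLOWID MATCH...
# Lower prio numbers are evaluated first, so narrow matches get low numbers.
add_filter() {
    prio="$1"; flowid="$2"; shift 2
    echo "tc filter add dev $DEV parent 1: protocol ip prio $prio u32 $* flowid $flowid"
}

# Narrow rule first: SSH (TCP, destination port 22) goes to the interactive flow.
add_filter 1 1:10 match ip protocol 6 0xff match ip dport 22 0xffff
# Broad rule last: everything else falls through to the bulk flow.
add_filter 9 1:30 match ip dst 0.0.0.0/0
```

If the two prio values were swapped, the catch-all would win and SSH packets would never reach the interactive flow.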
Going from HTB to HFSC
Once I had refactored everything into nice Bash functions it wasn't actually so hard to change from HTB to HFSC. All my packet filters could stay the same, because they just ensured traffic would be put in the proper flow. I had to change my old HTB priority-based define_flow function into an HFSC-based define_ls_flow function. As you can imagine by the name, I've completely ignored the real-time ("rt") parts of the HFSC scheduler and just implemented simple link-sharing ("ls").
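The actual function isn't reproduced here, but a link-sharing-only flow definition might look roughly like the following dry-run sketch. The argument order, the device eth0, the parent class 1:1 (assumed to exist already) and the rates are my assumptions for illustration; only the name define_ls_flow comes from the script. The commands are printed rather than executed.

```bash
# Hypothetical sketch of a link-sharing-only flow definition (dry run).
DEV="eth0"   # assumed upstream interface

# define_ls_flow CLASSID RATE_KBIT
# Creates an HFSC leaf class with only a link-sharing ("ls") curve:
# no real-time ("rt") curve and no upper limit ("ul").
define_ls_flow() {
    classid="$1"; rate_kbit="$2"
    echo "tc class add dev $DEV parent 1:1 classid $classid hfsc ls m2 ${rate_kbit}kbit"
}

define_ls_flow 1:10 200   # interactive traffic (illustrative rate)
define_ls_flow 1:30 500   # bulk uploads (illustrative rate)
```

Here m2 is the long-term rate of the service curve; leaving out m1 and d keeps the curve a simple straight line.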
The core difference between HTB and HFSC, when HFSC is used only with link-sharing service curves, is that HFSC doesn't dequeue packets in order of flow priority. Instead, you specify the minimum bandwidth each flow should get when the link is fully saturated.
So with that understanding in place I defined my flows in terms of how much bandwidth the different types of traffic should get if my link was fully saturated. With this setup any flow can borrow bandwidth from another flow as long as the link isn't saturated. But the moment it is, even tiny flows (like TCP ACK packets, ICMP/DNS traffic and VoIP) get priority over bandwidth-heavy uploads.
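The arithmetic behind this can be sketched in a few lines. Link-sharing hands out bandwidth to the flows that currently have packets queued, in proportion to their ls rates. The function name saturated_share and all the numbers below (a 1000 kbit uplink and flows of 200, 300 and 500 kbit) are made up for illustration, and the sketch assumes the total is capped at the uplink rate.

```bash
# saturated_share UPLINK_KBIT OWN_RATE_KBIT SUM_OF_ACTIVE_RATES_KBIT
# Share of the uplink a flow gets while the link is saturated
# (integer arithmetic, so results are rounded down).
saturated_share() {
    echo $(( $1 * $2 / $3 ))
}

# All three flows busy (200 + 300 + 500 = 1000): each gets its own rate.
saturated_share 1000 200 1000   # prints 200

# Only the 200 and 500 kbit flows busy (sum 700): the idle flow's
# bandwidth is split between them in the same 2:5 ratio.
saturated_share 1000 200 700    # prints 285
saturated_share 1000 500 700    # prints 714
```

The configured rate is therefore a floor under saturation, not a ceiling: a flow gets more whenever its siblings are idle.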
The end result is that as long as nothing else is happening, my cloud backup can utilize my full upstream bandwidth, but the moment I start watching YouTube/Netflix or take a VoIP call, that traffic will get priority and the cloud backup traffic will yield down to what I've set as a minimum bandwidth allowance for that flow. My SSH console sessions now work as if there were no traffic on the link even though I'm fully utilizing my upstream.
I would really like to figure out how to easily define a real-time service curve, but the complexity of it baffles me. If you have some insight into this topic a pull request is most welcome, as I'd like for the script to have this feature.
Both the sch_hfsc kernel documentation and the hfsc-shape scripts I've linked to implement link-sharing together with the upperlimit ("ul") service curve. I've avoided this, so that any flow can saturate the uplink when no other traffic is present. That's a win-win in my book.
I've also created a Telegraf script that monitors the bytes sent and the period value for each flow, which gives a good understanding of how this setup behaves. This script allowed me to quickly understand why the older version of this setup was broken. I might blog about this monitoring solution in a later article.
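My monitoring script isn't shown here, but the general idea can be sketched as follows: read the per-class statistics that tc prints and turn the "Sent" byte counter and the "period" value into InfluxDB line protocol, which Telegraf can ingest. The function names, the measurement name hfsc_flow and the sample statistics are illustrative assumptions, not real measurements; the sample mimics the layout tc prints for HFSC classes.

```bash
# Hypothetical sketch: convert HFSC class statistics to line protocol.

# parse_hfsc_stats reads `tc -s class show` style text on stdin and
# prints one line per class with its byte counter and period value.
parse_hfsc_stats() {
    awk '
        $1 == "class" && $2 == "hfsc" { classid = $3 }
        $1 == "Sent"                  { bytes = $2 }
        $1 == "period"                { printf "hfsc_flow,classid=%s bytes=%si,period=%si\n", classid, bytes, $2 }
    '
}

# Made-up sample input so the sketch runs without root or tc.
sample_stats() {
    cat <<'EOF'
class hfsc 1:10 parent 1:1 ls m1 0bit d 0us m2 200Kbit
 Sent 1500 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 period 7 work 1500 bytes level 0
EOF
}

sample_stats | parse_hfsc_stats
# prints: hfsc_flow,classid=1:10 bytes=1500i,period=7i
```

On a live system the input would come from something like `tc -s class show dev eth0 | parse_hfsc_stats`, with eth0 replaced by the actual upstream interface.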