In part one of this blog I went over some of the configuration considerations for 10 GigE server NICs. “Performance” in that case was defined in terms of raw throughput – specifically TCP throughput – which matters for backup-and-recovery systems or data center servers. It is pleasantly surprising to discover that the right choice of settings can reliably get close to 99% utilization of the link.
Your mileage may vary (YMMV), of course. And some NICs, servers and OSs (I’m not naming any… right now) are not quite up to the job…
However, for many of you raw throughput fails to impress. That is, your links may be far from saturated; instead, you are most intent on gaining the latency advantages that 10G offers.
Why is 10G better for latency? In a word, serialization – the time it takes to process a packet and send it out on the wire (or read it back in) is reduced by roughly a factor of 10 compared to 1G. Propagation time (the speed at which the signal travels, whether through optical fibre or copper) stays pretty much the same, so you gain nothing in propagation delay over a given distance. But the time for a packet of a given size to be written to the wire, or read from it at each interface, is drastically reduced.
Theoretically, the serialization time for a 1500-byte packet is around 12 microseconds at 1G and 1.2 microseconds at 10G – roughly how long it takes an interface to read the packet from, or write it to, the wire. However, there are plenty of other sources of latency in the end-to-end path, particularly at the end-hosts, which can and should be the focus of tuning.
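Those numbers fall straight out of the line rate. A back-of-the-envelope sketch (ignoring the Ethernet preamble and inter-frame gap, which add slightly more in practice):

```python
# Serialization delay: the time to clock a frame's bits onto the wire.
# 1500 bytes is the packet size used in the text; preamble and
# inter-frame gap would add a little on top of this.

def serialization_us(frame_bytes, link_bps):
    """Microseconds to serialize frame_bytes at link_bps."""
    return frame_bytes * 8 / link_bps * 1e6

print(f"{serialization_us(1500, 1e9):.1f} us at 1G")    # 12.0 us at 1G
print(f"{serialization_us(1500, 1e10):.1f} us at 10G")  # 1.2 us at 10G
```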
To give you a feel for the realities: in my experience, a 1500-byte packet that is “ping-ponged” between two decently tuned servers on a 1G LAN makes its round trip in around 120-180 micro-seconds, measured from application layer to application layer. Roughly one-third of that time is serialization – with a switch in the path there are four interfaces, so a ping-pong involves eight reads/writes (with each read/write pair happening nearly simultaneously). Moving to 10G frees up most of that, and the round-trip time typically shortens to around 70-80 micro-seconds.
Again, that is measured at the application layer – RTTs further down in a well-tuned stack can be 30-40 micro-seconds. Once again YMMV – and I’d be interested to hear what the latency-obsessed consider to be acceptable performance in a round-trip time.
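If you want to take your own application-layer measurement, a minimal ping-pong timer is easy to write. A sketch in Python – here the echo side runs as a local thread, so the numbers reflect loopback rather than a real LAN; for a real test, run the echo loop on the second server:

```python
# Minimal UDP ping-pong to measure application-to-application RTT.
# A local echo thread stands in for the second server in this sketch.
import socket
import threading
import time

def echo_loop(sock):
    # Echo every datagram back to its sender until told to quit.
    while True:
        data, addr = sock.recvfrom(2048)
        if data == b"quit":
            return
        sock.sendto(data, addr)

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))          # OS picks a free port
peer = srv.getsockname()
threading.Thread(target=echo_loop, args=(srv,), daemon=True).start()

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = b"x" * 1472               # 1500-byte IP packet minus IP/UDP headers

rtts = []
for _ in range(100):
    t0 = time.perf_counter()
    cli.sendto(payload, peer)
    cli.recvfrom(2048)
    rtts.append((time.perf_counter() - t0) * 1e6)
cli.sendto(b"quit", peer)

print(f"median RTT: {sorted(rtts)[len(rtts) // 2]:.1f} us")
```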
And who are these latency-obsessed types? In the fast and furious world of financial transactions for example, latency is king. Well, rather, latency is a princess – and all work to serve her delicate sensibilities. Enormous amounts of money are spent shaving a few milliseconds off of response times – because even more enormous amounts of money are at stake.
In that world, some of the configurations that help optimize for raw throughput are poison for fast packets. Interrupt coalescence in particular: typical 10G NICs default to 75-100 micro-seconds of wait time between receiving a packet and triggering an interrupt. When many packets arrive at once, this limits the number of interrupts generated and reduces the load on the CPU. But waiting that long is anathema to low latency. Turning off coalescence, or at least reducing the wait times to their minimums, is a must.
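Putting the numbers side by side shows why. A quick back-of-the-envelope using the figures from this post (on Linux, the knob itself is usually adjusted with something like `ethtool -C <nic> rx-usecs 0`):

```python
# Why interrupt coalescence hurts latency: the default holdoff alone
# can exceed the entire tuned 10G round trip. Figures from the text.

rx_coalesce_us = 100     # typical 10G NIC default holdoff
serialize_10g_us = 1.2   # 1500-byte frame serialization at 10G
rtt_app_10g_us = 75      # tuned application-to-application RTT

# Worst case, each receive direction waits out a full holdoff
# interval, and a ping-pong has two receives.
added = 2 * rx_coalesce_us
print(f"coalescing can add up to {added} us to a ~{rtt_app_10g_us} us RTT")
print(f"that is {added / serialize_10g_us:.0f}x the serialization delay")
```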
Other, more refined tuning may include binding interrupts and network-bound applications to the same CPU to make packet processing even more efficient. Very careful choices of buffer sizes and allocation strategies can help as well. This gets well into the black arts of low-latency tuning. Beyond that, you may start to ask yourself whether Ethernet is really your best choice for a low-latency interconnect.
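On Linux, the application half of that binding can be done from inside the process itself via `os.sched_setaffinity`; the interrupt half is configured separately (e.g. by writing a CPU mask to `/proc/irq/<n>/smp_affinity` as root). A sketch – the choice of core is hypothetical, and in practice you would pick the core that services the NIC’s RX interrupt:

```python
# Pin a network-bound process to a single core so its caches stay warm
# and it runs alongside the NIC's RX interrupt handler. Linux-only API.
# The matching IRQ-side pinning is done outside Python, e.g. via
# /proc/irq/<n>/smp_affinity (requires root).
import os

target_cpu = 0  # hypothetical: use the core handling the NIC IRQ
os.sched_setaffinity(0, {target_cpu})   # 0 = the calling process
print(os.sched_getaffinity(0))          # now restricted to that core
```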
The so-called “wizard gap,” between what can be attained out-of-the-box and what a network wizard can achieve, keeps getting bigger. Expect to tune your 10G-based systems if you really want the gains 10 Gigabit Ethernet can provide. But that is why they pay you the big bucks, right?