Why Test NetFlow with Tcpreplay 4.0
by Fred Klassen

In my blog post “NetFlow Performance Testing with Tcpreplay 4.0” I introduced the new IP Flow / NetFlow features that I recently added to Tcpreplay, and claimed that this is the best solution for testing Layer 7 flow collectors. Today we look at how Layer 7 collectors differ from traditional flow collectors, and why they need to be tested differently.

So, what are Layer 7 flow collectors anyway? They are NetFlow collectors that go beyond simple 5-tuple Layer 3 collection, providing complete Layer 7 application visibility. Rather than simply inspecting the TCP/IP headers of each packet, they employ Deep Packet Inspection (DPI), analyzing the complete content of every packet on the network. This greatly enhances NetFlow, providing definitive evidence of security breaches and inappropriate user behavior that is simply not available with traditional Layer 3 flow collectors.
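To make the term concrete, here is a minimal sketch of 5-tuple flow accounting. The `FiveTuple` type, the `count_packets_per_flow` helper and the sample packets are all hypothetical and made up for illustration; they are not taken from FlowView or Tcpreplay. The point is only that a Layer 3 collector groups packets by header fields and never looks at the payload.

```python
from collections import Counter
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Header fields that identify a flow in traditional collection."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def count_packets_per_flow(packets):
    """Group packets by 5-tuple only; the payload is never examined."""
    flows = Counter()
    for key, _payload in packets:
        flows[key] += 1
    return flows


# Hypothetical packets: (flow key, payload bytes). Addresses are documentation ranges.
PACKETS = [
    (FiveTuple("10.0.0.5", "203.0.113.10", 51000, 443, "TCP"), b"payload-a1"),
    (FiveTuple("10.0.0.5", "203.0.113.10", 51000, 443, "TCP"), b"payload-a2"),
    (FiveTuple("10.0.0.7", "203.0.113.10", 51001, 443, "TCP"), b"payload-b1"),
]

flows = count_packets_per_flow(PACKETS)
for key, count in flows.items():
    print(f"{key.src_ip}:{key.src_port} -> {key.dst_ip}:{key.dst_port} "
          f"{key.protocol}: {count} packets")
```

Both flows above go to the same server and port, so header fields alone cannot tell which application each one carries. Answering that question means reading the payload bytes, which is the extra work a DPI-based collector takes on.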

Here is an example of a high-level Layer 7 application classification report using FlowView. Although no security threats are being reported, an alert has been raised for excessive recreational YouTube traffic.

[Figure: FlowView Layer 7 application classification report]

Classifying the content that the user is accessing, rather than inferring content from IP addresses, is a giant leap forward in NetFlow technology. But Layer 7 flow collection comes at a performance cost. Take for example FlowView, which uses DPI to classify 1200+ applications. Some applications, such as Facebook, may be classified within the first few packets in a flow. However, it may take several more packets within the flow to further classify it as FarmVille on Facebook. This type of flow consumes considerably more CPU and memory than average during flow classification.

Example of DPI Resource Overhead

In this example we test FlowView with and without DPI mode enabled. The following chart illustrates the performance difference between 5-tuple and full DPI analysis. Notice the relatively low average per-core CPU utilization when FlowView is simply classifying Layer 3 5-tuple traffic. When FlowView is switched back to normal Layer 7 DPI classification, CPU usage increases, as expected.

[Figure: average per-core CPU utilization, Layer 3 5-tuple vs. Layer 7 DPI classification]

In both tests we use Tcpreplay 4.0 to replay a large network packet capture taken from a busy Internet access point. This packet capture contains a variety of Layer 7 network applications, as would be expected when monitoring hundreds of users. In Layer 3 mode, only packet headers need to be inspected and the content of the packets is irrelevant. However, when Layer 7 DPI classification is enabled, the type of content becomes significant. Real network traffic drives CPU usage higher than synthetic traffic. To understand how a Layer 7 flow collector will behave in a real network, you must test with “real” traffic patterns.
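The exact command line used for these tests is not given here, so the following is only a sketch of what such a replay might look like. The option names (`--intf1`, `--topspeed`, `--loop`, `--unique-ip`, `--duration`) are, to the best of my knowledge, documented tcpreplay 4.x options, but check them against your installed man page. The interface name, capture file name and loop count are placeholders, and the `build_tcpreplay_command` helper is hypothetical. The script only builds and prints the command; it does not run it, since tcpreplay normally needs root privileges and a real interface.

```python
import shlex


def build_tcpreplay_command(interface, pcap_path, loops, duration_s=None):
    """Return an argv list for replaying a capture as fast as possible.

    Nothing is executed here. All argument values are supplied by the caller.
    """
    argv = [
        "tcpreplay",
        f"--intf1={interface}",  # output interface facing the flow collector
        "--topspeed",            # replay as fast as the hardware allows
        f"--loop={loops}",       # replay the capture this many times
        "--unique-ip",           # alter IP addresses on each loop so every pass yields new flows
    ]
    if duration_s is not None:
        argv.append(f"--duration={duration_s}")  # stop after this many seconds
    argv.append(pcap_path)
    return argv


# Placeholder values: "eth0", "capture.pcap" and 1000 loops are assumptions.
# 360 seconds matches the 6-minute test points described in this article.
command = build_tcpreplay_command("eth0", "capture.pcap", loops=1000, duration_s=360)
print(shlex.join(command))
```

Looping a real capture with unique IP addresses per pass is what lets a single capture file produce a sustained stream of distinct flows while keeping the original application mix.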

Now let’s look at the effects of DPI collection on flows per second (fps). The sustained fps drops when DPI is employed. Please note that, given the industry’s obsession with this statistic, many vendors may hesitate to illustrate this fact. I feel that FlowView is very well optimized and that, even after the DPI penalty, results are very good.

[Figure: sustained flows per second (fps), with and without DPI classification]

Observations and Caveats

Here are some comments and observations regarding these test results:

  • Each point on these graphs represents 6 minutes of sustained test traffic, generated by Tcpreplay version 4.0 using a “real” network packet capture.
  • The accepted industry standard for sustained flow rates is 5 flows per second per Mbps of link capacity. Therefore on this 10GigE link (10,000 Mbps) we would expect all commercial flow collectors to achieve at least 50,000 fps (50 Kfps).
  • Some Layer 3 5-tuple flow providers claim 100 Kfps, but in this example you can see that this is easily achieved when not doing DPI classification. With DPI classification disabled, FlowView achieves ~130 Kfps.
  • With DPI collection enabled we could claim that this FlowView collector can achieve ~100 Kfps, but that claim would be a disservice. The more honest statement is that FlowView can sustain 80+ Kfps without any loss in accuracy, when being tested with “real” network traffic.
  • Tcpreplay is able to send packets to the flow collector at a maximum rate of 136,610 fps before saturating the 10GigE link. You can attain higher TX flow rates when using less sophisticated packet captures, or by sending synthetic traffic. However, these patterns do not exercise Layer 7 flow collectors to the same extent and would not reflect real-world performance.
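The arithmetic behind these observations can be checked in a few lines. All input figures come from the list above; the variable names are my own, and treating "80+ Kfps" as exactly 80 and "~130 Kfps" as exactly 130 is an approximation made only for this rough calculation.

```python
FLOWS_PER_MBPS = 5       # rule of thumb quoted above
LINK_MBPS = 10_000       # 10GigE link capacity

# Minimum sustained rate expected of a collector on this link.
expected_fps = FLOWS_PER_MBPS * LINK_MBPS

L3_FPS = 130_000         # approximate rate with DPI disabled
DPI_FPS = 80_000         # lower bound of the sustained rate with DPI enabled
MAX_TX_FPS = 136_610     # Tcpreplay transmit rate at link saturation

# Fraction of Layer 3 throughput given up when DPI is enabled.
dpi_penalty = 1 - DPI_FPS / L3_FPS

# How far the DPI result exceeds the rule-of-thumb minimum.
headroom = DPI_FPS / expected_fps

print(f"Expected minimum: {expected_fps:,} fps")
print(f"DPI penalty: about {dpi_penalty:.0%} of Layer 3 throughput")
print(f"DPI rate is {headroom:.1f}x the expected minimum")
print(f"DPI rate is {DPI_FPS / MAX_TX_FPS:.0%} of the maximum offered rate")
```

Under these assumptions, DPI costs a bit more than a third of the Layer 3 flow rate, yet the collector still sustains well above the 50 Kfps expected for a 10GigE link.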


Moving from Layer 3 to Layer 7 NetFlow facilitates numerous enhancements including threat detection, social media monitoring and policy compliance monitoring. But this comes at the cost of additional processing, and therefore Layer 7 NetFlow collectors need to be optimized more than flow collectors that are simply processing packet headers.

Further, Layer 7 NetFlow resource consumption increases when processing increasingly complex combinations of network applications. It may not be sufficient to test with synthetic traffic patterns, such as those generated by hardware packet generators. To understand how a Layer 7 NetFlow appliance will behave on a real network, it is crucial that real network patterns be used for testing. I recommend testing with the redesigned Tcpreplay version 4.0 with new IP Flow / NetFlow features.

Testing Layer 7 DPI NetFlow collectors does not have to be expensive and tedious. Tcpreplay 4.0 is free, easy to use, and replays real network traffic at hardware packet generator rates.

My next blog post will take you step-by-step through the process of setting up a test environment on commodity hardware and reproducing the above results.
