Parallelizing FlowView: Performance Improvements
by August 5, 2014

Filed under: Networking Technology, Performance Monitoring

FlowView provides detailed insight into the applications on one’s network, but to achieve this, we need to store a lot of data. Behind the scenes, when we are trying to get top applications or top hosts and categories, FlowView has two available data sources: Google BigQuery and raw capture files. BigQuery is fast for many use cases, especially when the data set is large and does not have to be real-time. However, for up-to-the-minute data analysis over moderately large datasets, we parse the raw files directly to extract the information we are looking for. Sometimes, when looking up data distribution over long intervals or digging into appliances that generate very large amounts of traffic, the response time in our FlowView UI becomes a point of concern.

To tackle this issue, we introduced parallel processing of flow files as part of our release last month. Essentially, any information we see on our FlowView screen translates to an nfdump command that reads through the data uploaded from individual appliances. Further, all requests from our FlowView UI are time bound: they show data within a start and end time interval. This, in turn, determines the number of flow files that need to be processed by the FlowView backend. In our previous implementation, each FlowView request was served by a single thread, which would gain access to the resources required to serve the request. The time taken to serve each request is generally governed by two major factors: the time range requested and the flows per second (fps) of the appliance.
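To make the link between a time range and the number of flow files concrete, here is a minimal sketch. It assumes nfcapd's default 5-minute rotation and default `nfcapd.YYYYMMDDhhmm` file naming, which is consistent with the 288 files per day seen on our sample appliances. The function names, the directory path, and the choice of a top-hosts statistic are illustrative only; this is not the exact command FlowView issues.

```python
from datetime import datetime, timedelta

# Assumption: capture files rotate every 5 minutes (nfcapd's default).
ROTATION = timedelta(minutes=5)
ONE_US = timedelta(microseconds=1)


def floor_to_slot(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute slot."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def capture_file_name(ts: datetime) -> str:
    """Name of the capture file covering ts (assumed default naming)."""
    return "nfcapd." + floor_to_slot(ts).strftime("%Y%m%d%H%M")


def count_capture_files(start: datetime, end: datetime) -> int:
    """Number of capture files overlapping the half-open range [start, end)."""
    if end <= start:
        return 0
    first = floor_to_slot(start)
    last = floor_to_slot(end - ONE_US)
    return int((last - first) / ROTATION) + 1


def nfdump_top_hosts_args(directory: str, start: datetime, end: datetime,
                          top_n: int = 10) -> list:
    """Build (but do not run) an nfdump argument list for a top-hosts query."""
    first = capture_file_name(start)
    last = capture_file_name(end - ONE_US)
    return ["nfdump",
            "-R", f"{directory}/{first}:{last}",  # read a range of files
            "-s", "ip/bytes",                      # per-IP statistic, ordered by bytes
            "-n", str(top_n)]                      # keep the top N entries


if __name__ == "__main__":
    day_start = datetime(2014, 8, 4)
    day_end = datetime(2014, 8, 5)
    print(count_capture_files(day_start, day_end))
    print(nfdump_top_hosts_args("/data/flows/eth0", day_start, day_end))
```

A one-day request therefore touches 288 files and a 30-day request 8640, which is why the requested interval weighs so heavily on response time.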

Shown below are the sample appliances we used in this experiment.

Appliance/GUID                   Model  Interface  Flow rate  No. of nfcapd files  Time Period  Size on disk
R8-FV-PERF-LOW-4-FPS-02          m30    eth0       Low        16885                60 days      327M
R8-FV-PERF-MODERATE-1500-FPS-02  R40    eth0       Moderate   576                  2 days       5.0G
R8-FV-PERF-HIGH-4000-FPS-03      R40    eth0       High       288                  1 day        5.6G

We also chose a few frequently requested operations that we wanted to optimize.

  • Traffic Summary
  • Top Applications
  • Top Hosts
  • Host Details

Shown below are the initial response times for these operations:

Traffic summary

Appliance/GUID                   Interface  1hr (sec)  4hr (sec)  8hr (sec)  1d (sec)  7d (sec)  30d (sec)
R8-FV-PERF-LOW-4-FPS-02          eth0       2.1        2.1        2.2        2.6       4.6       6.5
R8-FV-PERF-MODERATE-1500-FPS-02  eth0       4.2        9.2        14.4       47.3      55.2      -
R8-FV-PERF-HIGH-4000-FPS-03      eth0       7.0        38.2       40.6       50.5      -         -

Note that higher rate appliances do not have all 30 days of data, because the file size exceeds the limit of nfdump. That data needs to be fetched from BigQuery, where this optimization does not apply.

Top Applications

Appliance/GUID                   Interface  1hr (sec)  4hr (sec)  8hr (sec)  1d (sec)  7d (sec)  30d (sec)
R8-FV-PERF-LOW-4-FPS-02          eth0       2.5        2.0        2.2        2.4       5.0       7.4
R8-FV-PERF-MODERATE-1500-FPS-02  eth0       4.1        9.3        14.2       47.4      55.4      -
R8-FV-PERF-HIGH-4000-FPS-03      eth0       7.1        34.8       40.8       50.5      -         -

Top Hosts

Appliance/GUID                   Interface  1hr (sec)  4hr (sec)  8hr (sec)  1d (sec)  7d (sec)  30d (sec)
R8-FV-PERF-LOW-4-FPS-02          eth0       1.5        1.7        1.7        2.0       3.5       6.0
R8-FV-PERF-MODERATE-1500-FPS-02  eth0       4.1        15.5       15.5       52.3      56.2      -
R8-FV-PERF-HIGH-4000-FPS-03      eth0       7.4        42.9       42.9       54.8      -         -

Host Details

Appliance/GUID                   Interface  1hr (sec)  4hr (sec)  8hr (sec)  1d (sec)  7d (sec)  30d (sec)
R8-FV-PERF-LOW-4-FPS-02          eth0       3.7        2.5        2.7        2.7       5.1       6.7
R8-FV-PERF-MODERATE-1500-FPS-02  eth0       4.7        8.8        12.9       43.7      50.8      -
R8-FV-PERF-HIGH-4000-FPS-03      eth0       6.9        32.6       36.8       45.3      -         -

As is evident from these numbers:

  • when querying an appliance emitting more flow records per second, the response time is generally slower.
  • when querying over larger time intervals, response time is slower.

The response time corresponds to the number of files that need to be processed and of course the size of individual files.

To counter this issue, we introduced logic that splits individual FlowView requests into multiple execution threads and runs them in parallel. The total time period requested is divided into equal chunks, and the results from each chunk are merged. The first questions to answer were when to split and how many threads to split into. Every request showed a different performance gain or loss when split into threads, so we needed an algorithm based on that knowledge.
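The split-and-merge idea can be sketched as follows. This is a minimal illustration, not our backend code: `query_chunk` is a hypothetical stand-in that returns made-up per-application byte counts instead of running nfdump, and the function names are assumptions chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def split_interval(start: datetime, end: datetime, parts: int) -> list:
    """Divide [start, end) into `parts` equal, contiguous chunks."""
    step = (end - start) / parts
    bounds = [start + step * i for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def query_chunk(chunk) -> Counter:
    """Hypothetical per-chunk query; a real one would read the flow files.

    Returns fake byte counts proportional to the chunk length so that the
    merged result can be checked against a single unsplit query.
    """
    start, end = chunk
    minutes = int((end - start).total_seconds() // 60)
    return Counter({"http": 3 * minutes, "dns": minutes})


def parallel_query(start: datetime, end: datetime, threads: int,
                   query=query_chunk) -> Counter:
    """Run one query per chunk in parallel and sum the partial results."""
    chunks = split_interval(start, end, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(query, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return merged


if __name__ == "__main__":
    start, end = datetime(2014, 8, 4), datetime(2014, 8, 5)
    print(parallel_query(start, end, threads=4).most_common(2))
```

Summing partial results only works for additive measures such as bytes, packets, or flow counts. For a top-N view, each chunk has to return its full counts and the truncation to N has to happen after the merge, otherwise an entry that is moderately busy in every chunk could be dropped.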

Our approach to solving this was purely heuristic: get a framework in place that supports splitting and merging threads, start from a low number of threads and scale up, and measure performance gains across appliance types over different intervals.

The results from this experiment are shown below:

[Chart: parallelize flowview]

Note that the time required to query moderate rate appliances is sometimes close to, or even longer than, that of the corresponding high rate appliance. The charts show data from each type of appliance, but when calculating gain, only data from a particular appliance is compared.

[Chart: parallelize flowview]

[Chart: parallelize flowview]

[Chart: parallelize flowview]

As the results show, for each of the four queries, appliances with high flow rates initially had slower query times and show the largest improvement across queries ranging from short to long intervals, while appliances with lower flow rates only show improvement when the query interval is significantly longer. This led to a simple rule. The appliances with higher flow rates, which are generally rack appliances (r40 and r400), benefit most from splitting into 4 threads when a request covers more than an hour of data. The lower flow-rate micro appliances (m20, m22 and m30) only benefit from thread splitting when the requested time range is over 24 hours, and two threads are sufficient. With a larger number of threads, the overhead of splitting and joining outweighs the performance gain from parallel execution. As we increase the interval to about a month, even appliances with low flow rates start to benefit from having 4 threads or more.

To keep the design simple and conservative, we started with 2 threads for micro appliances for requests over 24 hours and 4 threads for rack appliances for requests over 1 hour. This change has noticeably improved the FlowView UI: some queries that were almost unusable before now show significantly better response times. It demonstrates how simple changes can go a long way in improving performance. The tactic is generic enough to be used in any system that suffers from performance issues when querying a large number of data chunks.
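The rule described above is small enough to write down directly. The sketch below is an illustration only: the function name and the fallback of one thread for unknown models are assumptions, while the model names and thresholds come from the text.

```python
# Appliance families as named in this post.
RACK_MODELS = {"r40", "r400"}
MICRO_MODELS = {"m20", "m22", "m30"}


def thread_count(model: str, interval_hours: float) -> int:
    """Pick how many threads to split a FlowView request into.

    Rack appliances: 4 threads for requests longer than 1 hour.
    Micro appliances: 2 threads for requests longer than 24 hours.
    Everything else (including unknown models, by assumption) stays
    single-threaded.
    """
    model = model.lower()
    if model in RACK_MODELS and interval_hours > 1:
        return 4
    if model in MICRO_MODELS and interval_hours > 24:
        return 2
    return 1


if __name__ == "__main__":
    for model, hours in [("R40", 4), ("R40", 1), ("m30", 24), ("m30", 7 * 24)]:
        print(model, hours, thread_count(model, hours))
```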

This release has been pushed to our servers, so FlowView users should already be seeing a better experience. Stay tuned for more improvements to come.