How to Easily Capture TCP Conversation Streams
June 4, 2014

Filed under: Networking Technology, Performance Monitoring

When verifying that FlowView is identifying flows with the correct application, we use tcpreplay to send traffic from packet capture files into an appliance capture interface. The packet captures can be homemade by us or from our customers. These packet captures are typically recorded without filtering, so naturally they will contain extraneous protocols and many mid-conversation sessions that happened to be in progress at capture start time.

When testing a new application that FlowView should identify, we first give it the best possible chance of succeeding. That means feeding it traffic from only the application we’re interested in. For TCP-based applications, we extract just the full TCP conversations, meaning the ones whose opening handshake was recorded.

We can start to identify the full TCP conversations by looking for the 3-way handshakes. Finding the SYN and SYN-ACK packets of each TCP conversation being initiated is pretty simple to do in Wireshark by applying a post-capture filter like tcp.flags.syn == 1 && tcp.flags.ack == 0. That filter will find the SYN packets – to also find SYN-ACK packets, a second filter is needed: tcp.flags.syn == 1 && tcp.flags.ack == 1. Combine the two filters with a logical OR:

(tcp.flags.syn == 1 && tcp.flags.ack == 0) || (tcp.flags.syn == 1 && tcp.flags.ack == 1)
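
Because the ACK flag is always either 0 or 1, the combined filter should match exactly the same packets as the shorter tcp.flags.syn == 1. The sketch below is one way to check that on your own capture. The variable names and the count_matches helper are hypothetical, and the helper uses -R as in the rest of this article (newer tshark releases expect -Y for this).

```shell
# The article's combined filter and the assumed-equivalent shorter form.
FULL_FILTER='(tcp.flags.syn == 1 && tcp.flags.ack == 0) || (tcp.flags.syn == 1 && tcp.flags.ack == 1)'
SHORT_FILTER='tcp.flags.syn == 1'

# Hypothetical helper: count the packets in capture file $1 that match filter $2.
count_matches() {
  tshark -r "$1" -R "$2" | wc -l
}

# Usage: both commands should print the same number.
#   count_matches my-capture.pcap "$FULL_FILTER"
#   count_matches my-capture.pcap "$SHORT_FILTER"
```

The longer form is kept in the rest of this post because it makes the SYN and SYN-ACK cases explicit.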

This is a good start: the filter identifies the TCP conversations that began after the capture was started. But how do we get all the packets of every one of those conversations? We can right-click each SYN or SYN-ACK packet and choose “Follow TCP Stream”, which shows all the packets for that TCP conversation, but that gets tedious after the third or fourth conversation.

It is time to turn to Wireshark’s lovable command-line cousin, tshark, for a scriptable solution to the problem. tshark accepts the same filters as Wireshark and lets us control what it outputs. What we need is a filter that displays every full TCP conversation in our packet capture file. Wireshark tracks each TCP conversation with a stream ID, and tshark can help us build the list of stream IDs we need. Let’s get familiar with using tshark for this purpose.

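tshark reads in packet capture files with the -r option and applies filters with the -R option. Newer tshark releases (1.10 and later) use -Y for single-pass display filters and expect -R to be paired with -2, so substitute -Y in the commands below if your version rejects -R: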

tshark -r <capture file> -R "<filter>"

Example:

$ tshark -r my-capture.pcap -R "(tcp.flags.syn == 1 && tcp.flags.ack == 0) || (tcp.flags.syn == 1 && tcp.flags.ack == 1)"
 81   4.934099 192.168.3.110 -> 64.4.45.62   TCP 50329 > msnp [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=2
 82   4.964977   64.4.45.62 -> 192.168.3.110 TCP msnp > 50329 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0
 95   5.040071 192.168.3.110 -> 207.46.124.85 TCP 50330 > msnp [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=2
 99   5.070912 207.46.124.85 -> 192.168.3.110 TCP msnp > 50330 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0

The default output is a one-line summary of each matching packet. To obtain the stream IDs instead, the “-T fields” option tells tshark to print only specific fields, and the -e option names the field we want, tcp.stream:

tshark -r <capture file> -R "<filter>" -T fields -e tcp.stream

Example:

$ tshark -r my-capture.pcap -R "(tcp.flags.syn == 1 && tcp.flags.ack == 0) || (tcp.flags.syn == 1 && tcp.flags.ack == 1)" -T fields -e tcp.stream
7
7
9
9
10
11
10
...

We are very close to what we want – a way to identify which full TCP conversations are recorded in the packet capture file. We have a list of stream IDs that tshark and Wireshark understand. All we need to do is process that list into a new filter that tshark can accept for a second execution. The new filter will have a format of:

tcp.stream == <stream ID 1> || tcp.stream == <stream ID 2> || ... tcp.stream == <stream ID n>
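
As a quick illustration of that format, the raw stream ID list can be de-duplicated and joined into a filter string with standard tools. This is a minimal sketch: build_stream_filter is a hypothetical helper name, and it assumes the IDs arrive one per line on standard input, as in the tshark output above.

```shell
# Hypothetical helper: read stream IDs (one per line) on stdin and print
# a single display filter of the form tcp.stream==A||tcp.stream==B||...
build_stream_filter() {
  # sort -nu sorts numerically and drops duplicates, since each stream
  # is listed once for its SYN and once for its SYN-ACK.
  sort -nu |
    awk 'NF { printf "%stcp.stream==%s", sep, $1; sep = "||" }
         END { if (sep != "") print "" }'
}

# Usage with the sample IDs shown earlier:
#   printf '7\n7\n9\n9\n10\n11\n10\n' | build_stream_filter
#   prints: tcp.stream==7||tcp.stream==9||tcp.stream==10||tcp.stream==11
```

Empty input produces no output at all, which makes it easy for a caller to detect a capture that contains no conversation starts.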

A simple bash script can help us to build up that new filter, apply it with a second run of tshark and write out the filtered capture file as a result:

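#!/bin/bash
# Note: on tshark 1.10 and later, replace -R with -Y in both tshark calls.

# Take the input capture file as a command-line argument to the script
if [ $# -ne 1 ]; then
  echo "Usage: $0 <capture file>" >&2
  exit 1
fi
IN_PCAP_FILE=$1
# Write the output next to the input, prefixing only the file name (not the path)
OUT_PCAP_FILE="$(dirname "$IN_PCAP_FILE")/FullConv-$(basename "$IN_PCAP_FILE")"

# Obtain the de-duplicated list of TCP stream IDs
TCP_STREAMS=$(tshark -r "$IN_PCAP_FILE" -R "(tcp.flags.syn == 1 && tcp.flags.ack == 0) || (tcp.flags.syn == 1 && tcp.flags.ack == 1)" -T fields -e tcp.stream | sort -nu)

# Stop early if no conversation starts were found; there is nothing to extract
if [ -z "$TCP_STREAMS" ]; then
  echo "No TCP handshakes found in $IN_PCAP_FILE" >&2
  exit 1
fi

# Generate a new tshark filter that ORs together one clause per stream ID
TSHARK_FILTER=""
for stream in $TCP_STREAMS; do
  if [ -z "$TSHARK_FILTER" ]; then
    TSHARK_FILTER="tcp.stream==${stream}"
  else
    TSHARK_FILTER="${TSHARK_FILTER}||tcp.stream==${stream}"
  fi
done

# Apply the stream ID filter and write out the filtered capture file
tshark -r "$IN_PCAP_FILE" -R "${TSHARK_FILTER}" -w "$OUT_PCAP_FILE"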
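
When there is a whole directory of captures to process, it helps to know which files still need filtering. The sketch below is one possible approach, not part of the script above: pending_captures is a hypothetical helper, and it assumes captures use the .pcap extension and that outputs follow the FullConv- naming used here.

```shell
# Hypothetical helper: print the names of .pcap files in directory $1
# (default: current directory) that have no FullConv- counterpart yet.
pending_captures() {
  local dir f name
  dir=${1:-.}
  for f in "$dir"/*.pcap; do
    [ -e "$f" ] || continue                           # glob matched nothing
    name=$(basename "$f")
    case "$name" in FullConv-*) continue ;; esac      # skip our own outputs
    [ -e "$dir/FullConv-$name" ] && continue          # already filtered
    printf '%s\n' "$name"
  done
}

# Usage, run from the directory holding the captures and the script:
#   pending_captures | while IFS= read -r f; do ./filter-full-conv.sh "$f"; done
```

The helper prints bare file names rather than paths, so the loop is meant to be run from inside the capture directory.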

Let’s give it a spin:

administrator@host03:~$ ls -l
total 30569
-rwxr--r-- 1 administrator administrator      741 2014-05-30 19:10 filter-full-conv.sh
-rw-r--r-- 1 administrator administrator 31301111 2014-05-30 19:10 my-capture.pcap
administrator@host03:~$
administrator@host03:~$ ./filter-full-conv.sh my-capture.pcap
administrator@host03:~$ ls -l
total 61023
-rwxr--r-- 1 administrator administrator      741 2014-05-30 19:10 filter-full-conv.sh
-rw-r--r-- 1 administrator administrator 31184789 2014-05-30 19:16 FullConv-my-capture.pcap
-rw-r--r-- 1 administrator administrator 31301111 2014-05-30 19:10 my-capture.pcap
administrator@host03:~$
administrator@host03:~$ tshark -r FullConv-my-capture.pcap
  1   0.000000 192.168.3.110 -> 64.4.45.62   TCP 50329 > msnp [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=2
  2   0.030878   64.4.45.62 -> 192.168.3.110 TCP msnp > 50329 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0
  3   0.031609 192.168.3.110 -> 64.4.45.62   TCP 50329 > msnp [ACK] Seq=1 Ack=1 Win=65700 Len=0
  4   0.031903 192.168.3.110 -> 64.4.45.62   MSNMS VER 1 MSNP21 MSNP20 MSNP19 MSNP18 MSNP17 CVR0
  5   0.070123   64.4.45.62 -> 192.168.3.110 MSNMS VER 1 MSNP21
...

Now we have a really simple way to sift out just the full TCP conversations from an unfiltered packet capture file.