Summary Statistics

Measuring aspects of network traffic is an extremely common task in Zeek. Zeek provides data structures that make this easy in simple cases, such as processing a size-limited trace file. In real-world deployments, though, difficulties arise from clustering (many processes sniffing traffic) and unbounded data sets (traffic never stops). The Summary Statistics framework (also referred to as SumStats) defines a mechanism for consuming unbounded data sets and making them measurable in practice on large clustered and non-clustered Zeek deployments.

Overview

The SumStats processing flow is broken into three pieces: observations, where some aspect of an event is observed and fed into the SumStats framework; reducers, where observations are collected and measured, typically by taking a summary statistic such as an average or variance; and sumstats, which give reducers an epoch (time interval) over which their measurements are performed, along with callbacks for monitoring thresholds or viewing the collected and measured data.

Terminology

Observation

A single point of data. An observation has three components: the arbitrarily named observation stream it belongs to, a key identifying what the observation is about (for example, a host), and the observed value itself.
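As a minimal sketch, the three components map onto the three arguments of SumStats::observe. The stream name "orig bytes" and the choice of event are assumptions made for illustration; they are not defined by Zeek.

```zeek
@load base/frameworks/sumstats

event connection_state_remove(c: connection)
    {
    # Stream: "orig bytes" (hypothetical name chosen for this sketch).
    # Key: the originating host, i.e. what the observation is about.
    # Observation: the number of payload bytes the originator sent.
    SumStats::observe("orig bytes",
                      SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($num=c$orig$size));
    }
```

On its own this script prints nothing. Observations only become measurable once a reducer is attached to the stream.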

Reducer

A reducer applies calculations to an observation stream to reduce the full unbounded set of observations to a smaller representation. Results are collected within each reducer per key, so care must be taken to keep the total number of tracked keys at a reasonable level.
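The following sketch shows a single reducer applying several calculations to one stream, with one result per key. The stream name "orig bytes" and the sumstat name are illustrative assumptions. The calculations used (SumStats::SUM, SumStats::AVERAGE, SumStats::MAX) are assumed to be available through the plugins loaded with the framework.

```zeek
@load base/frameworks/sumstats

event connection_state_remove(c: connection)
    {
    # Feed the hypothetical "orig bytes" stream, keyed by originator.
    SumStats::observe("orig bytes",
                      SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($num=c$orig$size));
    }

event zeek_init()
    {
    # One reducer can apply several calculations to the same stream.
    local r1 = SumStats::Reducer($stream="orig bytes",
                                 $apply=set(SumStats::SUM, SumStats::AVERAGE, SumStats::MAX));

    # A reducer has no effect until it is attached to a sumstat.
    SumStats::create([$name = "bytes per originator",
                      $epoch = 1min,
                      $reducers = set(r1),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                        {
                        # The result is indexed by stream name; one call per key.
                        local r = result["orig bytes"];
                        print fmt("%s sent %.0f bytes in total (average %.1f, max %.0f)",
                                  key$host, r$sum, r$average, r$max);
                        }]);
    }
```

Because each distinct originator becomes its own key, the memory used by this sketch grows with the number of hosts seen during an epoch.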

Sumstat

The final definition of a Sumstat, in which one or more reducers are collected over an interval, also known as an epoch. Thresholding can be applied here, along with a callback that is invoked when a threshold is crossed. Additionally, a callback can be provided to access each per-key result at the end of each epoch.
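A sketch of the per-epoch callbacks might look as follows. It assumes the $epoch_finished field of the SumStat record, which recent Zeek releases provide alongside $epoch_result; the sumstat name and the 30-second epoch are arbitrary choices for illustration.

```zeek
@load base/frameworks/sumstats

event connection_established(c: connection)
    {
    # Count established connections per responding host.
    SumStats::observe("conn established",
                      SumStats::Key($host=c$id$resp_h),
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    local r1 = SumStats::Reducer($stream="conn established",
                                 $apply=set(SumStats::SUM));

    SumStats::create([$name = "connections per responder",
                      $epoch = 30sec,
                      $reducers = set(r1),
                      # Called once per key at the end of each epoch.
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                        {
                        print fmt("%s accepted %.0f connections",
                                  key$host, result["conn established"]$sum);
                        },
                      # Assumed to be called once per epoch, after the
                      # per-key results have been delivered.
                      $epoch_finished(ts: time) =
                        {
                        print "epoch finished";
                        }]);
    }
```

A shorter epoch delivers results sooner but covers fewer observations per key.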

Examples

These examples may seem very simple to an experienced Zeek script developer, and they’re intended to look that way. Keep in mind that these scripts work on small single-process Zeek instances as well as on large many-worker clusters. Because SumStats has built-in cluster transparency, developers writing scripts that use it can ignore the complications of flow-based load balancing.

Printing the number of connections

SumStats provides a simple way to count the number of connections over a given time interval. Here is a script with inline documentation that does this with the SumStats framework:

sumstats-countconns.zeek
@load base/frameworks/sumstats

event connection_established(c: connection)
    {
    # Make an observation!
    # This observation is global so the key is empty.
    # Each established connection counts as one so the observation is always 1.
    SumStats::observe("conn established", 
                      SumStats::Key(), 
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    # Create the reducer.
    # The reducer attaches to the "conn established" observation stream
    # and uses the summing calculation on the observations.
    local r1 = SumStats::Reducer($stream="conn established", 
                                 $apply=set(SumStats::SUM));

    # Create the final sumstat.
    # We give it an arbitrary name and make it collect data every minute.
    # The reducer is then attached and a $epoch_result callback is given 
    # to finally do something with the data collected.
    SumStats::create([$name = "counting connections",
                      $epoch = 1min,
                      $reducers = set(r1),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                        {
                        # This is the body of the callback that is called when a single 
                        # result has been collected.  We are just printing the total number
                        # of connections that were seen.  The $sum field is provided as a 
                        # double type value so we need to use %f as the format specifier.
                        print fmt("Number of connections established: %.0f", result["conn established"]$sum);
                        }]);
    }

When run on a sample PCAP file from the Zeek test suite, the script produces the following output:

$ zeek -r workshop_2011_browse.trace sumstats-countconns.zeek
Number of connections established: 6

Toy scan detection

Taking the previous example further, we can implement a simple detection to demonstrate the thresholding functionality. This example is a toy that shows how thresholding works in SumStats; it is not meant to be a real-world functional example. That is left to the policy/misc/scan.zeek script that is included with Zeek.

sumstats-toy-scan.zeek
@load base/frameworks/sumstats

# We use the connection_attempt event to limit our observations to those
# which were attempted and not successful.
event connection_attempt(c: connection)
    {
    # Make an observation!
    # This observation is about the host attempting the connection.
    # Each attempted connection counts as one so the observation is always 1.
    SumStats::observe("conn attempted", 
                      SumStats::Key($host=c$id$orig_h), 
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    # Create the reducer.
    # The reducer attaches to the "conn attempted" observation stream
    # and uses the summing calculation on the observations. Keep
    # in mind that there will be one result per key (connection originator).
    local r1 = SumStats::Reducer($stream="conn attempted", 
                                 $apply=set(SumStats::SUM));

    # Create the final sumstat.
    # This is slightly different from the last example since we're providing
    # a callback to calculate a value to check against the threshold with 
    # $threshold_val.  The actual threshold itself is provided with $threshold.
    # Another callback is provided for when a key crosses the threshold.
    SumStats::create([$name = "finding scanners",
                      $epoch = 5min,
                      $reducers = set(r1),
                      # Provide a threshold.
                      $threshold = 5.0,
                      # Provide a callback to calculate a value from the result
                      # to check against the threshold field.
                      $threshold_val(key: SumStats::Key, result: SumStats::Result) =
                        {
                        return result["conn attempted"]$sum;
                        },
                      # Provide a callback for when a key crosses the threshold.
                      $threshold_crossed(key: SumStats::Key, result: SumStats::Result) =
                        {
                        print fmt("%s attempted %.0f or more connections", key$host, result["conn attempted"]$sum);
                        }]);
    }

Let’s see if there are any hosts that crossed the threshold in a PCAP file containing a host running nmap:

$ zeek -r nmap-vsn.trace sumstats-toy-scan.zeek
192.168.1.71 attempted 5 or more connections

It seems the host running nmap was detected!