A More Complex Script

For this tutorial, we will build a script that searches for certain patterns in HTTP entities. The patterns come from a list of “interesting patterns” that the user can provide. We will then augment the HTTP log with the number of matches. This particular script will be very slow, so it is not a production-level analysis, but it demonstrates many of the core principles of writing Zeek scripts and augmenting logs.

Recall that Zeek’s scripting language is event-based. As Zeek processes network traffic, it triggers events. When making a script, the author has to decide which events to react to. For this case, we care about HTTP “entities”: the body of HTTP requests and responses.

We can find the corresponding event by looking through the HTTP protocol documentation. In this case, we care about HTTP entities, so the http_entity_data event looks promising. This event provides a string containing the entity data. Its signature is:

event http_entity_data(c: connection, is_orig: bool, length: count, data: string)

With this information we can see what entities might look like. Users can use the print statement in order to print a given object. In this case, let’s print the data directly:

test.zeek
event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    print data;
    }

Note

This tutorial (like many other programming tutorials) uses printing in order to demonstrate functionality. However, it’s important to note that in Zeek print is almost entirely a tool for debugging. Production-grade scripts should use other tools such as logging, the notice framework, or Zeek’s reporter facility in order to convey information.

Save the above in a file test.zeek and invoke Zeek on the quickstart pcap:

# zeek -r traces/zeek-doc/quickstart.pcap test.zeek

This should print the result from trying to access zeek.org via HTTP.

Note

In order to keep the tutorial consistent, the examples use a capture file. But, in this case, you can test it with live traffic. To do so, first start running Zeek on a network interface:

# zeek -C -i eth0 test.zeek
listening on eth0

Then, open another terminal in the container from the host:

$ docker exec -it zeek-tutorial /bin/bash

This prompt will be used to generate traffic for Zeek. Now, curl in the new terminal session:

# curl example.com
<!doctype html>...

Both windows should print HTML content. You can exit the previous Zeek invocation with Ctrl+C.

Now, try the same thing, except change the Zeek invocation to include a redefinition for http_entity_data_delivery_size:

# zeek -r traces/zeek-doc/quickstart.pcap test.zeek http_entity_data_delivery_size=10

Zeek’s output will look different—namely, every 10 bytes, there should be a newline. The http_entity_data event gets called in batches for large entities, so Zeek doesn’t have to buffer up the entity in its entirety. Therefore, we must reassemble the complete data before matching patterns on the entity, just in case the pattern spans over multiple events. That will be the first step.

Reassembling HTTP Entities

Thankfully, Zeek provides a convenient way to store state between event calls within the same connection: The connection record!

Many protocols add a record to the connection record in order to store connection state, either for logging or simply for tracking something. For HTTP, this record is the HTTP::State record. The name State is a convention for protocols that must maintain state across multiple requests or responses. This record stores information that the analyzer uses, and we can also add our own fields to it for various purposes. We will use the redef keyword for this.

Above the http_entity_data event, let’s add a string to keep track of the entity data we’ve seen so far:

test.zeek
+redef record HTTP::State += {
+   entity: string &default="";
+};
+
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
    {

This statement will take the HTTP::State record mentioned before and add a field to it. When fields get added, they must have either &default (which specifies the default value) or &optional (which means you don’t need to initialize the field if you don’t want to).

Note

To see why these are needed, consider pre-existing code that creates an HTTP::State instance: it wasn’t written with awareness of the new field, so Zeek wouldn’t know what value to assign it. Either of the attributes provides a way out.

In this case, we have a simple default that we can use to “build up” the entity, so we use &default. The default entity value gets created whenever the HTTP::State record is created by the HTTP analyzer. The HTTP analyzer doesn’t need to know that we just appended a field to its record.
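To make the difference between the two attributes concrete, here is a minimal standalone sketch. The Example record and its field names are hypothetical and are not part of the tutorial script. It also uses the ?$ operator, which tests whether a field currently holds a value.

```zeek
# Hypothetical record for illustration only; not part of test.zeek.
type Example: record {
    name: string;              # mandatory: must be set on construction
    hits: count &default=0;    # falls back to 0 when not set
    note: string &optional;    # may be left unset entirely
};

event zeek_init()
    {
    # Only the mandatory field is provided here.
    local e = Example($name="demo");

    print e$hits;     # 0, taken from &default
    print e?$note;    # F, the optional field is unset

    e$note = "set later";
    print e?$note;    # T
    }
```

Running this with zeek and the script name alone (no pcap) should be enough, since zeek_init fires at startup.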

Then, we can modify the event handler to add the data to this for each event:

test.zeek
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
    {
-   print data;
+   c$http_state$entity += data;
+   print c$http_state$entity;
    }

Inside the event, we have two new statements. The first is where most of the magic happens. In Zeek scripts, the $ operator accesses a field of a record. Many other languages use . for this (like my_class.my_field). We then use the += operator to append the data string to the current contents of that field.

The other key here is that connection object. The connection record (that is, the first argument to the event) carries around state for the connection. Many protocols will use redef to add extra fields associated with that protocol—in this case, the HTTP analyzer adds both an HTTP::Info and HTTP::State field. You can see which fields an analyzer adds to the connection object in the “redefinitions” section in the script’s documentation—such as here for HTTP. You can see from that section that the HTTP analyzer adds a variable http_state with type HTTP::State to the connection record—thus, we can use it!

Before we do so, we need to ensure that the c$http_state field exists before we use it, since its presence is optional. Using an optional field that’s absent would be a runtime error:

expression error in ./test.zeek, line 7: field value missing (c$http_state)

Therefore, we should wrap anything that uses http_state with a field value existence check with ?$:

test.zeek
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
    {
-   c$http_state$entity += data;
-   print c$http_state$entity;
+   if ( c?$http_state )
+       {
+       c$http_state$entity += data;
+       print c$http_state$entity;
+       }
    }

For every update, this will print the accumulated entity up to that point. If the entity data is split over multiple event invocations, this will print increasingly larger entity chunks.

For testing, try deleting the connection record’s http_state before the if statement. Nothing should print, since you check the existence of that optional field before printing.

It’d be better to print the entity only once, when complete. For this, we can use the http_end_entity event. Remove the print in http_entity_data, and move it to the http_end_entity event:

test.zeek
    if ( c?$http_state )
        {
        c$http_state$entity += data;
+       }
+   }
+
+event http_end_entity(c: connection, is_orig: bool)
+   {
+   if ( c?$http_state && |c$http_state$entity| > 0 )
+       {
        print c$http_state$entity;
+       delete c$http_state$entity;
        }
    }

Now, it will only print once—at the end of an entity. We also delete the entity here, since it’s assumed entities can’t be nested, so we’re done with it. If you care for nested entities, this would not be sufficient.

There is one more caveat. As written, the script allows theoretically unbounded state growth, because entity has no upper bound. We should introduce an upper bound that users can configure. This is easy with redefinable options!

First, we declare the option at the top of the file in an export block:

test.zeek
+export {
+   option max_reassembled_entity_size: int = 10000 &redef;
+}
+
 redef record HTTP::State += {
    entity: string &default="";
 };

Note

Zeek has two main integer types: int (signed) and count (unsigned). The max_reassembled_entity_size option is declared as an int, even though it should never be negative. The reason is the subtraction that follows: when one operand is an int, Zeek treats the result as an int, so the difference can legitimately become negative. If both operands were counts, the result could instead “underflow” and become a very large number, which would be a bug.

For more information, see the count documentation.

Also note, options can be changed, but only through specific mechanisms. See the option declaration documentation for more information.
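The effect of mixing the two types can be sketched in isolation. The variable names below are made up for illustration, and the values are chosen so that the amount already stored exceeds the limit.

```zeek
event zeek_init()
    {
    # Hypothetical values: the stored amount is larger than the limit.
    local limit: int = 10;
    local used: count = 25;

    # Because one operand is an int, the result is an int as well,
    # so it can represent the negative difference.
    local remaining = limit - used;

    print remaining;         # -15
    print remaining <= 0;    # T
    }
```

This is the same shape of check the tutorial script relies on when it decides whether any room is left for more entity data.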

Then, we want to reach exactly that entity size, but never exceed it. You can use |...| around a string to get its size, like |c$http_state$entity| will get the length of the string in that field. You can do the same to get the size of most containers, like a vector. If we subtract it from max_reassembled_entity_size, that should be the remaining length:

test.zeek
    {
    if ( c?$http_state )
        {
+       local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
+       if ( remaining_available <= 0 )
+           return;
+
        c$http_state$entity += data;
        }
    }

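The local keyword means that remaining_available is only usable inside the current event handler, from its declaration onward. Unlike in many other languages, the scope of a Zeek local extends to the end of the enclosing function or event handler, not just to the end of the enclosing if block.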

Next, we will just decide how much of data to add depending on length:

test.zeek
        if ( remaining_available <= 0 )
            return;
 
-       c$http_state$entity += data;
+       if ( length <= remaining_available )
+           c$http_state$entity += data;
+       else
+           c$http_state$entity += data[:remaining_available];
        }
    }
 

Where the subscript operator (in data[:remaining_available]) allows extracting only the remaining available data if we can only hold part of it.
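As a small standalone sketch of the size and subscript operators used above (the string and vector contents are arbitrary examples):

```zeek
event zeek_init()
    {
    local s = "abcdefghij";

    print |s|;      # 10, the length of the string
    print s[:4];    # abcd, everything before index 4
    print s[4:];    # efghij, everything from index 4 onward

    # The size operator also works on containers such as vectors.
    local v = vector(1, 2, 3);
    print |v|;      # 3
    }
```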

The full script at this point is here for your convenience:

scripts/tutorial/01-http-entities.zeek
export {
    option max_reassembled_entity_size: int = 10000 &redef;
}

redef record HTTP::State += {
    entity: string &default="";
};

event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    if ( c?$http_state )
        {
        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
        if ( remaining_available <= 0 )
            return;

        if ( length <= remaining_available )
            c$http_state$entity += data;
        else
            c$http_state$entity += data[:remaining_available];
        }
    }

event http_end_entity(c: connection, is_orig: bool)
    {
    if ( c?$http_state && |c$http_state$entity| > 0 )
        {
        print c$http_state$entity;
        delete c$http_state$entity;
        }
    }

Searching for Patterns

Now, we have all of the data in a given entity stored in c$http_state$entity. We may want to examine that reassembled data for certain patterns. Then, just for completeness, we can log how many of those patterns matched entities in the HTTP connection.

Patterns in Zeek are built on regular expressions—they can be used to find matches within a larger string. They are enclosed by forward slashes (/). You can read more about them from the pattern documentation.

We want to find specific strings within the HTTP entity, so this is perfect. First, let’s see how you would search for a pattern in HTTP traffic. The http_end_entity handler currently prints the entity. Let’s change it to print whether some pattern matched:

test.zeek
    {
    if ( c?$http_state && |c$http_state$entity| > 0 )
        {
-       print c$http_state$entity;
+       local pat = /Will not match!/;
+       print fmt("Did the pattern '%s' match? %s", pat, pat in c$http_state$entity);
        delete c$http_state$entity;
        }
    }

This uses fmt in order to print readable results. See that BIF’s documentation for more information, but it allows similar format strings to printf in C.

Running this on the quickstart pcap will yield no matches:

# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
Did the pattern '/^?(Will not match!)$?/' match? F
Did the pattern '/^?(Will not match!)$?/' match? F

Note that in Zeek, true and false are represented by single-character T and F respectively.

We can change this script to actually match, say with a <body> tag:

# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
Did the pattern '/^?(<body>)$?/' match? T
Did the pattern '/^?(<body>)$?/' match? T
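The in operator used above looks for the pattern anywhere inside the string. Patterns can also be compared with ==, which requires the whole string to match. A minimal standalone sketch of the difference, using a made-up HTML string:

```zeek
event zeek_init()
    {
    local pat = /<body>/;
    local s = "<html><body>hi</body></html>";

    # Embedded matching: the pattern occurs somewhere in the string.
    print pat in s;              # T

    # Exact matching: the entire string must match the pattern.
    print pat == s;              # F
    print pat == "<body>";       # T
    }
```

For searching entity content, embedded matching with in is the behavior we want.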

At this point, we need:

  1. A list of user-provided patterns to match

  2. How many of those patterns matched the entity content

The first is easy, since it’s similar to the max_reassembled_entity_size from before. Just put a vector in the export block with &redef:

test.zeek
 export {
    option max_reassembled_entity_size: int = 10000 &redef;
+   const http_entity_patterns: vector of pattern = {
+       /Will not match!/,
+       /<body>/,
+       /301 Moved Permanently/,
+   } &redef;
 }
 
 redef record HTTP::State += {

Then part 2 can be done in a function that takes the content and returns the number of patterns that matched. Functions are defined similarly to events, just with the function keyword. Unlike events, they have to be called explicitly in your Zeek scripts. Here is the function signature:

function num_entity_pattern_matches(state: HTTP::State): count {

This function takes in a single HTTP::State as a parameter and returns a count—simple enough. One important point is that this function’s parameter is not the entity itself, but the HTTP state. This is because atomic values (like counts, addresses, and strings) are passed by value in Zeek. That means if the entity was passed in directly, it would get copied, which could be very expensive. Instead, we pass in the HTTP state. Types like records or tables are passed by reference, so no copy is necessary.
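The reference semantics for records can be observed directly. In this minimal sketch, the Box record and the bump function are hypothetical names used only for illustration: a change made to the record inside the function is visible to the caller, which shows that no copy was made.

```zeek
# Hypothetical record and function, for illustration only.
type Box: record {
    n: count &default=0;
};

function bump(b: Box)
    {
    # Modifies the caller's record, since records are passed by reference.
    b$n += 1;
    }

event zeek_init()
    {
    local b = Box();
    bump(b);
    print b$n;    # 1
    }
```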

Now, its implementation simply loops through the patterns in http_entity_patterns and counts the matches:

test.zeek
    entity: string &default="";
 };
 
+function num_entity_pattern_matches(state: HTTP::State): count
+   {
+   local num_matches = 0;
+   for ( _, pat in http_entity_patterns )
+       {
+       if ( pat in state$entity )
+           num_matches += 1;
+       }
+
+   return num_matches;
+   }
+
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
    {

There is one common trip-up in this function: for loops. In Zeek scripts, a for loop with a single loop variable iterates over the indexes of a container rather than its elements. That is the role of the _ in the loop: it stands in for the unused index, which for a vector counts up from 0 on each iteration. An optional second loop variable, named pat here, receives the actual elements.
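The two loop forms can be compared side by side in a short standalone sketch. The vector contents and variable names are arbitrary.

```zeek
event zeek_init()
    {
    local v = vector("a", "b", "c");

    # One loop variable: iterates over the indexes 0, 1, 2.
    for ( i in v )
        print i;

    # Two loop variables: index first, then the element at that index.
    for ( j, elem in v )
        print j, elem;

    # Using _ signals that the index is intentionally unused.
    for ( _, elem2 in v )
        print elem2;
    }
```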

Note

Zeek’s native types are quite powerful on their own. For example, this case could be done in a similar fashion with a table of patterns:

function num_entity_pattern_matches(state: HTTP::State): count
  {
  local entity_patterns: table[pattern] of count = {
          [/.*Will not match!.*/s] = 1,
          [/.*<body>.*/s] = 2,
          [/.*301 Moved Permanently.*/s] = 3,
  };

  return |entity_patterns[state$entity]|;
  }

This is a more efficient way to match a large number of known patterns. However, there are a few extra considerations that are outside of the scope here. For example, since we have newlines in the HTTP entities, the s flag is necessary at the end of each pattern (see the pattern documentation for more information).

See the table section for more interesting ways to use tables, including another “special lookup” for subnets and addresses.

Finally, call this new function when we finish collecting entity data:

test.zeek
    {
    if ( c?$http_state && |c$http_state$entity| > 0 )
        {
-       local pat = /Will not match!/;
-       print fmt("Did the pattern '%s' match? %s", pat, pat in c$http_state$entity);
+       print fmt("Found %d matches in the HTTP entity", num_entity_pattern_matches(
+           c$http_state));
        delete c$http_state$entity;
        }
    }

Now, because http_entity_patterns is marked with &redef, you can change its contents from other scripts or the command line.

# zeek -Cr traces/zeek-doc/quickstart.pcap test.zeek
Found 2 matches in the HTTP entity
Found 2 matches in the HTTP entity

In this case, we will add three patterns, two of which will match. The backslash characters (\) are used to escape angled brackets, since this is invoked from a Bash shell:

# zeek -Cr traces/zeek-doc/quickstart.pcap test.zeek "http_entity_patterns+={/\<html\>/, /Also does not match/, /\<title\>/}"
Found 4 matches in the HTTP entity
Found 4 matches in the HTTP entity
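The same kind of change can be made from another script instead of the command line. The following is a sketch under a few assumptions: the file name tuning.zeek is hypothetical, the file is assumed to sit in the same directory as test.zeek, and the added pattern and the new size limit are arbitrary example values.

```zeek
# tuning.zeek: hypothetical file, assumed to be next to test.zeek.
@load ./test

# Append one more pattern to the defaults declared in test.zeek.
redef http_entity_patterns += { /<title>/ };

# Raise the reassembly limit (example value).
redef max_reassembled_entity_size = 50000;
```

Zeek would then be invoked with tuning.zeek in place of test.zeek, since the @load line pulls in the original script.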

We have the core functionality for this script. The full script at this point is here for your convenience.

scripts/tutorial/02-http-patterns.zeek
export {
    option max_reassembled_entity_size: int = 10000 &redef;
    const http_entity_patterns: vector of pattern = {
        /Will not match!/,
        /<body>/,
        /301 Moved Permanently/,
    } &redef;
}

redef record HTTP::State += {
    entity: string &default="";
};

function num_entity_pattern_matches(state: HTTP::State): count
    {
    local num_matches = 0;
    for ( _, pat in http_entity_patterns )
        {
        if ( pat in state$entity )
            num_matches += 1;
        }

    return num_matches;
    }

event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    if ( c?$http_state )
        {
        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
        if ( remaining_available <= 0 )
            return;

        if ( length <= remaining_available )
            c$http_state$entity += data;
        else
            c$http_state$entity += data[:remaining_available];
        }
    }

event http_end_entity(c: connection, is_orig: bool)
    {
    if ( c?$http_state && |c$http_state$entity| > 0 )
        {
        print fmt("Found %d matches in the HTTP entity", num_entity_pattern_matches(
            c$http_state));
        delete c$http_state$entity;
        }
    }

Modifying the Logs

This script still prints information. It should, however, convey this information in Zeek’s “native” form—logs. For this, we will take two approaches: enriching the existing HTTP log, and using the notice framework to deliver notices.

Adding a Log Field

Adding a log field to Zeek is actually very simple. Since we want to add to the HTTP log, we will use the record that HTTP logs to—its Info record. First, we decide what we are logging. In this case, it’s just the number of pattern matches. So, we add that to the HTTP::Info record with redef, and mark the field with &log to make sure it gets logged:

test.zeek
    entity: string &default="";
 };
 
+redef record HTTP::Info += {
+   num_entity_matches: count &default=0 &log;
+};
+
 function num_entity_pattern_matches(state: HTTP::State): count
    {
    local num_matches = 0;

Next, in http_end_entity, set the field:

test.zeek
 
 event http_end_entity(c: connection, is_orig: bool)
    {
-   if ( c?$http_state && |c$http_state$entity| > 0 )
+   if ( c?$http_state && c?$http && |c$http_state$entity| > 0 )
        {
-       print fmt("Found %d matches in the HTTP entity", num_entity_pattern_matches(
-           c$http_state));
+       local num_entity_matches = num_entity_pattern_matches(c$http_state);
+       c$http$num_entity_matches += num_entity_matches;
        delete c$http_state$entity;
        }
    }

We’re done! Log enrichment itself is simple: add the field to the correct record. However, there are more considerations when making a robust script. For example, there can be multiple entities for a given HTTP request, so this script adds each entity’s match count to the previous value.

Now we can just run the script on the quickstart pcap and check the log:

# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
# cat http.log | zeek-cut -m num_entity_matches
num_entity_matches
2
2

We see the matches were logged!

Generating a Notice

Zeek also offers notices for various scenarios. These are outlined in the Notice framework section. Notices are useful when there is some scenario users may want to be notified about, like brute forcing of passwords. Notices can then be configured to take a specific action, like sending an email when one is generated. In this case, we will simply raise a notice when a certain threshold of matches is met.

To do this, first redef the Notice::Type with an extra value:

test.zeek
    num_entity_matches: count &default=0 &log;
 };
 
+redef enum Notice::Type += {
+   Entity_Pattern_Threshold,
+};
+
 function num_entity_pattern_matches(state: HTTP::State): count
    {
    local num_matches = 0;

Then, add another &redef option for this threshold, still in the export block:

test.zeek
        /<body>/,
        /301 Moved Permanently/,
    } &redef;
+   option pattern_threshold = 5 &redef;
 }
 
 redef record HTTP::State += {

Finally, we can test whether this threshold was reached in http_end_entity:

test.zeek
        {
        local num_entity_matches = num_entity_pattern_matches(c$http_state);
        c$http$num_entity_matches += num_entity_matches;
+
+       if ( num_entity_matches >= pattern_threshold )
+           NOTICE([$note=Entity_Pattern_Threshold, $msg=fmt(
+               "Found %d pattern matches in HTTP entity.",
+               num_entity_matches), $id=c$id, $identifier=cat(
+               num_entity_matches, c$id$orig_h, c$id$resp_h)]);
+
        delete c$http_state$entity;
        }
    }

This threshold only applies to a single entity, so if there are multiple entities, each may exceed it.

Notices will, by default, get logged in notice.log. You will notice that no notice log exists when executed as-is:

# zeek test.zeek -r traces/zeek-doc/quickstart.pcap
# cat notice.log
cat: notice.log: No such file or directory

Note

If notice.log exists, it may be from a previous invocation. Try removing it and executing zeek again.

But, we can lower the threshold:

# zeek test.zeek -r traces/zeek-doc/quickstart.pcap pattern_threshold=1
# cat notice.log | zeek-cut -m
ts    uid    id.orig_h    id.orig_p    id.resp_h    id.resp_p    fuid    file_mime_type    file_desc    proto    note    msg    sub    src    dst    p    n    peer_descr    actions    email_dest    suppress_for    remote_location.country_code    remote_location.region    remote_location.city    remote_location.latitude    remote_location.longitude
1747147647.735035    -    192.168.1.8    52917    192.0.78.212    80    -    -    -    tcp    Entity_Pattern_Threshold    Found 2 pattern matches in HTTP entity.    -    192.168.1.8    192.0.78.212    80    -    -    Notice::ACTION_LOG    (empty)    3600.000000    -    -    -    -    -
1747147654.341780    -    192.168.1.8    52918    192.0.78.150    80    -    -    -    tcp    Entity_Pattern_Threshold    Found 2 pattern matches in HTTP entity.    -    192.168.1.8    192.0.78.150    80    -    -    Notice::ACTION_LOG    (empty)    3600.000000    -    -    -    -    -

The notice framework is a powerful way to inform analysts of interesting events in various ways. For more information, read the Notice framework section.
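As one example of configuring an action, the notice framework provides a Notice::policy hook that can adjust a notice before it is processed. The sketch below is assumed to be appended to test.zeek, after the redef of Notice::Type, and asks for an email in addition to the default logging. Whether mail is actually delivered depends on the local setup, for example on Notice::mail_dest being configured, which this tutorial does not cover.

```zeek
# Assumed to live in test.zeek, after Entity_Pattern_Threshold is defined.
hook Notice::policy(n: Notice::Info)
    {
    # Only touch the notice type introduced by this script.
    if ( n$note == Entity_Pattern_Threshold )
        add n$actions[Notice::ACTION_EMAIL];
    }
```

With this in place, the actions column in notice.log should list the email action alongside Notice::ACTION_LOG for these notices.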

With that, the script is done. Here it is in its entirety:

scripts/tutorial/03-http-logging.zeek
export {
    # Upper bound on how much entity data is kept per entity.
    option max_reassembled_entity_size: int = 10000 &redef;
    # Patterns to search for in each reassembled entity.
    const http_entity_patterns: vector of pattern = {
        /Will not match!/,
        /<body>/,
        /301 Moved Permanently/,
    } &redef;
    # Number of matches in a single entity that triggers a notice.
    option pattern_threshold = 5 &redef;
}

redef record HTTP::State += {
    entity: string &default="";
};

redef record HTTP::Info += {
    num_entity_matches: count &default=0 &log;
};

redef enum Notice::Type += {
    Entity_Pattern_Threshold,
};

# Counts how many of the configured patterns occur in the stored entity.
function num_entity_pattern_matches(state: HTTP::State): count
    {
    local num_matches = 0;
    for ( _, pat in http_entity_patterns )
        {
        if ( pat in state$entity )
            num_matches += 1;
        }

    return num_matches;
    }

# Accumulates entity data, up to the configured maximum size.
event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    if ( c?$http_state )
        {
        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
        if ( remaining_available <= 0 )
            return;

        if ( length <= remaining_available )
            c$http_state$entity += data;
        else
            c$http_state$entity += data[:remaining_available];
        }
    }

# Matches patterns once the entity is complete, then resets the buffer.
event http_end_entity(c: connection, is_orig: bool)
    {
    if ( c?$http_state && c?$http && |c$http_state$entity| > 0 )
        {
        local num_entity_matches = num_entity_pattern_matches(c$http_state);
        c$http$num_entity_matches += num_entity_matches;

        if ( num_entity_matches >= pattern_threshold )
            NOTICE([$note=Entity_Pattern_Threshold, $msg=fmt(
                "Found %d pattern matches in HTTP entity.",
                num_entity_matches), $id=c$id, $identifier=cat(
                num_entity_matches, c$id$orig_h, c$id$resp_h)]);

        delete c$http_state$entity;
        }
    }

Conclusions

We just covered many of Zeek’s language features, as well as ways to expose a new analysis’ results to users. There’s a lot more to cover:

Explore the tutorial at try.zeek.org. This is an interactive tutorial that runs entirely in the web browser. It explains Zeek’s functionality with increasingly advanced scripts. That is a logical next step after this tutorial if some language features seem under-explained.

You can go through the script reference section. This has detailed explanations of all of Zeek’s operators, statements, attributes, and more. If you need a deep-dive, that is the reference to use.

While this script is not necessarily production-ready, it uses Zeek in many of the same ways you would for a real detection. In it, we’ve briefly touched several of Zeek’s commonly used frameworks, and you should explore them to understand Zeek’s broader capabilities.