A More Complex Script
For this tutorial, we will build a script that searches HTTP entities for certain patterns. The patterns come from a list of “interesting patterns” that the user can provide. We will then augment the HTTP log with the number of matches. This particular script is very slow, so it is not a production-level analysis, but it demonstrates many of the core principles of writing Zeek scripts and augmenting logs.
Recall that Zeek’s scripting language is event-based. As Zeek processes network traffic, it triggers events. When making a script, the author has to decide which events to react to. For this case, we care about HTTP “entities”: the body of HTTP requests and responses.
We can find the corresponding event by looking through the HTTP protocol
documentation. In this case, we care about HTTP entities, so the
http_entity_data event looks promising. This event
provides a string containing the entity data. Its signature
is:
event http_entity_data(c: connection, is_orig: bool, length: count, data: string)
With this information we can see what entities might look like. Users can use the
print statement in order to print a given object. In this case,
let’s print the data directly:
test.zeek
event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    print data;
    }
Note
This tutorial (like many other programming tutorials) uses printing to
demonstrate functionality. However, in Zeek, print is almost entirely a
debugging tool. Production-grade scripts should use other tools such as
logging, the notice framework, or Zeek’s reporter facility to convey
information.
Save the above in a file test.zeek and invoke Zeek on the quickstart pcap:
# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
This should print the result from trying to access zeek.org via
HTTP.
Note
To keep the tutorial consistent, the examples use a capture file. In this case, however, you can also test the script with live traffic. To do so, first start Zeek on a network interface:
# zeek -C -i eth0 test.zeek
listening on eth0
Then, open another terminal in the container from the host:
$ docker exec -it zeek-tutorial /bin/bash
This prompt will be used to generate traffic for Zeek. Now, run curl in
the new terminal session:
# curl example.com
<!doctype html>...
Both windows should print HTML content. You can exit the previous Zeek invocation with Ctrl+C.
Now, try the same thing, except change the Zeek invocation to include a
redefinition for http_entity_data_delivery_size:
# zeek -r traces/zeek-doc/quickstart.pcap test.zeek http_entity_data_delivery_size=10
Zeek’s output will look different: there should be a newline every 10
bytes. The http_entity_data event gets called in batches for large
entities, so Zeek doesn’t have to buffer up the entity in its entirety.
Therefore, we must reassemble the complete data before matching patterns
on the entity, in case a pattern spans multiple events. That will be the
first step.
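To see why reassembly matters, consider a pattern whose match straddles two chunks. The sketch below is purely illustrative: the chunk strings and the pattern are made up, and it relies on the in operator for patterns, which this tutorial introduces later.

```zeek
event zeek_init()
    {
    # Hypothetical entity data, split the way two http_entity_data
    # events might deliver it.
    local chunk1 = "<html><bo";
    local chunk2 = "dy>Hello</body></html>";
    local pat = /<body>/;

    # Neither chunk contains the full pattern on its own...
    print pat in chunk1;             # F
    print pat in chunk2;             # F

    # ...but the reassembled data does.
    print pat in (chunk1 + chunk2);  # T
    }
```

Matching each chunk separately would miss this occurrence, which is the case the reassembly step is meant to handle.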
Reassembling HTTP Entities
Thankfully, Zeek provides a convenient way to store state between event
calls within the same connection: The connection record!
Many protocols append a record to the connection record in order to
store connection state, either for logging or simply tracking something. For
HTTP, this record is the HTTP::State record. The name
State is convention for protocols which must maintain state for
multiple requests or responses. Not only does this store information
that the analyzer uses, we can also append our own fields to it for
various purposes. We will use the redef keyword for this.
Above the http_entity_data event, let’s add a string to keep track
of the entity data we’ve seen so far:
test.zeek
+redef record HTTP::State += {
+    entity: string &default="";
+};
+
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
     {
This statement will take the HTTP::State record mentioned before and
add a field to it. When fields get added, they must have either
&default (which specifies the default value) or &optional (which
means you don’t need to initialize the field if you don’t want to).
Note
To see why these are needed, consider pre-existing code that creates an
HTTP::State instance: it wasn’t written with awareness of the new field,
so Zeek wouldn’t know what value to assign it. Either of the attributes
provides a way out.
In this case, we have a simple default that we can use to “build up” the
entity, so we use &default. The default entity value gets
created whenever the HTTP::State record is created by the HTTP
analyzer. The HTTP analyzer doesn’t need to know that we just appended a
field to its record.
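The difference between the two attributes can be tried out on a small standalone record. This is only a sketch: the record type Example and its field names are hypothetical and unrelated to the HTTP analyzer.

```zeek
type Example: record {
    # Always has a value; starts out as the empty string.
    with_default: string &default="";
    # May be absent until someone assigns it.
    maybe: string &optional;
};

event zeek_init()
    {
    # Created without naming either field, much like existing code that
    # is unaware of fields added later.
    local e = Example();

    print |e$with_default|;  # 0: the default is the empty string
    print e?$maybe;          # F: the field has not been set

    e$maybe = "now set";
    print e?$maybe;          # T
    }
```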
Then, we can modify the event handler to add the data to this for each event:
test.zeek
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
     {
-    print data;
+    c$http_state$entity += data;
+    print c$http_state$entity;
     }
Inside the event, we have two new statements. The first is where most of
the magic happens. In Zeek scripts, the $ operator accesses a field of a
record. Many other languages use . for this (as in my_class.my_field). We
then use the += operator to append the data string to what’s already in
that field.
The other key here is the connection object. The connection record
(that is, the first argument to the event) carries around state for the
connection. Many protocols use redef to add extra fields associated with
that protocol. In this case, the HTTP analyzer adds both an HTTP::Info
and an HTTP::State field. You can see which fields an analyzer adds to
the connection object in the “redefinitions” section of the script’s
documentation, such as here for HTTP. That section shows that the HTTP
analyzer adds a field http_state of type HTTP::State to the connection
record, so we can use it!
Before using it, though, we need to ensure that the c$http_state field
exists, since its presence is optional. Using an optional field that’s
absent is a runtime error:
expression error in ./test.zeek, line 7: field value missing (c$http_state)
Therefore, we should wrap anything that uses http_state with a field
value existence check with ?$:
test.zeek
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
     {
-    c$http_state$entity += data;
-    print c$http_state$entity;
+    if ( c?$http_state )
+        {
+        c$http_state$entity += data;
+        print c$http_state$entity;
+        }
     }
For every update, this prints the entity accumulated up to that point. If the entity data is split over multiple event invocations, this prints increasingly larger portions of the entity.
For testing, try deleting the connection record’s http_state before
the if statement. Nothing should print, since you check for the
existence of that optional field before printing.
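A minimal sketch of that experiment, written as a self-contained file that would temporarily stand in for test.zeek. The delete line exists only to exercise the check and should be removed again afterwards.

```zeek
redef record HTTP::State += {
    entity: string &default="";
};

event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    # Temporary, for testing only: drop the optional field so that the
    # existence check below fails.
    delete c$http_state;

    if ( c?$http_state )
        {
        c$http_state$entity += data;
        print c$http_state$entity;
        }
    }
```

Without the ?$ check, the same delete would instead trigger the "field value missing" error shown earlier.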
It’d be better to print the entity only once, when complete. For this, we can use the
http_end_entity event. Remove the print in
http_entity_data, and move it to the http_end_entity event:
test.zeek
     if ( c?$http_state )
         {
         c$http_state$entity += data;
+        }
+    }
+
+event http_end_entity(c: connection, is_orig: bool)
+    {
+    if ( c?$http_state && |c$http_state$entity| > 0 )
+        {
         print c$http_state$entity;
+        delete c$http_state$entity;
         }
     }
Now, it will only print once, at the end of an entity. We also delete the entity here: entities are assumed not to be nested, so we’re done with it. If you need to handle nested entities, this would not be sufficient.
There is one more caveat. As written, the script allows theoretically
unbounded state growth, as entity has no upper bound. We should introduce
an upper bound that users can configure. This is easy with redefinable
options! First, we declare the option at the top of the file in an export
block:
test.zeek
+export {
+    option max_reassembled_entity_size: int = 10000 &redef;
+}
+
 redef record HTTP::State += {
     entity: string &default="";
 };
Note
Zeek has two main types for whole numbers: int (if the value can be
negative) and count (if it cannot). max_reassembled_entity_size should
never be negative, yet it is meant to be an int. The reason is that
arithmetic involving an int is signed, so Zeek understands that the
result of subtracting a count from it, as we do later, may be negative.
If both operands were counts, that result could instead “underflow” and
become a very large number, which would be a bug. Keep in mind that a
bare literal such as 10000 is a count in Zeek, so the option needs an
explicit int type for this to hold.
For more information, see the count documentation.
Also note, options can be changed, but only through specific
mechanisms. See the option declaration documentation
for more information.
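The effect of mixing int and count can be checked in isolation. A small sketch with made-up values (limit and used are hypothetical names, not part of the tutorial script):

```zeek
event zeek_init()
    {
    local limit: int = 10;    # hypothetical size limit
    local used: count = 15;   # hypothetical amount already stored

    # int - count is computed as an int, so the result can be negative.
    local remaining = limit - used;
    print remaining;          # -5
    print remaining <= 0;     # T
    }
```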
Then, we want to reach exactly that entity size, but never exceed it.
You can use |...| around a string to get its size:
|c$http_state$entity| yields the length of the string in that field. The
same works for most containers, such as a vector. Subtracting that
length from max_reassembled_entity_size gives the remaining space:
test.zeek
     {
     if ( c?$http_state )
         {
+        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
+        if ( remaining_available <= 0 )
+            return;
+
         c$http_state$entity += data;
         }
     }
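The local keyword just means that remaining_available will not be usable
outside of the event handler that declares it. Note that in Zeek a
local’s scope runs from its declaration to the end of the enclosing
function, hook, or event handler, even when it is declared inside a
nested block such as this if.
Next, we decide how much of data to add depending on length: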
test.zeek
         if ( remaining_available <= 0 )
             return;
-        c$http_state$entity += data;
+        if ( length <= remaining_available )
+            c$http_state$entity += data;
+        else
+            c$http_state$entity += data[:remaining_available];
         }
     }
Here the subscript operator (in data[:remaining_available]) extracts
only the first remaining_available bytes of data when we can hold just
part of it.
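The capping logic can also be tried outside of any HTTP traffic. The sketch below repackages it as a standalone function; bounded_append and its parameters are hypothetical names used only for illustration.

```zeek
# Hypothetical helper, not part of the tutorial script: append chunk to
# buf without letting the result grow beyond limit bytes.
function bounded_append(buf: string, chunk: string, limit: int): string
    {
    local remaining = limit - |buf|;

    if ( remaining <= 0 )
        return buf;

    if ( |chunk| <= remaining )
        return buf + chunk;

    # Keep only the first `remaining` bytes of the chunk.
    return buf + chunk[:remaining];
    }

event zeek_init()
    {
    print bounded_append("abc", "de", 5);     # abcde
    print bounded_append("abc", "defgh", 5);  # abcde
    print bounded_append("abcde", "fgh", 5);  # abcde
    }
```

The three calls cover a chunk that fits completely, one that is truncated, and one that is dropped because the buffer is already full.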
The full script at this point is here for your convenience:
scripts/tutorial/01-http-entities.zeek
export {
    option max_reassembled_entity_size: int = 10000 &redef;
}

redef record HTTP::State += {
    entity: string &default="";
};

event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    if ( c?$http_state )
        {
        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
        if ( remaining_available <= 0 )
            return;

        if ( length <= remaining_available )
            c$http_state$entity += data;
        else
            c$http_state$entity += data[:remaining_available];
        }
    }

event http_end_entity(c: connection, is_orig: bool)
    {
    if ( c?$http_state && |c$http_state$entity| > 0 )
        {
        print c$http_state$entity;
        delete c$http_state$entity;
        }
    }
Searching for Patterns
Now, we have all of the data in a given entity stored in
c$http_state$entity. We may want to examine that reassembled data
for certain patterns. Then, just for completeness, we can log how many
of those patterns matched entities in the HTTP connection.
Patterns in Zeek are built on regular expressions—they can be used to
find matches within a larger string. They are enclosed by forward
slashes (/). You can read more about them from the
pattern documentation.
We want to find specific strings within the HTTP entity, so this is
perfect. First, let’s see how you would search for a pattern in HTTP
traffic. In http_end_entity we currently print the entity; let’s change
that to print whether some pattern matched:
test.zeek
     {
     if ( c?$http_state && |c$http_state$entity| > 0 )
         {
-        print c$http_state$entity;
+        local pat = /Will not match!/;
+        print fmt("Did the pattern '%s' match? %s", pat, pat in c$http_state$entity);
         delete c$http_state$entity;
         }
     }
This uses fmt to print readable results. See that BIF’s documentation
for more information; it accepts format strings similar to those of
printf in C.
Running this on the quickstart pcap will yield no matches:
# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
Did the pattern '/^?(Will not match!)$?/' match? F
Did the pattern '/^?(Will not match!)$?/' match? F
Note that in Zeek, true and false are represented by single-character
T and F respectively.
We can change this script to actually match, say with a <body> tag:
# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
Did the pattern '/^?(<body>)$?/' match? T
Did the pattern '/^?(<body>)$?/' match? T
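The edit behind this output is not shown; presumably it is just a change of the pattern literal to /<body>/. The sketch below uses a made-up entity string to show that literal together with the two pattern operators: in looks for a match anywhere in the string, while == requires the whole string to match.

```zeek
event zeek_init()
    {
    local pat = /<body>/;
    # Hypothetical entity content for illustration.
    local entity = "<html><body>Hello</body></html>";

    print pat in entity;    # T: embedded match
    print pat == entity;    # F: the whole string is not just "<body>"
    print pat == "<body>";  # T: exact match
    }
```

For scanning entity content, in is the operator to use, since the interesting text is normally surrounded by other data.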
At this point, we need:
A list of user-provided patterns to match
How many of those patterns matched the entity content
The first is easy; it’s similar to the max_reassembled_entity_size
option from before. Just put a vector in the export block and mark it
with &redef:
test.zeek
 export {
     option max_reassembled_entity_size: int = 10000 &redef;
+    const http_entity_patterns: vector of pattern = {
+        /Will not match!/,
+        /<body>/,
+        /301 Moved Permanently/,
+    } &redef;
 }
 redef record HTTP::State += {
The second can be done in a function that takes the HTTP state holding
the content and returns the number of patterns that matched. Functions
are defined similarly to events, just with the function keyword, and
they have to be called explicitly in your Zeek scripts. Here is the
function signature:
function num_entity_pattern_matches(state: HTTP::State): count {
This function takes in a single HTTP::State as a parameter
and returns a count—simple enough. One important point is that this
function’s parameter is not the entity itself, but the HTTP state. This
is because atomic values (like counts, addresses, and strings) are
passed by value in Zeek. That means if the entity was passed in
directly, it would get copied, which could be very expensive. Instead,
we pass in the HTTP state. Types like records or tables are passed by
reference, so no copy is necessary.
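The reference semantics of records are easy to observe directly. A minimal sketch, using a hypothetical Counter record and bump function that are not part of the tutorial script:

```zeek
type Counter: record {
    n: count &default=0;
};

# Records are passed by reference, so the caller sees this change.
function bump(c: Counter)
    {
    c$n += 1;
    }

event zeek_init()
    {
    local counter = Counter();
    bump(counter);
    print counter$n;  # 1
    }
```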
Now, its implementation simply loops through the patterns in
http_entity_patterns and counts the matches:
test.zeek
     entity: string &default="";
 };
+function num_entity_pattern_matches(state: HTTP::State): count
+    {
+    local num_matches = 0;
+    for ( _, pat in http_entity_patterns )
+        {
+        if ( pat in state$entity )
+            num_matches += 1;
+        }
+
+    return num_matches;
+    }
+
 event http_entity_data(c: connection, is_orig: bool, length: count,
     data: string)
     {
There is one common trip-up in this function: for loops. In Zeek
scripts, a for loop with a single loop variable iterates over the
indexes of a container rather than its elements. That’s what the _ in
the for loop is: an unused index, which for a vector simply counts up
from 0 each iteration. You can add an optional second loop variable,
named pat in this function, which holds the actual element.
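The different loop forms can be compared side by side on a small vector. This is a sketch with made-up data; the variable names are arbitrary.

```zeek
event zeek_init()
    {
    local words = vector("alpha", "beta", "gamma");

    # One loop variable: iterates over the indexes 0, 1, 2.
    for ( i in words )
        print i;

    # Two loop variables: index and element.
    for ( j, word in words )
        print j, word;

    # Use _ when the index is not needed.
    for ( _, w in words )
        print w;
    }
```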
Note
Zeek’s native types are quite powerful on their own. For example, this case could be done in a similar fashion with a table of patterns:
function num_entity_pattern_matches(state: HTTP::State): count
    {
    local entity_patterns: table[pattern] of count = {
        [/.*Will not match!.*/s] = 1,
        [/.*<body>.*/s] = 2,
        [/.*301 Moved Permanently.*/s] = 3,
    };
    return |entity_patterns[state$entity]|;
    }
This is a more efficient way to match a large number of known
patterns. However, there are a few extra considerations that are
outside of the scope here. For example, the lookup matches each pattern
against the whole string, which is why the patterns are wrapped in .*,
and since HTTP entities contain newlines, the s flag is necessary at the
end of each pattern (see the pattern documentation for more
information).
See the table section for more interesting ways to use
tables, including another “special lookup” for subnets and addresses.
Finally, call this new function when we finish collecting entity data:
test.zeek
     {
     if ( c?$http_state && |c$http_state$entity| > 0 )
         {
-        local pat = /Will not match!/;
-        print fmt("Did the pattern '%s' match? %s", pat, pat in c$http_state$entity);
+        print fmt("Found %d matches in the HTTP entity", num_entity_pattern_matches(
+            c$http_state));
         delete c$http_state$entity;
         }
     }
Now, because http_entity_patterns is marked with &redef, you can
change its contents from other scripts or the command line.
# zeek -Cr traces/zeek-doc/quickstart.pcap test.zeek
Found 2 matches in the HTTP entity
Found 2 matches in the HTTP entity
Next, we add three patterns from the command line, two of which will
match. The backslash characters (\) are used to escape angled brackets,
since this is invoked from a Bash shell:
# zeek -Cr traces/zeek-doc/quickstart.pcap test.zeek "http_entity_patterns+={/\<html\>/, /Also does not match/, /\<title\>/}"
Found 4 matches in the HTTP entity
Found 4 matches in the HTTP entity
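The same extension can also live in a script of its own instead of on the command line. A minimal sketch, assuming a hypothetical file local-patterns.zeek placed next to test.zeek:

```zeek
# local-patterns.zeek (hypothetical file name)
@load ./test.zeek

# Append two more patterns to the defaults defined in test.zeek.
redef http_entity_patterns += {
    /<html>/,
    /<title>/,
};
```

Passing local-patterns.zeek to zeek in place of test.zeek then loads the original script along with the two extra patterns, and no shell escaping is involved.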
We have the core functionality for this script. The full script at this point is here for your convenience.
scripts/tutorial/02-http-patterns.zeek
export {
    option max_reassembled_entity_size: int = 10000 &redef;
    const http_entity_patterns: vector of pattern = {
        /Will not match!/,
        /<body>/,
        /301 Moved Permanently/,
    } &redef;
}

redef record HTTP::State += {
    entity: string &default="";
};

function num_entity_pattern_matches(state: HTTP::State): count
    {
    local num_matches = 0;
    for ( _, pat in http_entity_patterns )
        {
        if ( pat in state$entity )
            num_matches += 1;
        }

    return num_matches;
    }

event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    if ( c?$http_state )
        {
        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
        if ( remaining_available <= 0 )
            return;

        if ( length <= remaining_available )
            c$http_state$entity += data;
        else
            c$http_state$entity += data[:remaining_available];
        }
    }

event http_end_entity(c: connection, is_orig: bool)
    {
    if ( c?$http_state && |c$http_state$entity| > 0 )
        {
        print fmt("Found %d matches in the HTTP entity", num_entity_pattern_matches(
            c$http_state));
        delete c$http_state$entity;
        }
    }
Modifying the Logs
This script still prints information. It should, however, convey this information in Zeek’s “native” form—logs. For this, we will take two approaches: enriching the existing HTTP log, and using the notice framework to deliver notices.
Adding a Log Field
Adding a log field to Zeek is actually very simple. Since we want to add
to the HTTP log, we will use the record that HTTP logs to—its Info
record. First, we decide what we are logging. In this case, it’s just
the number of pattern matches. So, we add that to the
HTTP::Info record with redef, and mark the field with
&log to make sure it gets logged:
test.zeek
     entity: string &default="";
 };
+redef record HTTP::Info += {
+    num_entity_matches: count &default=0 &log;
+};
+
 function num_entity_pattern_matches(state: HTTP::State): count
     {
     local num_matches = 0;
Next, in http_end_entity, set the field:
test.zeek
 event http_end_entity(c: connection, is_orig: bool)
     {
-    if ( c?$http_state && |c$http_state$entity| > 0 )
+    if ( c?$http_state && c?$http && |c$http_state$entity| > 0 )
         {
-        print fmt("Found %d matches in the HTTP entity", num_entity_pattern_matches(
-            c$http_state));
+        local num_entity_matches = num_entity_pattern_matches(c$http_state);
+        c$http$num_entity_matches += num_entity_matches;
         delete c$http_state$entity;
         }
     }
We’re done! Log enrichment itself is simple: add the field to the correct record. However, there are more considerations when making a robust script. For example, there can be multiple entities for a given HTTP request, so this script adds each entity’s match count to the previous value rather than overwriting it.
Now we can just run the script on the quickstart pcap and check the log:
# zeek -r traces/zeek-doc/quickstart.pcap test.zeek
# cat http.log | zeek-cut -m num_entity_matches
num_entity_matches
2
2
We see the matches were logged!
Generating a Notice
Zeek also offers notices for various scenarios. These are outlined in the Notice framework section. Notices are useful when there is some scenario users may want to be notified about, like brute forcing of passwords. A notice can then be configured to trigger a specific action, such as sending an email when it is generated. In this case, we will simply raise a notice when a certain threshold of matches is met.
To do this, first redef the Notice::Type with an extra value:
test.zeek
     num_entity_matches: count &default=0 &log;
 };
+redef enum Notice::Type += {
+    Entity_Pattern_Threshold,
+};
+
 function num_entity_pattern_matches(state: HTTP::State): count
     {
     local num_matches = 0;
Then, add another &redef option for this threshold, still in the
export block:
test.zeek
         /<body>/,
         /301 Moved Permanently/,
     } &redef;
+    option pattern_threshold = 5 &redef;
 }
 redef record HTTP::State += {
Finally, we can test whether this threshold was reached in
http_end_entity:
test.zeek
         {
         local num_entity_matches = num_entity_pattern_matches(c$http_state);
         c$http$num_entity_matches += num_entity_matches;
+
+        if ( num_entity_matches >= pattern_threshold )
+            NOTICE([$note=Entity_Pattern_Threshold, $msg=fmt(
+                "Found %d pattern matches in HTTP entity.",
+                num_entity_matches), $id=c$id, $identifier=cat(
+                num_entity_matches, c$id$orig_h, c$id$resp_h)]);
+
         delete c$http_state$entity;
         }
     }
This threshold only applies to a single entity, so if there are multiple entities, each may exceed it.
Notices will, by default, get logged in notice.log. You will notice
that no notice log exists when executed as-is:
# zeek test.zeek -r traces/zeek-doc/quickstart.pcap
# cat notice.log
cat: notice.log: No such file or directory
Note
If notice.log exists, it may be from a previous invocation. Try
removing it and executing zeek again.
But, we can lower the threshold:
# zeek test.zeek -r traces/zeek-doc/quickstart.pcap pattern_threshold=1
# cat notice.log | zeek-cut -m
ts uid id.orig_h id.orig_p id.resp_h id.resp_p fuid file_mime_type file_desc proto note msg sub src dst p n peer_descr actions email_dest suppress_for remote_location.country_code remote_location.region remote_location.city remote_location.latitude remote_location.longitude
1747147647.735035 - 192.168.1.8 52917 192.0.78.212 80 - - - tcp Entity_Pattern_Threshold Found 2 pattern matches in HTTP entity. - 192.168.1.8 192.0.78.212 80 - - Notice::ACTION_LOG (empty) 3600.000000 - - - - -
1747147654.341780 - 192.168.1.8 52918 192.0.78.150 80 - - - tcp Entity_Pattern_Threshold Found 2 pattern matches in HTTP entity. - 192.168.1.8 192.0.78.150 80 - - Notice::ACTION_LOG (empty) 3600.000000 - - - - -
The notice framework is a powerful way to inform analysts of interesting events in various ways. For more information, read the Notice framework section.
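As an illustration of configuring an action, the sketch below uses the notice framework's Notice::policy hook to request an email for this notice type. It is an assumption-laden example rather than part of the tutorial script: it presumes the hook is added to the same file (so that Entity_Pattern_Threshold is defined) and that mail delivery, including Notice::mail_dest, has been set up separately.

```zeek
hook Notice::policy(n: Notice::Info)
    {
    # Request an email in addition to the default logging, but only for
    # the notice type defined in this tutorial.
    if ( n$note == Entity_Pattern_Threshold )
        add n$actions[Notice::ACTION_EMAIL];
    }
```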
With that, the script is done. Here it is in its entirety:
scripts/tutorial/03-http-logging.zeek
export {
    option max_reassembled_entity_size: int = 10000 &redef;
    const http_entity_patterns: vector of pattern = {
        /Will not match!/,
        /<body>/,
        /301 Moved Permanently/,
    } &redef;
    option pattern_threshold = 5 &redef;
}

redef record HTTP::State += {
    entity: string &default="";
};

redef record HTTP::Info += {
    num_entity_matches: count &default=0 &log;
};

redef enum Notice::Type += {
    Entity_Pattern_Threshold,
};

function num_entity_pattern_matches(state: HTTP::State): count
    {
    local num_matches = 0;
    for ( _, pat in http_entity_patterns )
        {
        if ( pat in state$entity )
            num_matches += 1;
        }

    return num_matches;
    }

event http_entity_data(c: connection, is_orig: bool, length: count,
    data: string)
    {
    if ( c?$http_state )
        {
        local remaining_available = max_reassembled_entity_size - |c$http_state$entity|;
        if ( remaining_available <= 0 )
            return;

        if ( length <= remaining_available )
            c$http_state$entity += data;
        else
            c$http_state$entity += data[:remaining_available];
        }
    }

event http_end_entity(c: connection, is_orig: bool)
    {
    if ( c?$http_state && c?$http && |c$http_state$entity| > 0 )
        {
        local num_entity_matches = num_entity_pattern_matches(c$http_state);
        c$http$num_entity_matches += num_entity_matches;

        if ( num_entity_matches >= pattern_threshold )
            NOTICE([$note=Entity_Pattern_Threshold, $msg=fmt(
                "Found %d pattern matches in HTTP entity.",
                num_entity_matches), $id=c$id, $identifier=cat(
                num_entity_matches, c$id$orig_h, c$id$resp_h)]);

        delete c$http_state$entity;
        }
    }
Conclusions
We just covered many of Zeek’s language features, as well as ways to expose a new analysis’ results to users. There’s a lot more to cover:
Explore the tutorial at try.zeek.org. This is an interactive tutorial that runs entirely in the web browser. It explains Zeek’s functionality with increasingly advanced scripts, and it is a logical next step after this tutorial if some language features seem under-explained.
You can go through the script reference section. This has detailed explanations of all of Zeek’s operators, statements, attributes, and more. If you need a deep dive, that is the reference to use.
While this script is not necessarily production-ready, it uses Zeek in many of the same ways you would for a real detection. In it, we’ve briefly touched on several of Zeek’s commonly used frameworks, and you should explore them to understand Zeek’s broader capabilities.