The Basics

Important

This section is also available in video form on YouTube.

Why Script?

We already know Zeek's main output: logs. But Zeek also has a whole architectural layer dedicated to the logic that creates those logs (and more!): its scripting language.

We have already caught glimpses of this language. The Providing Script Values section covered how to adjust variables in Zeek scripts from the command line, and while covering ZeekControl we saw that the local.zeek file customizes a Zeek cluster’s configuration.

But scripting goes much further: it forms the heart of Zeek’s entire analysis engine. In Zeek scripts, you react to events that Zeek generates as it processes sniffed packets, and this event handling drives all of Zeek’s user-visible behavior. So let’s learn more about events next.

Zeek Events

Zeek’s protocol analyzers attempt to make sense of traffic as they parse network packets. As protocols unfold on the wire, Zeek’s analyzers generate events along the way, sending them into the scripting engine for processing. Examples of such events include observing a new connection (new_connection), an HTTP request (http_request), or a completed TLS handshake (ssl_established). Most, but not all, events relate to network traffic. Two common generic events are zeek_init, which Zeek generates at startup, and zeek_done, which Zeek generates when it is about to shut down.

Zeek ships with hundreds of different types of events, each suitably named and carrying a list of typed arguments to convey context. When Zeek’s core sends an event into the script layer, scripts can handle it in many places via independent event handlers. For example, at startup Zeek only creates a single zeek_init event, but dozens of scripts handle it independently.

basics/event_multiple_handlers.zeek
event zeek_init()
    {
    print "Hello, world!";
    }

event zeek_init()
    {
    print "Hello, everyone!";
    }

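These handlers just happen to be in the same file; handlers for the same event can live in any loaded script. To control the order in which they run, use the &priority attribute (handlers without it get priority 0):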

basics/event_multiple_handlers_priority.zeek
# Define priority with '&priority' before the opening brace
event zeek_init() &priority=-5
    {
    print "This handler uses state created by other events - it should go late!";
    }

# Higher priority runs first
event zeek_init() &priority=300
    {
    print "This handler creates state - it should go early!";
    }

You can trigger your own events from a script with the event statement:

basics/event_statement.zeek
# This is a brand new event type
event my_custom_event(a_num: count) {
    print fmt("My custom event got %d!", a_num);
}

event zeek_init() {
    # 'event' can be used to immediately queue the event handler invocation.
    # You can even pass in values!
    event my_custom_event(5);
    # The event is now queued, so it will run eventually, but this print
    # will happen first. We are still in this event!
    print "This happens first!";

    # We cannot return any values from events, so this is invalid:
    # local x = event my_custom_event(10);
}

Scripts trigger events with this event statement (or, to trigger one after a delay, with the schedule statement). Zeek does not execute the handlers immediately; it enqueues the event for subsequent processing. Events are not functions that run now, but interesting things that Zeek will handle later.
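To see the queueing in action, here is a small sketch that records when each handler runs. The order vector and the second_step event are made up for illustration. Because second_step is only queued, its entry should land after the one that the rest of zeek_init adds:

```zeek
# Illustrative global that records the order in which handlers run
global order: vector of string = vector();

event second_step()
    {
    order += "second_step";
    }

event zeek_init()
    {
    # Only queues second_step; its handler runs after this one returns
    event second_step();
    order += "zeek_init";
    }

event zeek_done()
    {
    # Expected order: zeek_init first, then second_step
    print order;
    }
```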

It’s important to remember that Zeek’s events generally don’t judge: they’re policy-neutral, simply reporting on observed activity. It’s up to scripts to build up state from processed events, infer meaning, and eventually trigger output (perhaps, but not necessarily, a detection) that informs analysts.

We will demonstrate this with a high-level example: checking whether files in the network traffic match known malware in Team Cymru’s Malware Hash Registry. If you load the full script, Zeek produces a notice.log entry for each match, like this:

# cat notice.log | zeek-cut -m
ts      uid     id.orig_h       id.orig_p       id.resp_h       id.resp_p       fuid    file_mime_type    file_desc       proto   note    msg     sub     src     dst     p       n       peer_descr       actions email_dest      suppress_for    remote_location.country_code    remote_location.region    remote_location.city    remote_location.latitude        remote_location.longitude
1362692527.080972       CLDH8f3Huq3yGIqjZ6      141.142.228.5   59856   192.150.187.43  <omitted>      text/plain      <omitted>     tcp       TeamCymruMalwareHashRegistry::Match     Malware Hash Registry Detection rate: 95%  Last seen: 2017-01-18 20:34:43 https://www.virustotal.com/gui/search/<omitted>    141.142.228.5   192.150.187.43  80      -       -       Notice::ACTION_LOG        (empty) 3600.000000     -       -       -       -       -

Zeek determined it was malware by looking up the hash in a known registry—via scripting!

basics/mhr-excerpt.zeek
# This is just an excerpt from Zeek's policy script, detect-MHR.
# It has some slight modifications.

# The file_hash event is raised each time Zeek finishes computing a hash
# of a file's contents.
event file_hash(f: fa_file, kind: string, hash: string)
    {
    # Ensure this is a SHA1 hash and the file's MIME type is one we care
    # about. mime_type is optional, so check that it exists with ?$ first.
    # (match_file_types is a configuration option defined elsewhere)
    if ( kind == "sha1" && f?$info && f$info?$mime_type &&
         match_file_types in f$info$mime_type )
        # If it matches, we enter the lookup function.
        do_mhr_lookup(hash, Notice::create_file_info(f));
    }

Whenever Zeek finishes computing a hash for a file transferred over a protocol it analyzes, it raises the file_hash event. The body of our event handler does two things:

  1. It checks if we care about this specific file.

  2. It calls a function (do_mhr_lookup) to check the registry.

This leaves out the core of the script, but we’ve learned about one of Zeek’s most important concepts: events. To flesh the script out further, we next need to learn more about Zeek’s available data types.

Data Types

Network Types

Zeek monitors network traffic, so its scripting language includes dedicated types for it. Network types are primitive types in Zeek, so you can treat addresses, ports, and subnets as native data:

basics/types_network.zeek
event zeek_init()
    {
    # Set up some variables
    local dns_server: addr = 8.8.8.8;
    local internal_net: subnet = 10.0.0.0/8;
    local web_traffic: port = 80/tcp;

    # Check if dns_server is part of the internal subnet
    if ( dns_server in internal_net )
        print "DNS server is internal";
    else
        print "DNS server is external";

    # Ports carry their transport protocol, so 443/tcp and 443/udp are
    # different values. Comparing against a well-known port is a quick way
    # to guess the protocol.
    if ( web_traffic == 443/tcp )
        print "This is HTTPS";
    else
        print "This is another protocol";
    }

These are some of Zeek’s most powerful types. They allow script writers to easily use common networking language in order to write network detections.

For more information on each, see documentation for addr, subnet, and port.
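A few built-in functions round out these types. The following sketch, using arbitrary example addresses, derives a subnet from an address with mask_addr, pulls a port apart with port_to_count and get_port_transport_proto, and shows IPv6 literals, which Zeek writes inside square brackets:

```zeek
event zeek_init()
    {
    local host: addr = 192.168.1.77;

    # mask_addr keeps the top N bits of an address and returns a subnet
    local host_net: subnet = mask_addr(host, 24);
    print host_net;                          # 192.168.1.0/24
    print host in host_net;                  # T

    # A port value carries both a number and a transport protocol
    print port_to_count(443/tcp);            # 443
    print get_port_transport_proto(53/udp);  # udp

    # IPv6 addresses and subnets are written inside square brackets
    local v6_host: addr = [2001:db8::1];
    print is_v6_addr(v6_host);               # T
    print v6_host in [2001:db8::]/32;        # T
    }
```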

Time Types

When writing Zeek scripts, it’s also important to know when something occurred. Zeek provides time values as native types:

basics/types_time.zeek
event zeek_init()
    {
    # Set up some variables. This uses current_time() since time constants
    # don't exist in Zeek.
    local time_spotted: time = current_time();
    # You can add an interval to a time to get another time
    local time_spotted2 = time_spotted + 1sec;
    # You can also subtract two times for an interval
    local interval_between = time_spotted2 - time_spotted;
    # You can add two intervals together
    interval_between += current_time() - time_spotted;

    print fmt("Time between events: %s", interval_between);

    # Intervals can be used for concepts such as timeouts or for analyzing
    # bursts of traffic.
    local timeout_interval = 1sec;
    if ( interval_between > timeout_interval )
        print "Interval was timed out";
    else
        print "Interval was within the timeout";
    }

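The current_time function returns the “wall clock” time at the moment it is called; network_time, used in later examples, instead returns the timestamp of the most recently processed packet. Time types are useful for much more, though. For example, you can cause stored state to expire after a certain time interval, or schedule events to execute some time in the future with the schedule statement.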
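As a sketch of both ideas, the script below keeps a set whose entries expire ten minutes after insertion (&create_expire) and uses schedule to queue a follow-up event 30 seconds out. The names recently_seen, recheck, and is_stale are hypothetical. Zeek's timers are driven by network time, so when Zeek reads a trace, scheduled events and expirations advance with packet timestamps rather than the wall clock:

```zeek
# Entries are removed automatically 10 minutes after they are inserted
global recently_seen: set[addr] &create_expire=10min;

# Hypothetical helper: has something last seen at 'last_seen' been idle
# for longer than 'ttl' as of 'now'?
function is_stale(last_seen: time, now: time, ttl: interval): bool
    {
    return now - last_seen > ttl;
    }

event recheck(host: addr)
    {
    print fmt("%s still tracked? %s", host, host in recently_seen);
    }

event zeek_init()
    {
    add recently_seen[10.0.0.5];

    # Queue 'recheck' to run once 30 seconds of network time have passed
    schedule 30sec { recheck(10.0.0.5) };
    }
```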

For more information, see time and interval.

Container Types

If you need to work with many elements, pick one of Zeek’s container types to hold them:

  • vector: Store many ordered elements

  • set: Store many unique elements with fast lookup, unordered by default

  • table: Store key-value pairs, unordered by default

Vectors are useful for maintaining ordered lists. Use them to store sequences of items, like storing mail servers for a domain in order of preference:

basics/types_vector.zeek
event zeek_init()
    {
    # Create a vector of mail server addresses
    local mail_server_ips: vector of addr = vector(10.0.0.1, 10.0.0.2);

    # Access the first element (index starts at 0)
    print fmt("Primary mail server IP: %s", mail_server_ips[0]);

    # You can add another server to the end with +=
    mail_server_ips += 10.0.0.3;

    # Loop with a 'for' loop. Vectors provide both the index and the value.
    # We use the variable name '_' to indicate we don't care about the index.
    for ( _, server_ip in mail_server_ips )
        {
        print fmt("%s is a mail server IP", server_ip);
        }

    # You can also get the length of a vector, string, and more by surrounding
    # it with vertical bars (||)
    print fmt("There are %d IPs in the vector", |mail_server_ips|);
    }

Sets are useful for checking membership. They represent a unique collection of items, for example in allow/deny lists. Here, we create a set of “safe” ports that are allowed:

basics/types_set.zeek
event zeek_init()
    {
    # Create a set of ports, order does NOT matter. The set(...) syntax
    # is used to create a new set.
    local safe_ports: set[port] = set(80/tcp, 443/tcp, 53/udp);

    # Check membership with 'in', or negate it with '!in'
    if ( 22/tcp !in safe_ports )
        print "SSH traffic is not on a safe port!";

    # Add elements with 'add'
    add safe_ports[22/tcp];

    # Notice the '!in' changed to just 'in'
    if ( 22/tcp in safe_ports )
        print "Now that port is safe!";

    # Remove elements with 'delete'
    delete safe_ports[22/tcp];

    # Back to '!in'
    if ( 22/tcp !in safe_ports )
        print "Oh, it's not safe... again";

    # Loop through all elements with a simple 'for' loop. This makes a variable
    # 'safe_port' available within the body of the loop for each iteration.
    # Since this is an unordered set, the order the ports get printed may not
    # be consistent.
    for ( safe_port in safe_ports )
        {
        print fmt("%s is a safe port", safe_port);
        }
    }

Tables are useful for mapping keys to values. You may associate a specific IP address with the number of active connections, a timestamp, or a username.

Here we use a table to assign human-readable names to IP addresses:

basics/types_table.zeek
event zeek_init()
    {
    # Create a table that maps addresses (the key) to a string (value)
    local asset_names: table[addr] of string = table(
        [192.168.1.1] = "Router",
        [8.8.8.8] = "Google DNS"
    );

    # Add or replace elements with an assignment
    asset_names[1.1.1.1] = "Cloudflare DNS";

    # Lookups use square brackets
    print fmt("The device at 8.8.8.8 is: %s", asset_names[8.8.8.8]);

    # We can check if the key exists with 'in'
    if ( 192.168.1.100 in asset_names )
        print "We know this address!";
    else
        print "Unknown device";

    # We can even loop over all elements in key, value pairs
    for ( known_address, name in asset_names )
        {
        print fmt("Address %s is %s", known_address, name);
        }
    }

For more information, see vector, set, and table.
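One table idiom appears constantly in Zeek scripts: counting things per key. The sketch below (hits and record_hit are illustrative names) uses the &default attribute so that looking up a missing key returns 0 instead of causing an error, which lets += work on a key the first time it is seen:

```zeek
# &default=0 makes a lookup of a missing key return 0 instead of an error
global hits: table[addr] of count &default=0;

function record_hit(a: addr)
    {
    # Works even the first time 'a' is seen, thanks to &default
    hits[a] += 1;
    }

event zeek_init()
    {
    record_hit(10.0.0.1);
    record_hit(10.0.0.1);
    record_hit(10.0.0.2);

    for ( a, n in hits )
        print fmt("%s seen %d times", a, n);

    # A key that was never added falls back to the default (and is not inserted)
    print hits[10.0.0.99];
    }
```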

Record Types

Records are collections of named values, much like a struct in C. Zeek uses records liberally to provide structured data and pass it between events. The most commonly used record is connection, which captures Zeek’s knowledge of a given network connection.

You access a record’s fields with the $ operator. The following script handles the new_connection event and prints the connection’s endpoints:

basics/types_connection.zeek
event new_connection(c: connection)
    {
    local originator_host: addr = c$id$orig_h;
    local responder_host: addr = c$id$resp_h;

    print fmt("Found connection between %s and %s", originator_host,
        responder_host);
    }

For example, here is the output for the capture file from the Quickstart:

Found connection between 192.168.1.8 and 192.0.78.212
Found connection between 192.168.1.8 and 192.0.78.150

Note

Zeek’s notions of originator and responder aim to capture the natural roles of connection endpoints given the protocol information observed. They differ from the packet-level concepts of source and destination, as well as from higher-level abstractions such as client and server.

Zeek’s protocol analyzers determine originator and responder when establishing connection state, with the sender of the initial packet usually becoming the originator and the recipient becoming the responder. However, analyzers may subsequently flip the roles if protocol semantics suggest it. For example, in the presence of packet loss the first observed packet in a DNS transaction may indicate that it is in fact the response to a missing query. Zeek’s DNS analyzer will flip the endpoint roles, making the sender of this packet the connection’s responder.

The connection record carries around the bulk of Zeek’s connection state. Scripts can add state to this record in order to piece together what they need. For example, Zeek’s HTTP scripts correlate requests and responses via the connection record. The added state is often declared as &optional, so you should use ?$ to make sure the record contains that field before accessing it. Here, we use ?$ to ensure the connection has HTTP state in the http_request event:

basics/types_connection_http.zeek
event http_request(c: connection, method: string, original_URI: string,
    unescaped_URI: string, version: string)
    {
    if ( c?$http && c$http?$uri )
        print fmt("Found HTTP URI %s", c$http$uri);
    }

Make sure you don’t access an optional field without checking if it exists with ?$ first, otherwise your script will encounter an expression error.

Sometimes you need to bundle your own data. This example defines an Asset record that groups IP addresses with some useful data:

basics/types_record.zeek
# This defines an asset with various fields, accessible with '$'
type Asset: record {
    ip: addr;
    owner: string;
    last_seen: time;
    is_public: bool;
};

event zeek_init()
    {
    # Create a new record instance
    local my_asset: Asset = Asset(
        $ip=192.168.1.50,
        $owner="Evan",
        $last_seen=network_time(),
        $is_public=T,
    );

    # Access its is_public field with '$'
    if ( my_asset$is_public )
        {
        print fmt("%s's asset is public at %s", my_asset$owner, my_asset$ip);
        }

    # You can also change fields
    my_asset$is_public = F;
    if ( ! my_asset$is_public )
        {
        print fmt("%s's asset is not public anymore", my_asset$owner);
        }
    }

For more information, see the connection record, or record for records generally.
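Records you define yourself can use &optional too, and the same ?$ check applies. In this sketch, the Login record and the describe function are hypothetical; describe only touches domain after confirming that it is present:

```zeek
type Login: record {
    user: string;
    # Not every login carries a domain, so this field may be absent
    domain: string &optional;
};

function describe(l: Login): string
    {
    # Check with ?$ before touching the optional field
    if ( l?$domain )
        return fmt("%s@%s", l$user, l$domain);

    return l$user;
    }

event zeek_init()
    {
    print describe(Login($user="alice"));                       # alice
    print describe(Login($user="bob", $domain="example.org"));  # bob@example.org
    }
```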

Standard Types

Zeek provides many of the standard types expected in a programming language:

basics/types_standard.zeek
event zeek_init()
    {
    # int is a signed number
    local x: int = +5;
    local y: int = -2;
    print fmt("result x - y: %d", x - y); # prints "result x - y: 7"

    # count is an unsigned number
    local a: count = 10;
    local b: count = 15;
    a += b; # Add b to a, store the result in a
    print fmt("a is: %d", a); # prints "a is: 25"

    # bool can be T (for true) or F (for false)
    local my_true: bool = T;
    local my_false: bool = F;
    print fmt("true and false? %s", my_true && my_false); # prints "true and false? F"
    print fmt("true or false? %s", my_true || my_false); # prints "true or false? T"

    # string is just some text enclosed in quotes
    local bad_word: string = "bad";
    local phrase: string = "this is bad.";
    print fmt("bad? %s", bad_word in phrase); # prints "bad? T"

    # pattern is a regular expression
    local good_words = /good|great|amazing/; # any of these words will match
    print fmt("good? %s", good_words in phrase); # prints "good? F"
    phrase = "this is good!";
    print fmt("good this time? %s", good_words in phrase); # prints "good this time? T"
    }

You can use these for many of the same tasks you would use a general purpose language for.
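Strings and patterns come with many built-in functions. As a sketch (normalize_host is a hypothetical helper), the following lower-cases a host name with to_lower, strips a leading "www." with sub, and contrasts the two ways of matching a pattern: in searches anywhere in the string, while == requires the entire string to match:

```zeek
# Hypothetical helper: lower-case a host name and drop a leading "www."
function normalize_host(h: string): string
    {
    return sub(to_lower(h), /^www\./, "");
    }

event zeek_init()
    {
    local host = normalize_host("WWW.Example.COM");
    print host;                # example.com

    # 'in' looks for a match anywhere in the string...
    print /example/ in host;   # T
    # ...while '==' requires the whole string to match
    print /example/ == host;   # F

    # |...| gives a string's length
    print |host|;              # 11
    }
```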

For more information, see int, count, bool, string, and pattern.

To see how these types drive a script’s control flow, see for and if.

Visibility and Scope

Local and Global

So far, we have kept state within events using the local keyword. A local variable exists only within the function or event handler that declares it. To keep state between events, use a global instead. The following example counts how many times the new_connection event is raised and prints the total at the end, in the zeek_done event:

basics/scope_global.zeek
global num_connections: count = 0;

event new_connection(c: connection)
    {
    num_connections += 1;
    }

event zeek_done()
    {
    print fmt("Found %d connections", num_connections);
    }

You may also have noticed that the loop in the vector example creates a server_ip variable without using local:

basics/types_vector.zeek
    # Loop with a 'for' loop. Vectors provide both the index and the value.
    # We use the variable name '_' to indicate we don't care about the index.
    for ( _, server_ip in mail_server_ips )
        {
        print fmt("%s is a mail server IP", server_ip);
        }

The for loop declares server_ip as a local variable, just without the keyword, and that variable belongs to the whole enclosing event handler rather than only to the loop body. Because you cannot have two local variables with the same name in one function scope, you cannot later declare another local variable named server_ip there.

For more information, see local and global.

Exporting

You can expose constants, types, options, and more to other scripts by putting them in an export block. The following example defines an allow list of IP addresses and prints a warning for every endpoint of a new connection that is not on the list:

basics/scope_export.zeek
module AllowList;

export {
    const allow_list: set[addr] = set(192.168.1.8);
}

event new_connection(c: connection)
    {
    if ( c$id$orig_h !in allow_list )
        print fmt("Address %s is not allowed!", c$id$orig_h);
    if ( c$id$resp_h !in allow_list )
        print fmt("Address %s is not allowed!", c$id$resp_h);
    }

As-is, running this on the quickstart pcap reports that some addresses are not allowed:

Address 192.0.78.212 is not allowed!
Address 192.0.78.150 is not allowed!

If we can’t change the original script, we can create a new script and add the allowed IP addresses!

basics/scope_use_export.zeek
add AllowList::allow_list[192.0.78.212];
add AllowList::allow_list[192.0.78.150];

With this change, nothing will print—all addresses were allowed.

Redefinitions

We can change more than just native variables. The redef keyword lets you redefine a constant or type when Zeek initializes. Such changes are set in stone thereafter, and you cannot redefine anything later on at runtime. First, we will use redef to demonstrate one of the most powerful features in Zeek: redefining the connection record. Later, we will see how constructs can declare that they may be redefined.

In this example, we want to flag specific connections in the logs so that we can find them easily later. Let’s redefine the record to include a denied field:

basics/scope_redef_connection.zeek
module DenyList;

export {
    # Make a deny list to show functionality
    const deny_list: set[addr] = set(192.168.1.8);
}

redef record connection += {
    # Add a boolean field.
    #
    # &default sets the default value to F. Any fields added to a record
    # with redef must have this or &optional.
    denied: bool &default=F;
};

event new_connection(c: connection) {
    # The denied flag gets set if one of the IPs is in the deny list
    c$denied = c$id$orig_h in deny_list || c$id$resp_h in deny_list;
    print c$denied;
}

As-is, this script just prints the flag. But because Zeek raises new_connection when it first sees a connection, handlers for later events on that connection can check the denied flag and use it in their analysis. The state sticks around.
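For instance, a handler for connection_state_remove, which Zeek raises when it is done with a connection, can read the flag that new_connection set. This self-contained sketch repeats the redef from above without the module; the message it prints is illustrative:

```zeek
redef record connection += {
    denied: bool &default=F;
};

const deny_list: set[addr] = set(192.168.1.8);

event new_connection(c: connection)
    {
    c$denied = c$id$orig_h in deny_list || c$id$resp_h in deny_list;
    }

# Raised when Zeek is done with a connection; the flag set in
# new_connection is still on the same connection record
event connection_state_remove(c: connection)
    {
    if ( c$denied )
        print fmt("Denied connection %s lasted %s", c$uid, c$duration);
    }
```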

We can also modify the script slightly to log the flag in conn.log:

basics/scope_redef_connection_log.zeek
module DenyList;

export {
    # Make a deny list to show functionality
    const deny_list: set[addr] = set(192.168.1.8);
}

redef record Conn::Info += {
    # Add a boolean field.
    #
    # &log means this field will be logged (in conn.log here)
    #
    # &default sets the default value to F. Any fields added to a record
    # with redef must have this or &optional.
    denied: bool &log &default=F;
};

event new_connection(c: connection) {
    # The denied flag gets set if one of the IPs is in the deny list
    c$conn$denied = c$id$orig_h in deny_list || c$id$resp_h in deny_list;
}

There are three changes compared with the previous script:

  1. The redef redefines Conn::Info instead of connection.

  2. The denied field is marked with the &log attribute.

  3. We modify c$conn (which is a Conn::Info instance) instead of c. c is the container for the whole connection state. Inside it, c$conn is the specific record that gets written to conn.log.

Each Zeek log’s layout is defined by a record type, and by convention such records are named Info. The record type underpinning Zeek’s conn.log is Conn::Info, so to expand what goes into conn.log, you add fields to Conn::Info. (Zeek’s logging framework handles the rest; we won’t go into its details just yet.)

Note

The &log attribute tells Zeek to write this field whenever it logs the record. Fields are not logged unless they opt in this way. Attributes are a common way to add functionality to various language elements in Zeek: you can control whether a field is optional, add an expiration timeout, and much more. For more information, see the attributes section.

When we run this on the quickstart pcap, we can see that conn.log now has a denied field:

# zeek basics/scope_redef_connection_log.zeek -Cr traces/quickstart.pcap
# cat conn.log | zeek-cut -m denied
denied
T
T

Using &redef

You cannot redefine just anything in Zeek, only things that declare themselves to be redefinable. The &redef attribute makes this possible. If you’re writing a library and want to allow users to customize parts, you may include &redef to allow extra fields in the record, to log more fields, or just to make a runtime constant configurable at startup.

basics/scope_redef_attr.zeek
module RedefAttr;

export {
    # Here we have an indicator with just an attacker IP. Users of the library
    # can then add their own fields.
    type Indicator: record {
        attacker_ip: addr;
    } &redef;

    # You can use 'option' in order to provide config "knobs" for users.
    # Scripts cannot assign to an option directly while Zeek runs; it can be
    # changed with redef at startup, or at runtime through the configuration
    # framework.
    option hostname: string = "host-1";
}

# Users of the library can add fields via redef
redef record RedefAttr::Indicator += {
    ticket_num: count &default=0;
};

# They can also change the option value
redef RedefAttr::hostname = "new-host";

event zeek_init() {
    # prints "Hostname: new-host"
    print fmt("Hostname: %s", RedefAttr::hostname);

    # Create an instance of Indicator, with our new field
    local my_indicator: RedefAttr::Indicator = RedefAttr::Indicator(
        $attacker_ip = 192.168.1.1,
        $ticket_num = 42, # This field was added!
    );

    # prints "Found indicator: [attacker_ip=192.168.1.1, ticket_num=42]"
    print fmt("Found indicator: %s", my_indicator);
}

This example is a bit contrived, but any users who load the script with @load can then customize these variables.

For more information, see redef and &redef. Also look at option and const for some more ways to customize libraries via redef.
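The allow list from the Exporting section is a natural fit for this pattern. In the sketch below (the module name AllowListRedef is made up to keep it apart from the earlier example), marking the constant &redef lets another script extend it with redef and += at startup, instead of adding elements with add:

```zeek
module AllowListRedef;

export {
    # &redef allows other scripts to change this constant at startup
    const allow_list: set[addr] = { 192.168.1.8 } &redef;
}

# Usually this line lives in a separate script that loads the one above
redef AllowListRedef::allow_list += { 192.0.78.212, 192.0.78.150 };

event new_connection(c: connection)
    {
    if ( c$id$orig_h !in allow_list )
        print fmt("Address %s is not allowed!", c$id$orig_h);
    if ( c$id$resp_h !in allow_list )
        print fmt("Address %s is not allowed!", c$id$resp_h);
    }
```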

Functions

Functions work as they do in other programming languages: calling one immediately executes its statements in order. In this example, imagine you need to check whether a connection is internal. A function keeps the necessary logic in its own section:

basics/functions.zeek
# Functions have similar definitions to events, but use the 'function'
# keyword. You may also specify a return type after the function parameters
# with a colon.
#
# Syntax: function name(arg1: type1, arg2: type2): return_type
function is_internal(cid: conn_id): bool
    {
    # Have a dummy local subnet
    local internal_net: subnet = 10.0.0.0/8;

    # These two if statements could be combined; they are separate for
    # demonstration.
    if ( cid$orig_h in internal_net )
        return T;

    if ( cid$resp_h in internal_net )
        return T;

    # Neither case matched, so return false.
    return F;
    }

event new_connection(c: connection)
    {
    # Pass c$id, not the connection, so that it is a conn_id
    if ( ! is_internal(c$id) )
        print fmt("Connection between %s and %s is not internal!",
            c$id$orig_h, c$id$resp_h);
    }

A function can also modify an aggregate value passed to it. In this example, update_scan_count modifies the host record it receives:

basics/functions_pass_by_reference.zeek
type ServerHost: record {
    ip: addr;
    scanned_count: count;
    last_seen: time;
};

# Updates the scan count, as well as updating when it was last seen.
function update_scan_count(host: ServerHost)
    {
    host$scanned_count += 1;
    host$last_seen = network_time();
    }

event zeek_init()
    {
    local host: ServerHost = ServerHost(
        $ip = 192.168.1.10,
        $scanned_count = 0,
        $last_seen = network_time(),
    );

    print fmt("Before: Count=%d", host$scanned_count); # prints "Before: Count=0"

    # Pass the record into update_scan_count.
    # Zeek passes records by *reference*, so the function modifies the original.
    update_scan_count(host);
    update_scan_count(host);

    # Notice that the count was changed!
    print fmt("After: Count=%d", host$scanned_count); # prints "After: Count=2"
    }

The most important point here is that only certain types in Zeek are passed by reference. Aggregate types such as table and record are passed into a function by reference, so the function may modify their values. But if you pass a count, the function modifies a copy, not the original. Try changing update_scan_count to take host$scanned_count (a count) instead of the whole record, and observe that the original value does not get updated.
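Here is one way that exercise could look. The functions bump and bumped are hypothetical: bump changes only its own copy of the count, so to update the caller's value a function has to return the new value, as bumped does, and the caller has to assign it back:

```zeek
# Takes a count, so the function receives its own copy
function bump(n: count)
    {
    n += 1;  # changes only the local copy
    }

# Returns the new value so the caller can store it
function bumped(n: count): count
    {
    return n + 1;
    }

event zeek_init()
    {
    local scanned_count = 0;

    bump(scanned_count);
    print fmt("After bump: %d", scanned_count);     # prints "After bump: 0"

    scanned_count = bumped(scanned_count);
    print fmt("After bumped: %d", scanned_count);   # prints "After bumped: 1"
    }
```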

For more information, see function.

Async Functions

Note

Asynchronous functions are a relatively advanced concept, but you need them to understand the detection script from the beginning of this section.

Some functions take a while to complete, and Zeek should not stop processing traffic while it waits for them. The when statement lets a script wait for such a result and run code once it is ready, without blocking Zeek. In this example, we use when to look up the DNS TXT record for www.zeek.org:

basics/functions_async.zeek
event zeek_init()
    {
    # Prints will probably happen in numerical order
    print "1. Requesting DNS...";

    # 'when' handles waiting for lookup_hostname_txt to finish.
    # The code inside curly braces executes when it is complete.
    when ( local result = lookup_hostname_txt("www.zeek.org") )
        {
        print fmt("3. Found DNS result: %s", result);
        }
    # You could optionally add a "timeout" here, too.

    # This code will execute immediately; it will not wait for the result
    # from lookup_hostname_txt
    print "2. Request sent, moving on";
    }

You don’t have to understand the specifics here. If a function is “asynchronous”, you must call it inside a when condition so that Zeek can wait for its result without blocking. If you remove the when from the previous example, Zeek reports an error for line 8, where lookup_hostname_txt is called:

error in ./functions_async.zeek, line 8: lookup_hostname_txt() can only be called inside a when-condition (lookup_hostname_txt(www.zeek.org))

For more information, see when.
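Adding a timeout keeps a script from waiting indefinitely on an answer that never arrives. This sketch uses lookup_hostname, which resolves a name to a set of addresses; dns_timeout is an illustrative constant, and the 5-second value is arbitrary:

```zeek
# How long to wait before giving up (an arbitrary, illustrative value)
const dns_timeout = 5sec;

event zeek_init()
    {
    # lookup_hostname resolves a name to a set of addresses
    when ( local addrs = lookup_hostname("www.zeek.org") )
        {
        print fmt("www.zeek.org resolves to %s", addrs);
        }
    timeout dns_timeout
        {
        # Runs instead of the body above if no answer arrives in time
        print "DNS lookup timed out";
        }
    }
```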

Understanding a Real Script

We now have the tools to understand the detect-MHR script in its entirety. At the beginning, we only showed the file_hash event handler, which hands most of its work to the do_mhr_lookup function. Here is that function in full; afterwards, we will go through it piece by piece:

basics/detect-MHR.zeek
function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
    {
    local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

    when [hash, fi, hash_domain] ( local MHR_result = lookup_hostname_txt(hash_domain) )
        {
        # Data is returned as "<dateFirstDetected> <detectionRate>"
        local MHR_answer = split_string1(MHR_result, / /);

        if ( |MHR_answer| == 2 )
            {
            local mhr_detect_rate = to_count(MHR_answer[1]);

            if ( mhr_detect_rate >= notice_threshold )
                {
                local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
                local virustotal_url = fmt(match_sub_url, hash);
                # We don't have the full fa_file record here in order to
                # avoid the "when" statement cloning it (expensive!).
                local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                Notice::populate_file_info2(fi, n);
                NOTICE(n);
                }
            }
        }
    }

First, the function itself takes the hash (provided from file_hash) and a Notice::FileInfo:

basics/detect-MHR.zeek
function do_mhr_lookup(hash: string, fi: Notice::FileInfo)

Then, we declare a local variable that holds the domain name to look up:

basics/detect-MHR.zeek
    local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

This variable is used in the when statement to look it up asynchronously:

basics/detect-MHR.zeek
    when [hash, fi, hash_domain] ( local MHR_result = lookup_hostname_txt(hash_domain) )

The when statement has an extra section here, inside square brackets: a capture list. It specifies that the body can use hash, fi, and hash_domain from the enclosing do_mhr_lookup function; without it, we could not use hash inside the when block. Because the code inside {} runs later (potentially after the function has returned), Zeek copies these variables so that they are still available within the when block.

Next, we use the result within the when block in order to check the data:

basics/detect-MHR.zeek
        # Data is returned as "<dateFirstDetected> <detectionRate>"
        local MHR_answer = split_string1(MHR_result, / /);

        if ( |MHR_answer| == 2 )
            {
            local mhr_detect_rate = to_count(MHR_answer[1]);

MHR_answer holds the result of the DNS lookup, split at the first space: split_string1 returns a vector of string with at most two elements. If the answer has exactly two elements, the split succeeded, so we can move on with the logic.

We then convert the string in MHR_answer[1] (the second element of the vector of string) into a count and put it into mhr_detect_rate.
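To experiment with this parsing step on its own, the following sketch moves it into a hypothetical helper, parse_detect_rate, and feeds it made-up answers instead of real DNS results. Because split_string1 splits at the first match only, an answer without a space yields a single element, and the helper returns 0:

```zeek
# Hypothetical helper mirroring the parsing step in do_mhr_lookup.
# Returns the detection rate, or 0 if the answer is not "<date> <rate>".
function parse_detect_rate(answer: string): count
    {
    # split_string1 splits at the first match only, so it yields at most
    # two elements
    local parts = split_string1(answer, / /);

    if ( |parts| != 2 )
        return 0;

    return to_count(parts[1]);
    }

event zeek_init()
    {
    # Made-up answers in the "<dateFirstDetected> <detectionRate>" shape
    print parse_detect_rate("1484771683 95");   # 95
    print parse_detect_rate("no-space-here");   # 0
    }
```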

Now, we check if that detection rate is high enough to trigger a notice:

basics/detect-MHR.zeek
            if ( mhr_detect_rate >= notice_threshold )

This uses notice_threshold, an option declared in the script’s export block (not shown here), so users can configure its value.

If the detection rate meets the threshold, we trigger a notice. This uses Zeek’s notice framework, but most of the concepts should be familiar. Here, we mostly build strings to make the notice human-readable:

basics/detect-MHR.zeek
                local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
                local virustotal_url = fmt(match_sub_url, hash);
                # We don't have the full fa_file record here in order to
                # avoid the "when" statement cloning it (expensive!).
                local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                Notice::populate_file_info2(fi, n);
                NOTICE(n);

You can read more about Zeek’s notice framework in the Notice Framework section. Note that NOTICE is just a function.
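As a sketch of raising your own notice, the script below defines a new notice type and raises it for connections involving one watched address, following the same Notice::Info and NOTICE pattern as do_mhr_lookup. The module name WatchHost, the notice type Watched_Host_Seen, the watched_host option, and the involves helper are all hypothetical:

```zeek
@load base/frameworks/notice

module WatchHost;

export {
    redef enum Notice::Type += {
        ## A connection involved the watched host.
        Watched_Host_Seen
    };

    # The address to watch (an illustrative value)
    option watched_host: addr = 192.168.1.8;
}

# Hypothetical helper: is 'host' one of the two endpoints?
function involves(orig: addr, resp: addr, host: addr): bool
    {
    return orig == host || resp == host;
    }

event new_connection(c: connection)
    {
    if ( involves(c$id$orig_h, c$id$resp_h, watched_host) )
        NOTICE(Notice::Info($note=Watched_Host_Seen,
                            $msg=fmt("Connection involving %s", watched_host),
                            $conn=c));
    }
```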

With that, we went from zero to understanding a full Zeek script. In the next section, we will build up a script from scratch, using what we learned here.