The Basics

Important

This section is also available in video form on YouTube.

Why Script?

We know the main output of Zeek: logs. But Zeek also has a whole architectural layer dedicated to the logic that creates those logs (and more!). That is Zeek’s scripting language.

We have already caught glimpses of this language. The Providing Script Values section covered how to adjust variables in Zeek scripts from the command line, and while covering ZeekControl we saw that the local.zeek file customizes a Zeek cluster’s configuration.

But scripting goes much further: it forms the heart of Zeek’s entire analysis engine. In Zeek scripts, you react to events that Zeek generates as it processes sniffed packets, and this event handling drives all of Zeek’s user-visible behavior. So let’s learn more about events next.

Zeek Events

Zeek’s protocol analyzers attempt to make sense of traffic as they parse network packets. As protocols unfold on the wire, Zeek’s analyzers generate events along the way, sending them into the scripting engine for processing. Examples of such events include observing a new connection (new_connection), an HTTP request (http_request), or a completed TLS handshake (ssl_established). Most, but not all, events relate to network traffic. Two common generic events are zeek_init, which Zeek generates at startup, and zeek_done, which Zeek generates when it is about to shut down.

Zeek ships with hundreds of different types of events, each suitably named and carrying a list of typed arguments to convey context. When Zeek’s core sends an event into the script layer, scripts can handle it in many places via independent event handlers. For example, at startup Zeek only creates a single zeek_init event, but dozens of scripts handle it independently.

basics/event_multiple_handlers.zeek

event zeek_init()
    {
    print "Hello, world!";
    }

event zeek_init()
    {
    print "Hello, everyone!";
    }

These two handlers just happen to be in the same file; handlers for the same event can live in any loaded script. To control the order in which they run, use the &priority attribute:

basics/event_multiple_handlers_priority.zeek

# Define priority with '&priority' before the opening brace
event zeek_init() &priority=-5
    {
    print "This handler uses state created by other events - it should go late!";
    }

# Higher priority runs first
event zeek_init() &priority=300
    {
    print "This handler creates state - it should go early!";
    }
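To see the ordering rules in action, here is a small sketch that records which handler ran when. The handler_order global is hypothetical, and the handler without &priority runs at the default priority of 0, between the other two:

```zeek
# Hypothetical global that records the order in which the handlers ran.
global handler_order: vector of string = vector();

event zeek_init() &priority=-10
    {
    handler_order += "late";
    }

# No &priority attribute: this handler runs at the default priority, 0.
event zeek_init()
    {
    handler_order += "middle";
    }

event zeek_init() &priority=10
    {
    handler_order += "early";
    }

# The lowest priority of all, so this handler sees the other three entries.
event zeek_init() &priority=-20
    {
    print handler_order;
    }
```

Only the relative order of the priorities matters; the specific numbers are arbitrary.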

You can trigger your own events from a script with the event statement:

basics/event_statement.zeek

# This is a brand new event type
event my_custom_event(a_num: count) {
    print fmt("My custom event got %d!", a_num);
}

event zeek_init() {
    # 'event' can be used to immediately queue the event handler invocation.
    # You can even pass in values!
    event my_custom_event(5);
    # The event is now queued, so it will run eventually, but this print
    # will happen first. We are still in this event!
    print "This happens first!";

    # We cannot return any values from events, so this is invalid:
    # local x = event my_custom_event(10);
}

The event statement is the most direct way to trigger an event from a script. It does not execute the handler immediately; Zeek enqueues the event for subsequent processing. Events are not functions that run now; they are interesting things that Zeek will handle later.
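The deferred execution is easy to observe by recording what ran in which order. In this sketch, the run_log global and the demo_step event are hypothetical, and it assumes queued events are handled in the order they were queued:

```zeek
# Hypothetical log of what ran, in order.
global run_log: vector of string = vector();

# Hypothetical event used only for this demonstration.
event demo_step(n: count)
    {
    run_log += fmt("step %d handled", n);
    }

event zeek_init()
    {
    # Both events are only queued here; neither handler runs yet.
    event demo_step(1);
    event demo_step(2);
    run_log += "zeek_init finished";
    }

event zeek_done()
    {
    # zeek_init's own entry comes first, then the queued steps.
    for ( _, entry in run_log )
        print entry;
    }
```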

It’s important to remember that Zeek’s events generally don’t judge: they’re policy-neutral, simply reporting on observed activity. It’s up to scripts to build up state from processed events, infer meaning, and eventually trigger output (perhaps, but not necessarily, a detection) that informs analysts.

We will demonstrate this with a high-level example: checking whether the network traffic contains any malware known to the Team Cymru Malware Hash Registry. Should you load the full script, Zeek will produce a notice.log entry with hashes for any observed known malware, like this:

# cat notice.log | zeek-cut -m
ts      uid     id.orig_h       id.orig_p       id.resp_h       id.resp_p       fuid    file_mime_type    file_desc       proto   note    msg     sub     src     dst     p       n       peer_descr      actions email_dest      suppress_for    remote_location.country_code    remote_location.region    remote_location.city    remote_location.latitude        remote_location.longitude
1362692527.080972       CLDH8f3Huq3yGIqjZ6      141.142.228.5   59856   192.150.187.43  <omitted>      text/plain      <omitted>     tcp       TeamCymruMalwareHashRegistry::Match     Malware Hash Registry Detection rate: 95%  Last seen: 2017-01-18 20:34:43 https://www.virustotal.com/gui/search/<omitted>    141.142.228.5   192.150.187.43  80      -       -       Notice::ACTION_LOG        (empty) 3600.000000     -       -       -       -       -

Zeek determined it was malware by looking up the hash in a known registry—via scripting!

basics/mhr-excerpt.zeek

# This is just an excerpt from Zeek's policy script, detect-MHR.
# It has some slight modifications.

# The file_hash event is raised each time Zeek finishes computing a hash
# of a file's contents.
event file_hash(f: fa_file, kind: string, hash: string)
    {
    # Ensure this is a SHA1 hash and the file type is one we care about.
    # (match_file_types is a configuration option defined elsewhere.)
    # The ?$ checks skip files whose MIME type is unknown; ?$ is covered
    # in the record types section.
    if ( kind == "sha1" && f?$info && f$info?$mime_type &&
         match_file_types in f$info$mime_type )
        # If it matches, we enter the lookup function.
        do_mhr_lookup(hash, Notice::create_file_info(f));
    }

Whenever Zeek sees a file transferred over a protocol and is configured to hash it (the full detect-MHR script loads a script that hashes all files), Zeek calculates the file’s hash; once a hash is computed, Zeek raises the file_hash event. The body of our event handler does two things:

It checks if we care about this specific file.
It calls a function (do_mhr_lookup) to check the registry.
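The first step, deciding whether a file deserves a lookup, is ordinary boolean logic plus a regular-expression match. A minimal sketch, assuming a hypothetical interesting_types pattern in place of detect-MHR’s match_file_types option and a hypothetical wants_lookup helper:

```zeek
# Hypothetical stand-in for detect-MHR's match_file_types option.
const interesting_types = /application\/x-dosexec|application\/pdf/;

# Hypothetical helper mirroring the condition in the file_hash handler above.
function wants_lookup(kind: string, mime_type: string): bool
    {
    # 'pattern in string' is true if the pattern matches anywhere in the string.
    return kind == "sha1" && interesting_types in mime_type;
    }

event zeek_init()
    {
    print wants_lookup("sha1", "application/pdf");  # T
    print wants_lookup("md5", "application/pdf");   # F
    print wants_lookup("sha1", "text/plain");       # F
    }
```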

This leaves out the core of the script, but we’ve learned about one of Zeek’s most important concepts: events. To flesh the script out further, we next need to learn more about Zeek’s available data types.

Data Types

Network Types

Zeek monitors network traffic, so its scripting language makes networking concepts easy to work with through dedicated types. Network types are primitive types within Zeek, so you can treat addresses, ports, and subnets as native data:

basics/types_network.zeek

event zeek_init()
    {
    # Set up some variables
    local dns_server: addr = 8.8.8.8;
    local internal_net: subnet = 10.0.0.0/8;
    local web_traffic: port = 80/tcp;

    # Check if dns_server is part of the internal subnet
    if ( dns_server in internal_net )
        print "DNS server is internal";
    else
        print "DNS server is external";

    # Ports can also be compared natively, for example against a
    # well-known port
    if ( web_traffic == 443/tcp )
        print "This is HTTPS";
    else
        print "This is another protocol";
    }

These are some of Zeek’s most powerful types. They allow script writers to easily use common networking language in order to write network detections.

For more information on each, see documentation for addr, subnet, and port.
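Beyond membership tests, addresses and ports support a few handy operations. A minimal sketch with hypothetical helper names, using the addr / count masking operator and the port_to_count and get_port_transport_proto built-in functions:

```zeek
# Hypothetical helper: are two hosts in the same /24?
function same_slash24(a: addr, b: addr): bool
    {
    # 'addr / count' masks the address, yielding a subnet (e.g. 192.168.1.0/24)
    return b in (a / 24);
    }

# Hypothetical helper: is this the standard DNS port over UDP?
function is_udp_dns_port(p: port): bool
    {
    # port_to_count() drops the protocol; get_port_transport_proto() returns it
    return port_to_count(p) == 53 && get_port_transport_proto(p) == udp;
    }

event zeek_init()
    {
    print same_slash24(192.168.1.5, 192.168.1.200);  # T
    print same_slash24(192.168.1.5, 192.168.2.5);    # F
    print is_udp_dns_port(53/udp);                   # T
    print is_udp_dns_port(53/tcp);                   # F
    }
```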

Time Types

When writing Zeek scripts, it’s also important to know when something occurred. Zeek provides time values as native types:

basics/types_time.zeek

event zeek_init()
    {
    # Set up some variables. This uses current_time() since time constants don't
    # exist in Zeek.
    local time_spotted: time = current_time();
    # You can add an interval to a time to get another time
    local time_spotted2 = time_spotted + 1sec;
    # You can also subtract two times for an interval
    local interval_between = time_spotted2 - time_spotted;
    # You can add two intervals together
    interval_between += current_time() - time_spotted;

    print fmt("Time between events: %s", interval_between);

    # Intervals can be used for concepts such as timeouts or for analyzing
    # bursts of traffic.
    local timeout_interval = 1sec;
    if ( interval_between > timeout_interval )
        print "Interval was timed out";
    else
        print "Interval was within the timeout";
    }

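The current_time function returns the “wall clock” time at the moment it is called (by contrast, network_time returns the timestamp of the most recently processed packet). The time types are useful for more, though. For example, you can cause stored state to expire after a certain time interval, or schedule events to execute some time in the future with the schedule statement.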
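Both ideas might look roughly like this. The recently_seen set, the report_later event, and the is_stale helper are hypothetical; the &create_expire attribute and the schedule statement are the Zeek features being illustrated:

```zeek
# Hypothetical state: Zeek drops each entry 10 minutes after it was added.
global recently_seen: set[addr] &create_expire=10min;

# Hypothetical helper: is 'last_seen' more than 'max_age' before 'now'?
function is_stale(last_seen: time, now: time, max_age: interval): bool
    {
    # time - time yields an interval, which compares against max_age
    return now - last_seen > max_age;
    }

# Hypothetical event to run later.
event report_later(msg: string)
    {
    print fmt("Scheduled event fired: %s", msg);
    }

event zeek_init()
    {
    add recently_seen[10.0.0.1];

    # Queue 'report_later' to run 30 seconds from now (in network time).
    schedule 30sec { report_later("half a minute later") };
    }
```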

For more information, see time and interval.

Container Types

If you have many elements, you can pick one of Zeek’s container types to work with them:

vector: Store many ordered elements
set: Store many unique elements with fast lookup, unordered by default
table: Store key-value pairs, unordered by default

Vectors are useful for maintaining ordered lists. Use them to store sequences of items, like storing mail servers for a domain in order of preference:

basics/types_vector.zeek

event zeek_init()
    {
    # Create a vector of mail server addresses
    local mail_server_ips: vector of addr = vector(10.0.0.1, 10.0.0.2);

    # Access the first element (index starts at 0)
    print fmt("Primary mail server IP: %s", mail_server_ips[0]);

    # You can add another server to the end with +=
    mail_server_ips += 10.0.0.3;

    # Loop with a 'for' loop. Vectors provide both the index and the value.
    # We use the variable name '_' to indicate we don't care about the index.
    for ( _, server_ip in mail_server_ips )
        {
        print fmt("%s is a mail server IP", server_ip);
        }

    # You can also get the length of a vector, string, and more by surrounding
    # it with vertical bars (||)
    print fmt("There are %d IPs in the vector", |mail_server_ips|);
    }
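Because vectors keep their order, a loop over them respects preference. A minimal sketch, assuming a hypothetical first_available helper and a set of servers known to be down:

```zeek
# Hypothetical helper: walk servers in preference order and return the first
# one that is not known to be down.
function first_available(servers: vector of addr, down: set[addr]): addr
    {
    for ( _, ip in servers )
        {
        if ( ip !in down )
            return ip;
        }

    # Hypothetical sentinel meaning "no server available".
    return 0.0.0.0;
    }

event zeek_init()
    {
    local mail_server_ips: vector of addr = vector(10.0.0.1, 10.0.0.2, 10.0.0.3);
    local down: set[addr] = set(10.0.0.1);

    # The primary is down, so the next preference wins: prints 10.0.0.2
    print first_available(mail_server_ips, down);
    }
```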

Sets are useful for checking membership. They represent a unique collection of items, for example in allow/deny lists. Here, we create a set of “safe” ports that are allowed:

basics/types_set.zeek

event zeek_init()
    {
    # Create a set of ports, order does NOT matter. The set(...) syntax
    # is used to create a new set.
    local safe_ports: set[port] = set(80/tcp, 443/tcp, 53/udp);

    # Check membership with 'in', or negate it with '!in'
    if ( 22/tcp !in safe_ports )
        print "SSH traffic is not on a safe port!";

    # Add elements with 'add'
    add safe_ports[22/tcp];

    # Notice the '!in' changed to just 'in'
    if ( 22/tcp in safe_ports )
        print "Now that port is safe!";

    # Remove elements with 'delete'
    delete safe_ports[22/tcp];

    # Back to '!in'
    if ( 22/tcp !in safe_ports )
        print "Oh, it's not safe... again";

    # Loop through all elements with a simple 'for' loop. This makes a variable
    # 'safe_port' available within the body of the loop for each iteration.
    # Since this is an unordered set, the order the ports get printed may not
    # be consistent.
    for ( safe_port in safe_ports )
        {
        print fmt("%s is a safe port", safe_port);
        }
    }
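Sets also support whole-set operations, which keep allow/deny logic short. A minimal sketch with a hypothetical unexpected_ports helper that uses set difference:

```zeek
# Hypothetical helper: which observed ports are not on the allowed list?
function unexpected_ports(observed: set[port], allowed: set[port]): set[port]
    {
    # '-' on two sets is set difference; '|' is union and '&' is intersection.
    return observed - allowed;
    }

event zeek_init()
    {
    local safe_ports: set[port] = set(80/tcp, 443/tcp, 53/udp);
    local observed: set[port] = set(22/tcp, 80/tcp, 3389/tcp);

    for ( p in unexpected_ports(observed, safe_ports) )
        print fmt("%s is not on the safe list", p);
    }
```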

Tables are useful for mapping keys to values. You may associate a specific IP address with the number of active connections, a timestamp, or a username.

Here we use a table to assign human-readable names to IP addresses:

basics/types_table.zeek

event zeek_init()
    {
    # Create a table that maps addresses (the key) to a string (value)
    local asset_names: table[addr] of string = table(
        [192.168.1.1] = "Router",
        [8.8.8.8] = "Google DNS"
    );

    # Add or replace elements with an assignment
    asset_names[1.1.1.1] = "Cloudflare DNS";

    # Lookups use square brackets
    print fmt("The device at 8.8.8.8 is: %s", asset_names[8.8.8.8]);

    # We can check if the key exists with 'in'
    if ( 192.168.1.100 in asset_names )
        print "We know this address!";
    else
        print "Unknown device";

    # We can even loop over all elements in key, value pairs
    for ( known_address, name in asset_names )
        {
        print fmt("Address %s is %s", known_address, name);
        }
    }
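For the connection-counting use case mentioned above, the &default attribute saves an existence check. A minimal sketch with a hypothetical conn_counts table and count_connection helper; in a real script you would call the helper from new_connection with c$id$orig_h:

```zeek
# Hypothetical per-host counter. &default=0 makes a missing key read as 0.
global conn_counts: table[addr] of count &default=0;

# Hypothetical helper called once per new connection.
function count_connection(orig: addr)
    {
    # No 'in' check needed: a missing key starts from the default of 0.
    conn_counts[orig] += 1;
    }

event zeek_init()
    {
    count_connection(10.0.0.1);
    count_connection(10.0.0.1);
    count_connection(10.0.0.2);

    for ( host, n in conn_counts )
        print fmt("%s opened %d connection(s)", host, n);
    }
```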

For more information, see vector, set, and table.

Record Types

Records are just collections of named values—like a struct in C. Zeek uses records liberally in order to provide structured data and pass it amongst events. The most used record is connection, which captures Zeek’s knowledge of a given network connection.

You can get data from the record with the $ operator. The following script will use the new_connection event and print its endpoints:

basics/types_connection.zeek

event new_connection(c: connection)
    {
    local originator_host: addr = c$id$orig_h;
    local responder_host: addr = c$id$resp_h;

    print fmt("Found connection between %s and %s", originator_host,
        responder_host);
    }

For example, here is the output for the capture file from the Quickstart:

Found connection between 192.168.1.8 and 192.0.78.212
Found connection between 192.168.1.8 and 192.0.78.150

Note

Zeek’s notions of originator and responder aim to capture the natural roles of connection endpoints given the protocol information observed. They differ from the packet-level concepts of source and destination, as well as from higher-level abstractions such as client and server.

Zeek’s protocol analyzers determine originator and responder when establishing connection state, with the sender of the initial packet usually becoming the originator and the recipient becoming the responder. However, analyzers may subsequently flip the roles if protocol semantics suggest it. For example, in the presence of packet loss the first observed packet in a DNS transaction may indicate that it is in fact the response to a missing query. Zeek’s DNS analyzer will flip the endpoint roles, making the sender of this packet the connection’s responder.

The connection record carries around the bulk of Zeek’s connection state. Scripts can add state to this record in order to piece together what they need. For example, Zeek’s HTTP scripts correlate requests and responses via the connection record. The added state is often declared as &optional, so you should use ?$ to make sure the record contains that field before accessing it. Here, we use ?$ to ensure the connection has HTTP state in the http_request event:

basics/types_connection_http.zeek

event http_request(c: connection, method: string, original_URI: string,
    unescaped_URI: string, version: string)
    {
    if ( c?$http && c$http?$uri )
        print fmt("Found HTTP URI %s", c$http$uri);
    }

Make sure you don’t access an optional field without checking if it exists with ?$ first, otherwise your script will encounter an expression error.
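The same ?$ guard works for optional fields in your own records. A minimal sketch with a hypothetical Login record and describe helper:

```zeek
# Hypothetical record with a field that may never be filled in.
type Login: record {
    user: string;
    client_version: string &optional;
};

# Hypothetical helper that reads the optional field only after checking it.
function describe(l: Login): string
    {
    if ( l?$client_version )
        return fmt("%s (client %s)", l$user, l$client_version);

    return fmt("%s (client unknown)", l$user);
    }

event zeek_init()
    {
    print describe(Login($user="alice"));                        # alice (client unknown)
    print describe(Login($user="bob", $client_version="1.2"));   # bob (client 1.2)
    }
```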

Sometimes you need to bundle your own data. This example defines an Asset record that groups IP addresses with some useful data:

basics/types_record.zeek

# This defines an asset with various fields, accessible with '$'
type Asset: record {
    ip: addr;
    owner: string;
    last_seen: time;
    is_public: bool;
};

event zeek_init()
    {
    # Create a new record instance
    local my_asset: Asset = Asset(
        $ip=192.168.1.50,
        $owner="Evan",
        $last_seen=network_time(),
        $is_public=T,
    );

    # Access its is_public field with '$'
    if ( my_asset$is_public )
        {
        print fmt("%s's asset is public at %s", my_asset$owner, my_asset$ip);
        }

    # You can also change fields
    my_asset$is_public = F;
    if ( ! my_asset$is_public )
        {
        print fmt("%s's asset is not public anymore", my_asset$owner);
        }
    }

For more information, see the connection record, or record for records generally.

Standard Types

Zeek provides many of the standard types expected in a programming language:

basics/types_standard.zeek

event zeek_init()
    {
    # int is a signed number
    local x: int = +5;
    local y: int = -2;
    print fmt("result x - y: %d", x - y); # prints "result x - y: 7"

    # count is an unsigned number
    local a: count = 10;
    local b: count = 15;
    a += b; # Add b to a, store the result in a
    print fmt("a is: %d", a); # prints "a is: 25"

    # bool can be T (for true) or F (for false)
    local my_true: bool = T;
    local my_false: bool = F;
    print fmt("true and false? %s", my_true
        && my_false); # prints "true and false? F"
    print fmt("true or false? %s", my_true
        || my_false); # prints "true or false? T"

    # string is just some text enclosed in quotes
    local bad_word: string = "bad";
    local phrase: string = "this is bad.";
    print fmt("bad? %s", bad_word in phrase); # prints "bad? T"

    # pattern is a regular expression
    local good_words = /good|great|amazing/; # any of these words will match
    print fmt("good? %s", good_words in phrase); # prints "good? F"
    phrase = "this is good!";
    print fmt("good this time? %s", good_words in phrase); # prints "good this time? T"
    }

You can use these for many of the same tasks you would use a general-purpose language for.

For more information, see int, count, bool, string, and pattern.

You can read more about how these types can be used to change the program’s control flow with for and if.
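Patterns have one subtlety worth knowing: 'in' matches anywhere in a string, while '==' requires the whole string to match. A minimal sketch with a hypothetical classify_host helper (and the to_lower built-in function):

```zeek
# Hypothetical helper contrasting whole-string and substring pattern matches.
function classify_host(host: string): string
    {
    local h = to_lower(host);

    # 'pattern == string' succeeds only if the pattern matches the whole string.
    if ( /[a-z]+\.example\.com/ == h )
        return "internal";

    # 'pattern in string' succeeds if the pattern matches anywhere in it.
    if ( /example/ in h )
        return "lookalike";

    return "other";
    }

event zeek_init()
    {
    print classify_host("WWW.example.com");       # internal
    print classify_host("example.com.evil.net");  # lookalike
    print classify_host("zeek.org");              # other
    }
```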

Visibility and Scope

Local and Global

So far, we have kept state within events with the local keyword. A variable declared as local cannot be used outside of its scope. To store state between events, use globals instead. The following example counts how many times the new_connection event gets triggered and prints the result at the end, in the zeek_done event:

basics/scope_global.zeek

global num_connections: count = 0;

event new_connection(c: connection)
    {
    num_connections += 1;
    }

event zeek_done()
    {
    print fmt("Found %d connections", num_connections);
    }

You may also have noticed that the loop in the vector type section creates a server_ip variable without using local:

basics/types_vector.zeek

    # Loop with a 'for' loop. Vectors provide both the index and the value.
    # We use the variable name '_' to indicate we don't care about the index.
    for ( _, server_ip in mail_server_ips )
        {
        print fmt("%s is a mail server IP", server_ip);
        }

The server_ip variable is actually a local variable whose scope is the entire enclosing event handler, not just the for loop; it is simply declared without the keyword. You cannot have two local variables with the same name in the same function scope, so you can’t later declare a new local variable named server_ip.

For more information, see local and global.

Exporting

You may expose constants, types, options, and more to other scripts by putting them in export blocks. The following example defines a set of allowed IP addresses. If a new connection involves an address outside of that set, we print a warning:

basics/scope_export.zeek

module AllowList;

export {
    global allow_list: set[addr] = set(192.168.1.8);
}

event new_connection(c: connection)
    {
    if ( c$id$orig_h !in allow_list )
        print fmt("Address %s is not allowed!", c$id$orig_h);
    if ( c$id$resp_h !in allow_list )
        print fmt("Address %s is not allowed!", c$id$resp_h);
    }

As-is, running this on the quickstart pcap says some addresses were not allowed:

Address 192.0.78.212 is not allowed!
Address 192.0.78.150 is not allowed!

If we can’t change the original script, we can create a new script and add the allowed IP addresses!

basics/scope_use_export.zeek

add AllowList::allow_list[192.0.78.212];
add AllowList::allow_list[192.0.78.150];

With this change, nothing will print—all addresses were allowed.
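Exports are not limited to variables; a module can expose functions the same way. A minimal sketch with a hypothetical Inventory module and is_known function, called from the global namespace with its module prefix:

```zeek
module Inventory;

export {
    # Declared in the export block, so other modules can call Inventory::is_known.
    global is_known: function(a: addr): bool;
}

# Declared outside the export block: an implementation detail of this module.
global known_hosts: set[addr] = set(192.168.1.8, 192.168.1.9);

function is_known(a: addr): bool
    {
    return a in known_hosts;
    }

# Switch back to the global namespace for the code that uses the module.
module GLOBAL;

event zeek_init()
    {
    print Inventory::is_known(192.168.1.8);  # T
    print Inventory::is_known(10.0.0.1);     # F
    }
```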

Redefinitions

We can change more than just the contents of variables. The redef keyword lets you redefine a constant or type when Zeek initializes. Such changes are set in stone thereafter, and you cannot redefine anything later on at runtime. First, we will use redef to demonstrate one of the most powerful features in Zeek: redefining the connection record. Later, we will see how constructs can declare that they may be redefined.

In this example, we want to flag specific connections in the logs so that we can find them easily later. Let’s redefine the record to include a denied field:

basics/scope_redef_connection.zeek

module DenyList;

export {
    # Make a deny list to show functionality
    const deny_list: set[addr] = set(192.168.1.8);
}

redef record connection += {
    # Add a boolean field.
    # 
    # &default sets the default value to F. Any ``redef``ed record fields
    #          must have this or ``&optional``
    denied: bool &default=F;
};

event new_connection(c: connection) {
    # The denied flag gets set if one of the IPs is in the deny list
    c$denied = c$id$orig_h in deny_list || c$id$resp_h in deny_list;
    print c$denied;
}

As-is, this script just prints the flag. But since Zeek raises new_connection at the beginning of the connection, event handlers that run later for the same connection can check this denied flag and use it in their analysis. The state sticks around.
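To see the state stick around, a later handler can read a flag that new_connection set. This sketch is self-contained, so it uses a hypothetical FlagDemo module, watch_list, and watched field rather than the denied field above; connection_state_remove is the event Zeek raises when it is about to remove a connection’s state:

```zeek
module FlagDemo;

export {
    # Hypothetical watch list for this sketch.
    const watch_list: set[addr] = set(192.168.1.8);
}

redef record connection += {
    # Hypothetical field, set once and read later.
    watched: bool &default=F;
};

# Hypothetical helper deciding whether either endpoint is on the watch list.
function involves_watched(orig: addr, resp: addr): bool
    {
    return orig in watch_list || resp in watch_list;
    }

event new_connection(c: connection)
    {
    # Set the flag early in the connection's life...
    c$watched = involves_watched(c$id$orig_h, c$id$resp_h);
    }

event connection_state_remove(c: connection)
    {
    # ...and read it much later, when Zeek is about to forget the connection.
    if ( c$watched )
        print fmt("Watched connection %s has ended", c$uid);
    }
```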

We can also modify the script slightly to log the flag in conn.log:

basics/scope_redef_connection_log.zeek

module DenyList;

export {
    # Make a deny list to show functionality
    const deny_list: set[addr] = set(192.168.1.8);
}

redef record Conn::Info += {
    # Add a boolean field.
    # 
    # &log means this field will be logged (in conn.log here)
    # 
    # &default sets the default value to F. Any ``redef``ed record fields
    #          must have this or ``&optional``
    denied: bool &log &default=F;
};

event new_connection(c: connection) {
    # The denied flag gets set if one of the IPs are in the deny list
    c$conn$denied = c$id$orig_h in deny_list || c$id$resp_h in deny_list;
}

There are three changes here, in order:

The redef redefines Conn::Info instead of connection.
The denied field is marked with the &log attribute.
We modify c$conn (which is a Conn::Info instance) instead of c. c is the container for the whole connection state. Inside it, c$conn is the specific record that gets written to conn.log.

Each Zeek log’s layout is defined by a record type, and by convention such records are named Info. The record type underpinning Zeek’s conn.log is called Conn::Info. So, if you want to expand what goes in conn.log, you add fields to Conn::Info. (Zeek’s logging framework handles the rest, but we won’t go into it just yet.)

Note

The &log attribute tells Zeek that when this record gets logged, it should write this field to that log. Fields must opt in to being logged. Attributes in Zeek are a common way to add functionality to various language elements. You may control whether a field is optional, add an expiration timeout, and much more. For more information, see the attributes section.

When we run this on the quickstart pcap, we can see that conn.log now has a denied field:

# zeek basics/scope_redef_connection_log.zeek -Cr traces/quickstart.pcap
# cat conn.log | zeek-cut -m denied
denied
T
T

Using `&redef`

You cannot redefine just anything in Zeek, only things that declare themselves to be redefinable. The &redef attribute makes this possible. If you’re writing a library and want to allow users to customize parts, you may include &redef to allow extra fields in the record, to log more fields, or just to make a runtime constant configurable at startup.

basics/scope_redef_attr.zeek

module RedefAttr;

export {
    # Here we have an indicator with just an attacker IP. Users of the library
    # can then add their own fields.
    type Indicator: record {
        attacker_ip: addr;
    } &redef;

    # You can use 'option' in order to provide config "knobs" for users.
    # Scripts cannot assign to an option directly, but users can redef it
    # at startup (and the configuration framework can change it at runtime).
    option hostname: string = "host-1";
}

# Users of the library can add fields via redef
redef record RedefAttr::Indicator += {
    ticket_num: count &default=0;
};

# They can also change the option value
redef RedefAttr::hostname = "new-host";

event zeek_init() {
    # prints "Hostname: new-host"
    print fmt("Hostname: %s", RedefAttr::hostname);

    # Create an instance of Indicator, with our new field
    local my_indicator: RedefAttr::Indicator = RedefAttr::Indicator(
        $attacker_ip = 192.168.1.1,
        $ticket_num = 42, # This field was added!
    );

    # prints "Found indicator: [attacker_ip=192.168.1.1, ticket_num=42]"
    print fmt("Found indicator: %s", my_indicator);
    
}

This example is a bit contrived, but any users who load the script with @load can then customize these variables.

For more information, see redef and &redef. Also look at option and const for some more ways to customize libraries via redef.
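A common library pattern combines const with &redef so that site scripts can extend a default set. A minimal sketch with a hypothetical Tuning module, ignored_ports set, and should_ignore helper:

```zeek
module Tuning;

export {
    # A library default that site scripts may extend because of &redef.
    const ignored_ports: set[port] = { 53/udp } &redef;
}

# A site script (loaded later) can add entries before Zeek starts running.
redef Tuning::ignored_ports += { 123/udp, 5353/udp };

# Hypothetical helper consulting the (possibly redef'd) set.
function should_ignore(p: port): bool
    {
    return p in ignored_ports;
    }

event zeek_init()
    {
    print should_ignore(123/udp);  # T, thanks to the redef above
    print should_ignore(80/tcp);   # F
    }
```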

Functions

Functions are exactly what you would expect from other programming languages: you call them to immediately execute some statements in order. In this example, imagine you need to check whether a certain connection is internal. A function keeps the necessary logic in its own section:

basics/functions.zeek

# Functions have similar definitions to events, but use the 'function'
# keyword. You may also specify a return type after the function parameters
# with a colon.
#
# Syntax: function name(arg1: type1, arg2: type2): return_type
function is_internal(cid: conn_id): bool
    {
    # Have a dummy local subnet
    local internal_net: subnet = 10.0.0.0/8;

    # These two if statements could be combined, for demonstration they
    # are separate.
    if ( cid$orig_h in internal_net )
        return T;

    if ( cid$resp_h in internal_net )
        return T;

    # Neither case matched, so return false.
    return F;
    }

event new_connection(c: connection)
    {
    # Pass c$id, not the connection, so that it is a conn_id
    if ( ! is_internal(c$id) )
        print fmt("Connection between %s and %s is not internal!",
            c$id$orig_h, c$id$resp_h);
    }

You may also use a function to modify container values. In this example, a separate function modifies the host record:

basics/functions_pass_by_reference.zeek

type ServerHost: record {
    ip: addr;
    scanned_count: count;
    last_seen: time;
};

# Updates the scan count, as well as updating when it was last seen.
function update_scan_count(host: ServerHost)
    {
    host$scanned_count += 1;
    host$last_seen = network_time();
    }

event zeek_init()
    {
    local host: ServerHost = ServerHost(
        $ip = 192.168.1.10,
        $scanned_count = 0,
        $last_seen = network_time(),
    );

    print fmt("Before: Count=%d", host$scanned_count); # prints "Before: Count=0"

    # Pass the record into update_scan_count.
    # Zeek passes records by *reference*, so the function modifies the original.
    update_scan_count(host);
    update_scan_count(host);

    # Notice that the count was changed!
    print fmt("After: Count=%d", host$scanned_count); # prints "After: Count=2"
    }

The most important part here is that only certain types in Zeek are “pass by reference”. You pass aggregate types like a table or record into a function by reference, so the function may modify their values. But if you pass a count, the function modifies a copy, not the original. Try changing update_scan_count to take host$scanned_count (a count) instead of the whole record, and you will see that the original value doesn’t get updated.
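The difference is easy to check directly. A minimal sketch with a hypothetical Counter record and two hypothetical helpers; it also uses copy(), which makes a deep copy, for when a function must not touch the caller’s record:

```zeek
type Counter: record {
    value: count;
};

# Receives a copy of the count: the caller's variable is unaffected.
function bump_count(n: count)
    {
    n += 1;
    }

# Receives a reference to the record: the caller sees the change.
function bump_record(c: Counter)
    {
    c$value += 1;
    }

event zeek_init()
    {
    local n = 0;
    local c = Counter($value=0);

    bump_count(n);
    bump_record(c);
    print fmt("count: %d, record: %d", n, c$value);  # count: 0, record: 1

    # copy() hands the function a separate record, so 'c' stays at 1.
    bump_record(copy(c));
    print fmt("record after copy: %d", c$value);     # record after copy: 1
    }
```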

For more information, see function.

Async Functions

Note

Asynchronous functions are a relatively advanced concept, but you need them to understand the detection script from the beginning of this section.

Some functions take a while to complete, and Zeek should not wait for them before continuing with its execution. The when statement lets a script wait for such a result and run a block of code once it is ready. In this example, we use when to look up the DNS TXT record for www.zeek.org:

basics/functions_async.zeek

event zeek_init()
    {
    # Prints will probably happen in numerical order
    print "1. Requesting DNS...";

    # 'when' handles waiting for lookup_hostname_txt to finish.
    # The code inside curly braces executes when it is complete.
    when ( local result = lookup_hostname_txt("www.zeek.org") )
        {
        print fmt("3. Found DNS result: %s", result);
        }
    # You could optionally add a "timeout" here, too.

    # This code will execute immediately, it will not wait for the result
    # from lookup_hostname_txt
    print "2. Request sent, moving on";
    }

You don’t have to understand the specifics here. If a function is “asynchronous”, you must use when in order to wait for its result without blocking Zeek’s execution. If you remove the when from the previous example, Zeek reports an error:

error in ./functions_async.zeek, line 8: lookup_hostname_txt() can only be called inside a when-condition (lookup_hostname_txt(www.zeek.org))

For more information, see when.
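The comment in the example above mentions an optional timeout. Here is a sketch of one, wrapped in a hypothetical lookup_txt_with_timeout function. Because the when and timeout blocks use the function’s parameters, those appear in a capture list in square brackets, a construct explained in the walkthrough of detect-MHR below. If no answer arrives within max_wait, the timeout block runs instead of the body:

```zeek
# Hypothetical helper: look up a TXT record, but give up after 'max_wait'.
function lookup_txt_with_timeout(host: string, max_wait: interval)
    {
    # 'host' and 'max_wait' are used inside the blocks, so they are captured.
    when [host, max_wait] ( local result = lookup_hostname_txt(host) )
        {
        print fmt("TXT record for %s: %s", host, result);
        }
    timeout max_wait
        {
        print fmt("No TXT answer for %s within %s", host, max_wait);
        }
    }

event zeek_init()
    {
    lookup_txt_with_timeout("www.zeek.org", 5sec);
    print "Lookup started, not waiting for it";
    }
```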

Understanding a Real Script

We now have the tools to understand the detect-MHR script in its entirety. At the beginning, we only showed the file_hash event handler, which delegated most of its logic to the do_mhr_lookup function. Here is that function in its entirety; afterwards, we will go through it part by part:

basics/detect-MHR.zeek

function do_mhr_lookup(hash: string, fi: Notice::FileInfo)
    {
    local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

    when [hash, fi, hash_domain] ( local MHR_result = lookup_hostname_txt(hash_domain) )
        {
        # Data is returned as "<dateFirstDetected> <detectionRate>"
        local MHR_answer = split_string1(MHR_result, / /);

        if ( |MHR_answer| == 2 )
            {
            local mhr_detect_rate = to_count(MHR_answer[1]);

            if ( mhr_detect_rate >= notice_threshold )
                {
                local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
                local virustotal_url = fmt(match_sub_url, hash);
                # We don't have the full fa_file record here in order to
                # avoid the "when" statement cloning it (expensive!).
                local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                Notice::populate_file_info2(fi, n);
                NOTICE(n);
                }
            }
        }
    }

First, the function itself takes the hash (provided from file_hash) and a Notice::FileInfo:

basics/detect-MHR.zeek

function do_mhr_lookup(hash: string, fi: Notice::FileInfo)

Then, we declare a local variable that holds the domain name we look up:

basics/detect-MHR.zeek

    local hash_domain = fmt("%s.malware.hash.cymru.com", hash);

The when statement uses this domain to look up its DNS TXT record asynchronously:

basics/detect-MHR.zeek

    when [hash, fi, hash_domain] ( local MHR_result = lookup_hostname_txt(hash_domain) )

The when statement has an extra section here, within square brackets []. That specifies that the block can use hash, fi, and hash_domain from the outer do_mhr_lookup function. Without it, we could not later use hash in the when block. Because the code inside {} runs later (potentially after the function has finished), Zeek needs to copy these variables so that they are alive within the when block.

Next, we use the result within the when block in order to check the data:

basics/detect-MHR.zeek

        # Data is returned as "<dateFirstDetected> <detectionRate>"
        local MHR_answer = split_string1(MHR_result, / /);

        if ( |MHR_answer| == 2 )
            {
            local mhr_detect_rate = to_count(MHR_answer[1]);

The data in MHR_answer is just the result from the DNS lookup, split at the first space: split_string1 splits a string only once, so it returns a vector of at most two strings. If the answer has two elements, the split was successful, so we can move on with the logic.

We then convert the string in MHR_answer[1] (the second element of the vector of string) into a count and put it into mhr_detect_rate.
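The parsing steps can be tried in isolation. A minimal sketch with hypothetical mhr_detect_rate and mhr_first_detected helpers and a hypothetical sample answer in the "<dateFirstDetected> <detectionRate>" format:

```zeek
# Hypothetical helpers that pull apart an answer shaped like the MHR reply:
# "<dateFirstDetected> <detectionRate>".
function mhr_detect_rate(answer: string): count
    {
    local parts = split_string1(answer, / /);

    # Anything that did not split into two pieces gets a rate of 0 here.
    if ( |parts| != 2 )
        return 0;

    return to_count(parts[1]);
    }

function mhr_first_detected(answer: string): time
    {
    local parts = split_string1(answer, / /);
    return double_to_time(to_double(parts[0]));
    }

event zeek_init()
    {
    local sample = "1484771683 95";  # hypothetical sample answer
    print mhr_detect_rate(sample);   # 95
    print strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected(sample));
    }
```

Unlike the real script, which simply skips answers that do not split into two parts, mhr_detect_rate returns 0 for them; that sentinel is a choice made for this sketch.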

Now, we check if that detection rate is high enough to trigger a notice:

basics/detect-MHR.zeek

            if ( mhr_detect_rate >= notice_threshold )

This uses notice_threshold, which the full script declares as an option in its export block (not shown in this excerpt), so users may configure its value.

If the rate meets or exceeds the threshold, the script triggers a notice. This uses Zeek’s notice framework, but most of the concepts should be pretty familiar. In this instance, we are mostly just manipulating strings to make the notice human-readable:

basics/detect-MHR.zeek

                local mhr_first_detected = double_to_time(to_double(MHR_answer[0]));
                local readable_first_detected = strftime("%Y-%m-%d %H:%M:%S", mhr_first_detected);
                local message = fmt("Malware Hash Registry Detection rate: %d%%  Last seen: %s", mhr_detect_rate, readable_first_detected);
                local virustotal_url = fmt(match_sub_url, hash);
                # We don't have the full fa_file record here in order to
                # avoid the "when" statement cloning it (expensive!).
                local n: Notice::Info = Notice::Info($note=Match, $msg=message, $sub=virustotal_url);
                Notice::populate_file_info2(fi, n);
                NOTICE(n);

You can read more about Zeek’s notice framework in the Notice Framework section. Note that NOTICE is just a function.

With that, we went from zero to understanding a full Zeek script. In the next section, we will build up a script from scratch, using what we learned here.
