Getting Started

Spicy’s own Getting Started guide uses the following Spicy code to parse a simple HTTP request line:

my-http.spicy
module MyHTTP;

const Token      = /[^ \t\r\n]+/;
const WhiteSpace = /[ \t]+/;
const NewLine    = /\r?\n/;

type Version = unit {
    :       /HTTP\//;
    number: /[0-9]+\.[0-9]+/;
};

public type RequestLine = unit {
    method:  Token;
    :        WhiteSpace;
    uri:     Token;
    :        WhiteSpace;
    version: Version;
    :        NewLine;

    on %done {
        print self.method, self.uri, self.version.number;
    }
};

While the Spicy documentation goes on to show how to use this to parse corresponding data from the command line, here we will instead leverage the RequestLine parser to build a proof-of-concept protocol analyzer for Zeek. While this all remains simplified here, the following, more in-depth Tutorial demonstrates how to build a complete analyzer for a real protocol.
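
To get a feel for what the RequestLine unit accepts before wiring it into Zeek, the grammar can be approximated with Python regular expressions. This is only a rough sketch: the helper name parse_request_line is hypothetical, and Spicy parses field by field rather than matching one combined pattern, so the two can differ on malformed input.

```python
import re

# Python counterparts of the grammar's three constants.
TOKEN = r"[^ \t\r\n]+"
WHITESPACE = r"[ \t]+"
NEWLINE = r"\r?\n"

# RequestLine: method, uri, then a Version unit ("HTTP/" plus a number).
REQUEST_LINE = re.compile(
    rf"(?P<method>{TOKEN}){WHITESPACE}"
    rf"(?P<uri>{TOKEN}){WHITESPACE}"
    rf"HTTP/(?P<version>[0-9]+\.[0-9]+){NEWLINE}"
)

def parse_request_line(data: str):
    """Return (method, uri, version), or None if the input does not match."""
    m = REQUEST_LINE.match(data)
    if m is None:
        return None
    return m.group("method"), m.group("uri"), m.group("version")

print(parse_request_line("GET /index.html HTTP/1.0\n"))
# → ('GET', '/index.html', '1.0')
```

The three returned values correspond to self.method, self.uri, and self.version.number in the unit's %done hook.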

Preparations

Because Zeek works from network packets, we first need a packet trace with the payload we want to parse. We can’t use a normal HTTP session: our simple parser would handle only the first line of the protocol exchange and then bail out with an error. So for our example we create a custom packet trace with a TCP connection that carries just a single HTTP request line as its payload:

# tcpdump -i lo0 -w request-line.pcap port 12345 &
# nc -l 12345 &
# echo "GET /index.html HTTP/1.0" | nc localhost 12345
# killall tcpdump nc

This gives us the trace file request-line.pcap.
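
If nc is unavailable or behaves differently on your platform, the same single-line TCP exchange can be produced with Python's standard socket module. The following is a minimal sketch, not part of the original workflow; the function name send_request_line is hypothetical, and tcpdump still has to be capturing on the loopback interface while it runs.

```python
import socket
import threading

def send_request_line(port: int = 12345,
                      line: bytes = b"GET /index.html HTTP/1.0\n") -> bytes:
    """Send one request line over a loopback TCP connection.

    Returns the bytes the listening side received.
    """
    received = bytearray()
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    actual_port = server.getsockname()[1]  # differs from `port` only if port == 0

    def accept_once():
        # Plays the role of `nc -l`: accept one connection and read until EOF.
        conn, _ = server.accept()
        with conn:
            while chunk := conn.recv(4096):
                received.extend(chunk)

    listener = threading.Thread(target=accept_once)
    listener.start()
    # Plays the role of `echo ... | nc localhost 12345`.
    with socket.create_connection(("127.0.0.1", actual_port)) as client:
        client.sendall(line)
    listener.join()
    server.close()
    return bytes(received)
```

Calling send_request_line() with its defaults uses port 12345, matching the tcpdump filter above.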

Adding a Protocol Analyzer

Now we can go ahead and add a new protocol analyzer to Zeek. We already have the Spicy grammar to parse our connection’s payload in my-http.spicy. To use it with Zeek, we have two additional things to do: (1) we need to let Zeek know about our new protocol analyzer, including when to use it; and (2) we need to define at least one Zeek event that we want our parser to generate, so that we can then write a Zeek script working with the information that it extracts.

We do both of these by creating an additional control file for Zeek:

my-http.evt
protocol analyzer spicy::MyHTTP over TCP:
    parse originator with MyHTTP::RequestLine,
    port 12345/tcp;

on MyHTTP::RequestLine -> event MyHTTP::request_line($conn, self.method, self.uri, self.version.number);

The first block (lines 1-3) tells Zeek that we have a new protocol analyzer to provide. The analyzer’s Zeek-side name is spicy::MyHTTP, and it’s meant to run on top of TCP connections (line 1). Lines 2-3 then provide Zeek with more specifics: The entry point for originator-side payload is the MyHTTP::RequestLine unit type that our Spicy grammar defines (line 2); and we want Zeek to activate our analyzer for all connections with a responder port of 12345 (which, of course, matches the packet trace we created).

The second block (line 5) tells Zeek that we want to define one event. On the left-hand side of that line we give the unit that is to trigger the event. The right-hand side defines its name and arguments. What we are saying here is that every time a RequestLine unit has been fully parsed, we’d like a MyHTTP::request_line event to go to Zeek. Each event instance will come with four parameters. Three of them are the values of the corresponding unit fields, accessed through normal Spicy expressions (inside an event argument expression, self refers to the unit instance that led to the generation of the current event). The first parameter, $conn, is a “magic” keyword that passes the Zeek-side connection record (of type connection) to the event.
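
The event mapping can be pictured with a small Python model. This is purely conceptual and says nothing about how Zeek or Spicy implement it: the handlers registry, the on_unit_done function, and the dictionary standing in for the connection are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Version:
    number: str

@dataclass
class RequestLine:
    method: str
    uri: str
    version: Version

# Hypothetical registry standing in for Zeek's event engine:
# event name -> list of handler callables.
handlers = {}

def on_unit_done(conn, unit):
    """Model of the EVT line: once the unit is fully parsed, evaluate the
    argument expressions against it and hand them to every handler."""
    args = (conn, unit.method, unit.uri, unit.version.number)
    for handler in handlers.get("MyHTTP::request_line", []):
        handler(*args)

def request_line(c, method, uri, version):
    # Receives the same four parameters the EVT file lists.
    print(f"saw from {c['orig_h']}: {method} {uri} {version}")

handlers["MyHTTP::request_line"] = [request_line]
on_unit_done({"orig_h": "127.0.0.1"},
             RequestLine("GET", "/index.html", Version("1.0")))
# → saw from 127.0.0.1: GET /index.html 1.0
```

The point of the model is the order of the arguments: the connection comes first, followed by the three field values in the order the EVT file names them.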

We now have everything in place that we need for our new protocol analyzer, except for a Zeek script that actually does something with the information we are parsing. Let’s use this:

my-http.zeek
event MyHTTP::request_line(c: connection, method: string, uri: string, version: string)
	{
	print fmt("Zeek saw from %s: %s %s %s", c$id$orig_h, method, uri, version);
	}

This is a Zeek event handler for the event that we just defined. Its signature has the expected four parameters, matching the types of the parameter expressions that the *.evt file specifies. The handler’s body then just prints out what it gets.

Finally, we can put the pieces together by compiling the Spicy grammar and the EVT file into an HLTO file with spicyz, and by pointing Zeek at the produced file and the analyzer-specific Zeek script:

# spicyz my-http.spicy my-http.evt -o my-http.hlto
# zeek -Cr request-line.pcap my-http.hlto my-http.zeek
Zeek saw from 127.0.0.1: GET /index.html 1.0

When Zeek starts up here, the Spicy integration registers a protocol analyzer for the entry point of our Spicy grammar, as specified in the EVT file. Zeek then begins processing the packet trace as usual, now activating our new analyzer whenever it sees a TCP connection on port 12345. Accordingly, the MyHTTP::request_line event gets generated once the parser gets to process the session’s payload. The Zeek event handler then executes and prints the output we would expect.
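
For repeated runs, the two commands can be wrapped in a short Python script. This is a sketch under a few assumptions: spicyz and zeek are on the PATH, all files sit in the current directory under the names used in this section, and the helper names build_commands and run are hypothetical. The optional Spicy::enable_print=T argument makes Spicy-side print output visible.

```python
import subprocess

def build_commands(enable_print: bool = False):
    """Return the spicyz and zeek invocations from this section as argument lists."""
    compile_cmd = ["spicyz", "my-http.spicy", "my-http.evt", "-o", "my-http.hlto"]
    zeek_cmd = ["zeek", "-Cr", "request-line.pcap", "my-http.hlto", "my-http.zeek"]
    if enable_print:
        # Ask Zeek to show output from Spicy-side print statements as well.
        zeek_cmd.append("Spicy::enable_print=T")
    return compile_cmd, zeek_cmd

def run(enable_print: bool = False) -> str:
    """Compile the analyzer, run Zeek on the trace, and return Zeek's stdout."""
    compile_cmd, zeek_cmd = build_commands(enable_print)
    subprocess.run(compile_cmd, check=True)
    result = subprocess.run(zeek_cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

With check=True, a failing spicyz or zeek invocation raises subprocess.CalledProcessError instead of passing silently.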

Note

By default, Zeek suppresses any output from Spicy-side print statements. You can add Spicy::enable_print=T to the command line to see it. In the example above, you would then get an additional line of output: GET, /index.html, 1.0.