5.5. Guidelines
This section collects guidelines for writing Spicy parsers in the form of best practices, useful patterns, and pitfalls to avoid. The content is compiled from growing experience with real-world parsers by the broader Spicy community. If you have anything to add here, please let us know.
Note
For now this section focuses on Spicy performance. We plan to extend it further with common patterns and idioms for structuring parsers. Contributions welcome.
5.5.1. Spicy Performance
This section provides advice on how to optimize CPU and memory usage if you find your Spicy parser to consume more resources than you would like.
As a general note upfront, keep in mind that it can be tricky to estimate the performance impact of some particular piece of Spicy code. Some seemingly simple Spicy constructs can turn into substantial amounts of C++ code that may be expensive both at runtime and to compile. On the other hand, some of Spicy’s most powerful features, like look-ahead parsing or failure recovery, add only relatively little additional complexity to the code. On top of that, overall performance often depends on how features are combined, or layered across several units.
Hence, it’s useful to remember the following four rules on Spicy performance:
1. Do as much work as needed in your Spicy analyzer, but not more. For example, what you can do in Zeek, do there.
2. If you’re in doubt about the performance impact of some Spicy code, do a benchmark.
3. If you think you know the performance impact of some Spicy code, do a benchmark!
4. In practice, it might all not matter that much anyways!
To explain Rule 4: When you are writing a parser for a network protocol in Spicy, the resulting performance impact is a direct result of the amount of traffic that parser will be processing. For example, in a typical Zeek setup, where your network carries a mix of various protocols, your parser will see only a subset of the overall traffic. And if you aren’t going for the handful of most common high-volume protocols (e.g., HTTP, DNS, TLS), then your parser will very likely end up processing only a tiny fraction of the overall traffic. At that point, its runtime performance is going to be dwarfed by everything else Zeek is doing. Hence, take the following with a grain of salt. Usually it’s best to get your parser working first, then benchmark to see if you need to improve performance, and finally find the primary bottlenecks if needed.
5.5.1.1. Runtime Performance
Among other things, runtime performance is affected by:
- additional C++ code that is generated by the Spicy compiler
- use of expensive Spicy constructs
- poorly managed objects hogging memory (both on stack and heap)
Below are some strategies to make runtime performance as efficient as possible.
Avoid declaring public units
The Spicy compiler generates additional code for public units, thereby increasing both compile and execution times. Defining units as public is only required for top-level units where parsing starts (i.e., with Zeek: the units that you enter into your EVT file). The attribute can be omitted otherwise.
Turn unreferenced vectors into anonymous fields
Implement vectors that don’t need to be referenced from your code as anonymous fields (i.e., do not provide a name for the field). Normal, named fields store the whole vector, accumulating all parsed elements over the lifetime of the defining unit, whereas anonymous vector fields forgo that storage, since nobody could access the elements anyways.
The most common pitfall here is top-level units that parse a sequence of PDUs:
public type MyPDUs = unit {
msgs: Message[]; # DANGEROUS: accumulates all messages until end of session
: Message[]; # RECOMMENDED: anonymous field, no accumulation
};
Skip unused fields
Don’t parse data that is not needed. Parsing data into a field will cause a copy of that data into a dedicated memory location. Instead, use the skip keyword to discard data that isn’t of interest, which leads the compiler to generate optimized code for many field types, including in particular bytes, literals, and generally fields of fixed size. Examples:
public type Message = unit {
unused1: uint32; # LESS EFFICIENT: parse and store
unused2: skip uint32; # RECOMMENDED: skip over 4 bytes
unused3: bytes &eod; # LESS EFFICIENT: extract and store remaining data
unused4: skip bytes &eod; # RECOMMENDED: skip over remaining data
};
Avoid Spicy strings and string manipulation
Use Spicy strings sparingly in your analyzer, and stick to bytes instead where you can. Typically, you would convert bytes to strings as late as possible, just when you need them, for example when passing them to functions expecting a string, or when preparing them for presentation to the user. When passing data to Zeek events in your EVT files, bytes will be automatically converted to Zeek strings, retaining their original byte-level representation. That means that you don’t need to convert them into a string yourself at all, unless you want to take character encodings into account for the conversion through decode().
As a corollary, avoid Spicy string manipulation. Always manipulate/concatenate bytes and convert only the final result to a Spicy string, probably in a %done hook or an EVT event. In particular, format strings come with a cost to compute. In other words, avoid use of %s to generate strings from bytes.
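For illustration, a minimal sketch (unit and field names are made up): the URI stays a bytes value throughout parsing and is decoded to a string only at the end, where it is actually needed.

type Request = unit {
    uri: bytes &until=b" ";

    on %done {
        # Decode to a string only here, right where it is needed.
        print self.uri.decode();
    }
};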
Don’t use temporary variables of expensive types just for readability
Don’t use temporary variables just to improve readability. In particular, string and bytes values (which are implemented as C++ strings under the hood and can be expensive to use) need to be allocated and destroyed. This may introduce noticeable overhead, as it cannot be guaranteed that the C++ compiler will be able to optimize away the temporary in the code generated by Spicy. To improve readability, comments are the tool of choice. If in doubt about the impact of a temporary, benchmark.
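As a sketch (the field names and the process() helper are hypothetical), prefer passing the expression directly and carrying the explanation in a comment:

# Hypothetical helper standing in for whatever work consumes the data.
function process(data: bytes) {}

type Message = unit {
    hdr: bytes &size=4;
    body: bytes &eod;

    on %done {
        # LESS EFFICIENT: temporary bytes value allocated just for readability
        local payload = self.hdr + self.body;
        process(payload);

        # RECOMMENDED: pass the expression directly; header plus body form the full payload
        process(self.hdr + self.body);
    }
};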
Remove unnecessary hooks
Multiple hook handlers with different priorities can be defined in various places, like inside a unit or, with Zeek, in EVT files. However, hooks (either unit or field hooks) should be avoided when not needed. Using hooks comes with a performance cost because of additional code generated by the compiler, which executes during parsing. While %init hooks are often used for initializing unit variables, we can often eliminate them by providing default values in the variable definition:
public type Message = unit {
on %init {
self.A1 = 23; # LESS EFFICIENT: explicit initialization through hook
}
var A1: uint32;
var A2: uint32 = 23; # RECOMMENDED: implicit initialization through default value
};
Avoid recursion
The Spicy compiler allows declaring recursive units, with runtime conditions dictating when the recursion terminates. However, recursion introduces additional overhead compared to linear/iterative code of similar functionality: it extends the lifetimes of units, their data, and their associated hooks, and the internal machinery around the recursive calls leaves less potential for compiler optimization.
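As a sketch of the difference (types and fields are made up), a back-to-back sequence of elements is usually better parsed as a vector than through a unit that recursively parses its successor:

type Element = unit {
    data: bytes &size=4;
    more: uint8;
};

# LESS EFFICIENT: each element recursively parses the next one
type ElementChain = unit {
    data: bytes &size=4;
    more: uint8;
    next: ElementChain if (self.more == 1);
};

# RECOMMENDED: parse the same sequence iteratively as an anonymous vector
public type Elements = unit {
    : Element[] &until-including=($$.more != 1);
};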
Inline small nested units
The Spicy compiler is not smart enough yet to inline nested units. Since declaration of units incurs additional cost to maintain their associated state and hooks, it is advisable to manually inline small units where performance is critical.
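For example (hypothetical message layout), instead of factoring a two-field header into its own tiny unit, spell those fields out directly in the parent where performance is critical:

# LESS EFFICIENT: the tiny helper unit adds per-instance state and hook machinery
type Header = unit {
    typ: uint8;
    len: uint16;
};

type MessageNested = unit {
    hdr: Header;
    payload: bytes &size=self.hdr.len;
};

# RECOMMENDED: inline the header fields into the parent unit
type MessageInlined = unit {
    typ: uint8;
    len: uint16;
    payload: bytes &size=self.len;
};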
Minimize event generation
With Zeek, minimize the number of generated events. Each instance of an event comes with overhead as the parsed data needs to be converted from Spicy’s data model into Zeek’s data model, which can involve heap allocations even for simple types. It’s the number of event instances generated at runtime here that matters, not the number of event types defined in the EVT files (although the latter may increase compilation times).
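For example, in an EVT file, prefer raising one event per message over one per repeated element (analyzer, unit, and event names below are made up):

# LESS EFFICIENT: one Zeek event for every parsed element
on MyProto::Element -> event myproto::element($conn, self.data);

# RECOMMENDED: one Zeek event per message, passing just what the scripts need
on MyProto::Message -> event myproto::message($conn, self.num_elements);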
Aggregate data to be forwarded into other analyzers
When passing chunks of data back into Zeek through zeek::protocol_data_in, it can be more efficient to aggregate multiple chunks inside a temporary variable first, instead of forwarding each chunk individually. This is because each chunk forwarded to Zeek will go through its analyzer pipeline individually, which incurs additional overhead.
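A sketch of the pattern (unit and field names are made up; see the Zeek integration documentation for the exact zeek::protocol_* API):

import zeek;

type Chunk = unit {
    len: uint16;
    data: bytes &size=self.len;
};

public type Tunnel = unit {
    var payload: bytes;

    on %init {
        # Assumes the inner protocol is handed to a child analyzer.
        zeek::protocol_begin("HTTP");
    }

    : Chunk[] foreach {
        # Aggregate the chunks instead of forwarding each one individually ...
        self.payload += $$.data;
    }

    on %done {
        # ... and hand the combined data to the child analyzer only once.
        zeek::protocol_data_in(True, self.payload);
    }
};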
Move fixed local values into global constants
Inside functions and hooks, local variables are created and destroyed every time the corresponding code executes. For non-trivial types, that can lead to noticeable overhead. If the locals aren’t modified, consider moving them to global constants instead. (For some particularly expensive, non-mutable types, Spicy performs this optimization internally already; for example, for regular expressions.)
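For example (hypothetical values), building a set once as a global constant avoids reconstructing it on every call:

# LESS EFFICIENT: the set is rebuilt every time the function executes
function is_known_type(t: uint8) : bool {
    local known = set<uint8>(1, 2, 3);
    return t in known;
}

# RECOMMENDED: construct the value once as a global constant
const KnownTypes = set<uint8>(1, 2, 3);

function is_known_type2(t: uint8) : bool {
    return t in KnownTypes;
}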
Avoid small-sized bytes fields
Avoid using the bytes type for fields that could be handled using integer types. As bytes will always allocate a C++ string under the hood, using integer types can improve performance.
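For example:

public type Message = unit {
    tag1: bytes &size=2; # LESS EFFICIENT: allocates a bytes object for two bytes
    tag2: uint16;        # RECOMMENDED: parse the same two bytes as an integer
};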
Consider outsourcing into C++
Consider outsourcing complex and performance-critical calculations required for your parsing into custom C++ code. In particular, decoding bytes into special string representations or peculiar time conversions might be significantly faster when implemented in C++ directly. See Custom Extensions for more on how to do that.
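The usual approach is to declare the function in Spicy with &cxxname and provide its implementation in an accompanying C++ file; the function name below is made up:

# Implemented in C++; Spicy code only sees this declaration.
public function decode_vendor_string(data: bytes) : string &cxxname="myparser::decode_vendor_string";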
State-management in analyzers
Try to avoid using global variables, such as maps, to store analyzer state, as that can cause significant memory bloat over longer periods if not managed correctly. Instead, prefer to retain analyzer state through a %context inside your top-level units, and then propagate that context down through unit arguments for other code to populate it. When used with Zeek, Spicy ties the context state to individual connections, which get torn down automatically when the connection state is removed, thereby preventing accidental state space explosion. Note, however, that even state maintained inside a %context will need additional manual management if it can grow unbounded for long-running connections (like state tables that continuously accumulate new information with each PDU).
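A minimal sketch of that pattern (type and field names are made up):

# Per-connection state lives in a context instead of a global map.
type Ctx = struct {
    num_requests: uint64;
};

public type Requests = unit {
    %context = Ctx;

    : Request[] foreach {
        self.context().num_requests += 1;
    }
};

type Request = unit {
    line: bytes &until=b"\n";
};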
5.5.1.2. Compilation Performance
Depending on the complexity of the Spicy code, it may take a bit (and sometimes quite a while) to compile your parsers. In the following, we collect some recommendations to speed up the compile process.
Note
When processing Spicy code, generally the bulk of the time tends to be spent on compiling the generated C++ code; often about 80-90%. If you want to see a break-down of where Spicy spends its time, run the Spicy compiler with --report-times. In the output at the end, jit refers to compiling generated C++ code.
Precompile Headers
Make sure to run spicy-precompile-headers to speed up C++ compilation a little.
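That is, simply run it once for the installation you are working with:

$ spicy-precompile-headers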
Faster Debug Builds
During development of new parsers, it helps quite a bit to build non-optimized debug versions by adding --debug to the Spicy compiler’s command line. This emits almost identical code, but then compiles the generated code without -O2 (i.e., not optimized), which avoids some work the C++ compiler would otherwise do. The produced HLTO will perform (much) less well, so it is probably not useful for production.
Danger
Do not run spicyc with --disable-optimizations, as that will actually generate more C++ code to compile.
When building a Spicy parser as a Zeek analyzer with the default package template, one can pass Spicy compilation flags via the SPICYZ_FLAGS CMake variable, e.g., to build a parser in debug mode, configure the parser with:
$ cmake -DSPICYZ_FLAGS="--debug" <OTHER FLAGS>
For building with zkg, you can add this flag to the CMake invocation in zkg.meta’s build_command; this change is for development only and likely should not be published.
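For example, the relevant part of a zkg.meta might then look along these lines (a sketch; adapt it to the build_command your package template actually uses):

build_command = mkdir -p build && cd build && cmake -DSPICYZ_FLAGS="--debug" .. && cmake --build .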
Use a compiler cache to speed up repeated compilations
C++ compilation can become the dominant factor in compilation time for Spicy parsers. If you repeatedly compile the same file (this might even be an unchanged module in your Spicy project) it is worthwhile to cache the C++ compilation results to avoid doing this work again.
To configure a compiler cache, set its invocation in the environment variable HILTI_CXX_COMPILER_LAUNCHER, e.g., to use an installed ccache:
$ export HILTI_CXX_COMPILER_LAUNCHER=ccache
Tweak compilation parallelism
When compiling generated C++ code, Spicy will by default spawn as many parallel compiler processes as there are cores. This often works well enough, but it can cause issues when, e.g., (1) C++ compilation requires a lot of RAM, so concurrent processes compete for it and end up swapping, or (2) multiple parsers are built in parallel as part of a bigger build setup. If this is something you observe, it might make sense to reduce the level of parallelism, e.g.,
# Run at most 4 parallel C++ compilation jobs.
$ export HILTI_JIT_PARALLELISM=4
Especially for case (1) it might make sense to check whether you can switch to a more efficient compiler.
Consider switching to a more efficient compiler
Compilation performance of GCC and Clang can differ by a lot; e.g., GCC can require 2-4GB of RAM to compile C++ files generated by Spicy, while Clang might only require 1-2GB. This can negatively impact performance if RAM becomes a bottleneck and forces process memory into slower swap; see also Tweak compilation parallelism. For this reason it can be worthwhile to switch to Clang to speed up compilation, especially during development.
Spicy utilizes the same compiler for compiling generated C++ files that was used for compiling Spicy itself. Binary packages are most often built with a system compiler, so going down this path requires a custom build of Spicy (or of Zeek if Spicy comes bundled with it). You can query the compiler Spicy would use with spicy-config, e.g.,
$ spicy-config --cxx
/usr/bin/c++
# '/usr/bin/c++' corresponds to gcc-12.2.0-14 on this system.
$ /usr/bin/c++ --version
c++ (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
To build Spicy with Clang instead, configure its build with the following flags:
$ ./configure --with-cxx-compiler=clang++ --with-c-compiler=clang --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>
To configure a Zeek build to use Clang, set the CC and CXX environment variables when making a clean build:
# Environment variables only have an effect for a clean build.
$ rm -rf build
$ CXX=clang++ CC=clang ./configure --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>
After building and installing, you should see a changed C++ compiler with spicy-config --cxx for your custom-built spicy-config, e.g.,
$ spicy-config --cxx
clang++
Danger
While one can switch the compiler at runtime with the HILTI_CXX environment variable, it is not the right tool to switch between GCC and Clang, since the two compilers can produce ABI-incompatible code. This will in the best case lead to linker failures (worst case: parsers might behave incorrectly at runtime).
Reduce number of imports required
By reducing the number of imports (i.e., source files), compilation from scratch becomes faster. There is a tradeoff: multiple files may allow for some incremental compilation if caching is used, and thus may speed up subsequent builds.