5.5. Guidelines
This section collects guidelines for writing Spicy parsers in the form of best practices, useful patterns, and pitfalls to avoid. The content is compiled from growing experience with real-world parsers by the broader Spicy community. If you have anything to add here, please let us know.
Note
For now this section focuses on Spicy performance. We plan to extend it further with common patterns and idioms for structuring parsers. Contributions welcome.
5.5.1. Spicy Performance
This section provides advice on how to optimize CPU and memory usage if you find your Spicy parser to consume more resources than you would like.
As a general note upfront, keep in mind that it can be tricky to estimate the performance impact of some particular piece of Spicy code. Some seemingly simple Spicy constructs can turn into substantial amounts of C++ code that may be expensive both at runtime and to compile. On the other hand, some of Spicy’s most powerful features, like look-ahead parsing or failure recovery, add only relatively little additional complexity to the code. On top of that, overall performance often depends on how features are combined, or layered across several units.
Hence, it’s useful to remember the following four rules on Spicy performance:
1. Do as much work as needed in your Spicy analyzer, but not more. For example, what you can do in Zeek, do there.
2. If you’re in doubt about the performance impact of some Spicy code, do a benchmark.
3. If you think you know the performance impact of some Spicy code, do a benchmark!
4. In practice, it might all not matter that much anyways!
To explain Rule 4: When you are writing a parser for a network protocol in Spicy, the resulting performance impact is a direct result of the amount of traffic that parser will be processing. For example, in a typical Zeek setup, where your network carries a mix of various protocols, your parser will see only a subset of the overall traffic. And if you aren’t going for the handful of most common high-volume protocols (e.g., HTTP, DNS, TLS), then your parser will very likely end up processing only a tiny fraction of the overall traffic. At that point, its runtime performance is going to be dwarfed by everything else Zeek is doing. Hence, take the following with a grain of salt. Usually it’s best to get your parser working first, then benchmark to see if you need to improve performance, and finally find the primary bottlenecks if needed.
5.5.1.1. Runtime Performance
Among other things, runtime performance is affected by:
- additional C++ code that is generated by the Spicy compiler
- use of expensive Spicy constructs
- poorly managed objects hogging memory (both on stack and heap)
Below are some strategies to make runtime performance as efficient as possible.
Avoid declaring public units
The Spicy compiler generates additional code for public units, thereby increasing both compile and execution times. Defining units as public is only required for top-level units where parsing starts (i.e., with Zeek: the units that you enter into your EVT file). The attribute can be omitted otherwise.
Turn unreferenced vectors into anonymous fields
Implement vectors that don’t need to be referenced from your code as anonymous fields (i.e., do not provide a name for the field). Normal, named fields store the whole vector, accumulating all parsed elements over the lifetime of the defining unit, whereas anonymous vector fields forgo that storage, since nobody could access the elements anyways.
The most common pitfall here is top-level units that parse a sequence of PDUs:
public type MyPDUs = unit {
msgs: Message[]; # DANGEROUS: accumulates all messages until end of session
: Message[]; # RECOMMENDED: anonymous field, no accumulation
};
Skip unused fields
Don’t parse data that is not needed. Parsing data into a field will cause a copy of that data into a dedicated memory location. Instead, use the skip keyword to discard data that isn’t of interest, which leads the compiler to generate optimized code for many field types, including in particular bytes, literals, and generally fields of fixed size. Examples:
public type Message = unit {
unused1: uint32; # LESS EFFICIENT: parse and store
unused2: skip uint32; # RECOMMENDED: skip over 4 bytes
unused3: bytes &eod; # LESS EFFICIENT: extract and store remaining data
unused4: skip bytes &eod; # RECOMMENDED: skip over remaining data
};
Avoid Spicy strings and string manipulation
Use Spicy strings sparingly in your analyzer, and stick to bytes instead where you can. Typically, you would convert bytes to strings as late as possible, just when you need them, for example when passing them to functions expecting a string, or when preparing them for presentation to the user. When passing data to Zeek events in your EVT files, bytes will be automatically converted to Zeek strings, retaining their original byte-level representation. That means that you don’t need to convert them into a string yourself at all, unless you want to take character encodings into account for the conversion through decode().
As a corollary, avoid Spicy string manipulation. Always manipulate/concatenate bytes and convert only the final result to a Spicy string, probably in a %done hook or an EVT event. In particular, format strings come with a cost to compute. In other words, avoid use of %s to generate strings from bytes.
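For illustration, a minimal sketch (unit and field names are made up): the URI stays a bytes value throughout parsing and is decoded to a string only at the end, where it is actually needed.

type Request = unit {
    uri: bytes &until=b" ";

    on %done {
        # Decode to a string only here, right where it is needed.
        print self.uri.decode();
    }
};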
Don’t use temporary variables of expensive types just for readability
Don’t use temporary variables just to improve readability. In particular, string and bytes values (which are implemented as C++ strings under the hood and can be expensive to use) need to be allocated and destroyed. This may introduce noticeable overhead, as it cannot be guaranteed that the C++ compiler will be able to optimize away the temporary in the code generated by Spicy. To improve readability, comments are the tool of choice. If in doubt about the impact of a temporary, benchmark.
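As a sketch (the field names and the process() helper are hypothetical), prefer passing the expression directly and carrying the explanation in a comment:

# Hypothetical helper standing in for whatever work consumes the data.
function process(data: bytes) {}

type Message = unit {
    hdr: bytes &size=4;
    body: bytes &eod;

    on %done {
        # LESS EFFICIENT: temporary bytes value allocated just for readability
        local payload = self.hdr + self.body;
        process(payload);

        # RECOMMENDED: pass the expression directly; header plus body form the full payload
        process(self.hdr + self.body);
    }
};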
Remove unnecessary hooks
Multiple hook handlers with different priorities can be defined in various places, like inside a unit or, with Zeek, in EVT files. However, hooks (either unit or field hooks) should be avoided when not needed. Using hooks comes with a performance cost because of additional code generated by the compiler, which executes during parsing. While %init hooks are often used for initializing unit variables, we can often eliminate them by providing default values in the variable definition:
public type Message = unit {
on %init {
self.A1 = 23; # LESS EFFICIENT: explicit initialization through hook
}
var A1: uint32;
var A2: uint32 = 23; # RECOMMENDED: implicit initialization through default value
};
Avoid recursion
The Spicy compiler allows declaring recursive units, with runtime conditions dictating when the recursion terminates. However, recursion introduces additional overhead compared to linear/iterative code of similar functionality: it extends the lifetimes of units, their data, and their associated hooks, and the internal machinery around the recursive calls leaves less potential for compiler optimization.
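As a sketch of the difference (types and fields are made up), a back-to-back sequence of elements is usually better parsed as a vector than through a unit that recursively parses its successor:

type Element = unit {
    data: bytes &size=4;
    more: uint8;
};

# LESS EFFICIENT: each element recursively parses the next one
type ElementChain = unit {
    data: bytes &size=4;
    more: uint8;
    next: ElementChain if (self.more == 1);
};

# RECOMMENDED: parse the same sequence iteratively as an anonymous vector
public type Elements = unit {
    : Element[] &until-including=($$.more != 1);
};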
Inline small nested units
The Spicy compiler is not smart enough yet to inline nested units. Since declaration of units incurs additional cost to maintain their associated state and hooks, it is advisable to manually inline small units where performance is critical.
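For example (hypothetical message layout), instead of factoring a two-field header into its own tiny unit, spell those fields out directly in the parent where performance is critical:

# LESS EFFICIENT: the tiny helper unit adds per-instance state and hook machinery
type Header = unit {
    typ: uint8;
    len: uint16;
};

type MessageNested = unit {
    hdr: Header;
    payload: bytes &size=self.hdr.len;
};

# RECOMMENDED: inline the header fields into the parent unit
type MessageInlined = unit {
    typ: uint8;
    len: uint16;
    payload: bytes &size=self.len;
};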
Minimize event generation
With Zeek, minimize the number of generated events. Each instance of an event comes with overhead as the parsed data needs to be converted from Spicy’s data model into Zeek’s data model, which can involve heap allocations even for simple types. It’s the number of event instances generated at runtime here that matters, not the number of event types defined in the EVT files (although the latter may increase compilation times).
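For example, in an EVT file, prefer raising one event per message over one per repeated element (analyzer, unit, and event names below are made up):

# LESS EFFICIENT: one Zeek event for every parsed element
on MyProto::Element -> event myproto::element($conn, self.data);

# RECOMMENDED: one Zeek event per message, passing just what the scripts need
on MyProto::Message -> event myproto::message($conn, self.num_elements);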
Aggregate data to be forwarded into other analyzers
When passing chunks of data back into Zeek through zeek::protocol_data_in, it can be more efficient to aggregate multiple chunks inside a temporary variable first, instead of forwarding each chunk individually. This is because each chunk forwarded to Zeek will go through its analyzer pipeline individually, which incurs additional overhead.
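A sketch of the pattern (unit and field names are made up; see the Zeek integration documentation for the exact zeek::protocol_* API):

import zeek;

type Chunk = unit {
    len: uint16;
    data: bytes &size=self.len;
};

public type Tunnel = unit {
    var payload: bytes;

    on %init {
        # Assumes the inner protocol is handed to a child analyzer.
        zeek::protocol_begin("HTTP");
    }

    : Chunk[] foreach {
        # Aggregate the chunks instead of forwarding each one individually ...
        self.payload += $$.data;
    }

    on %done {
        # ... and hand the combined data to the child analyzer only once.
        zeek::protocol_data_in(True, self.payload);
    }
};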
Move fixed local values into global constants
Inside functions and hooks, local variables are created and destroyed every time the corresponding code executes. For non-trivial types, that can lead to noticeable overhead. If the locals aren’t modified, consider moving them to global constants instead. (For some particularly expensive, non-mutable types, Spicy performs this optimization internally already; for example, for regular expressions.)
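For example (hypothetical values), building a set once as a global constant avoids reconstructing it on every call:

# LESS EFFICIENT: the set is rebuilt every time the function executes
function is_known_type(t: uint8) : bool {
    local known = set<uint8>(1, 2, 3);
    return t in known;
}

# RECOMMENDED: construct the value once as a global constant
const KnownTypes = set<uint8>(1, 2, 3);

function is_known_type2(t: uint8) : bool {
    return t in KnownTypes;
}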
Avoid small-sized bytes fields
Avoid using the bytes type for fields that could be handled using integer types. As bytes will always allocate a C++ string under the hood, using integer types can improve performance.
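For example:

public type Message = unit {
    tag1: bytes &size=2; # LESS EFFICIENT: allocates a bytes object for two bytes
    tag2: uint16;        # RECOMMENDED: parse the same two bytes as an integer
};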
Consider outsourcing into C++
Consider outsourcing complex and performance-critical calculations required for your parsing into custom C++ code. In particular, decoding bytes into special string representations or peculiar time conversions might be significantly faster when implemented in C++ directly. See Custom Extensions for more on how to do that.
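The usual approach is to declare the function in Spicy with &cxxname and provide its implementation in an accompanying C++ file; the function name below is made up:

# Implemented in C++; Spicy code only sees this declaration.
public function decode_vendor_string(data: bytes) : string &cxxname="myparser::decode_vendor_string";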
State-management in analyzers
Try to avoid using global variables, such as maps, to store analyzer state, as that can cause significant memory bloat over longer periods if not managed correctly. Instead, prefer to retain analyzer state through a %context inside your top-level units, and then propagate that context down through unit arguments for other code to populate it. When used with Zeek, Spicy ties the context state to individual connections, which get torn down automatically when the connection state is removed, thereby preventing accidental state space explosion. Note, however, that even state maintained inside a %context will need additional manual management if it can grow unbounded for long-running connections (like state tables that continuously accumulate new information with each PDU).
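A minimal sketch of that pattern (type and field names are made up):

# Per-connection state lives in a context instead of a global map.
type Ctx = struct {
    num_requests: uint64;
};

public type Requests = unit {
    %context = Ctx;

    : Request[] foreach {
        self.context().num_requests += 1;
    }
};

type Request = unit {
    line: bytes &until=b"\n";
};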
5.5.1.2. Compilation Performance
Depending on the complexity of the Spicy code, it may take a bit (and sometimes quite a while) to compile your parsers. In the following, we collect some recommendations to speed up the compile process.
Note
When processing Spicy code, generally the bulk of the time tends to be spent on compiling the generated C++ code; often about 80-90%. If you want to see a break-down of where Spicy spends its time, run the Spicy compiler with --report-times. In the output at the end, jit refers to compiling generated C++ code.
Precompile Headers
Make sure to run spicy-precompile-headers to speed up C++ compilation a little.
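That is, simply run it once for the installation you are working with:

$ spicy-precompile-headers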
Faster Debug Builds
During development of new parsers, it helps quite a bit to build non-optimized debug versions by adding --debug to the Spicy compiler’s command line. This emits almost identical code, but then compiles the generated code without -O2 (i.e., not optimized), which avoids some work the C++ compiler would otherwise do. The produced HLTO will perform (much) less well, so it is probably not useful for production.
Danger
Do not run spicyc with --disable-optimizations, as that will actually generate more C++ code to compile.
When building a Spicy parser as a Zeek analyzer with the default package template, one can pass Spicy compilation flags via the SPICYZ_FLAGS CMake variable, e.g., to build a parser in debug mode, configure the parser with:
$ cmake -DSPICYZ_FLAGS="--debug" <OTHER FLAGS>
For building with zkg, you can add this flag to the CMake invocation in zkg.meta’s build_command; this change is for development only and likely should not be published.
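For example, the relevant part of a zkg.meta might then look along these lines (a sketch; adapt it to the build_command your package template actually uses):

build_command = mkdir -p build && cd build && cmake -DSPICYZ_FLAGS="--debug" .. && cmake --build .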
Use a compiler cache to speed up repeated compilations
C++ compilation can become the dominant factor in compilation time for Spicy parsers. If you repeatedly compile the same file (this might even be an unchanged module in your Spicy project) it is worthwhile to cache the C++ compilation results to avoid doing this work again.
To configure a compiler cache, set its invocation in the environment variable HILTI_CXX_COMPILER_LAUNCHER, e.g., to use an installed ccache:
$ export HILTI_CXX_COMPILER_LAUNCHER=ccache
Tweak compilation parallelism
When compiling generated C++ code, Spicy will by default spawn as many parallel compiler processes as there are cores. This often works well enough, but it can cause issues when, e.g., (1) C++ compilation requires a lot of RAM, so concurrent processes compete for it and end up swapping, or (2) multiple parsers are built in parallel as part of a bigger build setup. If this is something you observe, it might make sense to reduce the level of parallelism, e.g.,
# Run at most 4 parallel C++ compilation jobs.
$ export HILTI_JIT_PARALLELISM=4
Especially for case (1) it might make sense to check whether you can switch to a more efficient compiler.
Consider switching to a more efficient compiler
Compilation performance of GCC and Clang can differ by a lot; e.g., GCC can require 2-4GB of RAM to compile C++ files generated by Spicy, while Clang might only require 1-2GB. This can negatively impact performance if RAM becomes a bottleneck and forces process memory into slower swap; see also Tweak compilation parallelism. For this reason it can be worthwhile to switch to Clang to speed up compilation, especially during development.
Spicy utilizes the same compiler for compiling generated C++ files that was used for compiling Spicy itself. Binary packages are most often built with a system compiler, so going down this path requires a custom build of Spicy (or of Zeek if Spicy comes bundled with it). You can query the compiler Spicy would use with spicy-config, e.g.,
$ spicy-config --cxx
/usr/bin/c++
# '/usr/bin/c++' corresponds to gcc-12.2.0-14 on this system.
$ /usr/bin/c++ --version
c++ (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
To build Spicy with Clang instead, configure its build with the following flags:
$ ./configure --with-cxx-compiler=clang++ --with-c-compiler=clang --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>
To configure a Zeek build to use Clang, set the CC and CXX environment variables when making a clean build:
# Environment variables only have an effect for a clean build.
$ rm -rf build
$ CXX=clang++ CC=clang ./configure --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>
After building and installing, you should see a changed C++ compiler with spicy-config --cxx for your custom-built spicy-config, e.g.,
$ spicy-config --cxx
clang++
Danger
While one can switch the compiler at runtime with the HILTI_CXX environment variable, it is not the right tool to switch between GCC and Clang, since the two compilers can produce ABI-incompatible code. This will in the best case lead to linker failures (worst case: parsers might behave incorrectly at runtime).
Reduce number of imports required
By reducing the number of imports (i.e., source files), compilation from scratch becomes faster. There is a tradeoff: multiple files may allow for some incremental compilation if caching is used, and thus may speed up subsequent builds.