6. Toolchain

6.1. spicy-build

spicy-build is a shell frontend that compiles Spicy source code into a standalone executable by running spicyc to generate the necessary C++ code, then spawning the system compiler to compile and link that.

spicy-build [options] <input files>

    -d          Build a debug version.
    -g          Disable HILTI-side optimizations of the generated code.
    -o <file>   Destination name for the compiled executable; default is "a.out".
    -t          Do not delete temporary files (useful for inspection and for use with a debugger).
    -v          Verbose output, display command lines executing.
    -S          Do not compile the "spicy-driver" host application into executable.

Input files may be anything that spicyc can compile to C++.
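
As a rough illustration of how the options above combine, the following Python sketch assembles a spicy-build command line without executing it. The helper spicy_build_command and the file names are hypothetical and exist only for this example; the flags are the ones listed above.

```python
import shlex


def spicy_build_command(inputs, output="a.out", debug=False,
                        keep_tmps=False, verbose=False):
    """Assemble a spicy-build command line as an argv list (hypothetical helper)."""
    if not inputs:
        raise ValueError("spicy-build needs at least one input file")
    argv = ["spicy-build"]
    if debug:
        argv.append("-d")    # build a debug version
    if keep_tmps:
        argv.append("-t")    # keep temporary files for inspection
    if verbose:
        argv.append("-v")    # display the command lines being executed
    argv += ["-o", output]   # name of the compiled executable
    argv += list(inputs)
    return argv


if __name__ == "__main__":
    # "my-parser.spicy" and "my-parser" are placeholder names.
    cmd = spicy_build_command(["my-parser.spicy"], output="my-parser", debug=True)
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a system where spicy-build is installed.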

6.2. spicy-config

spicy-config reports information about Spicy’s build & installation options.

Usage: spicy-config [options]

Available options:

    --bindir                 Prints the path to the directory where binaries are installed.
    --build                  Prints "debug" or "release", depending on the build configuration.
    --cmake-path             Prints the path to Spicy-provided CMake modules
    --cxx                    Print the path to the C++ compiler used to build Spicy
    --cxx-launcher           Print the full path to the compiler launcher used to compile HILTI.
    --cxxflags               Print flags for C++ compiler when compiling generated code statically
    --cxxflags-hlto          Print flags for C++ compiler when building precompiled HLTO libraries
    --debug                  Output flags for working with debugging versions.
    --distbase               Print path of the Spicy source distribution.
    --dynamic-loading        Adjust --ldflags for host applications that dynamically load precompiled modules
    --have-toolchain         Prints 'yes' if the Spicy toolchain was built, 'no' otherwise.
    --help                   Print this usage summary
    --include-dirs           Prints the Spicy runtime's C++ include directories
    --include-dirs-toolchain Prints the Spicy compiler's C++ include directories
    --ldflags                Print flags for linker when compiling generated code statically
    --ldflags-hlto           Print flags for linker when building precompiled HLTO libraries
    --libdirs                Print standard Spicy library directories.
    --libdirs-cxx-runtime    Print C++ library directories for runtime.
    --libdirs-cxx-toolchain  Print C++ library directories for toolchain.
    --prefix                 Print path of installation
    --spicy-build            Print the path to the spicy-build script.
    --spicyc                 Print the path to the spicyc binary.
    --version                Print the Spicy version as a string.
    --version-number         Print the Spicy version as a numerical value.
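
A build script might query spicy-config for the compiler and flags needed to compile generated C++ code. The sketch below is one way to do that from Python. The helpers spicy_config and compile_command are hypothetical, and whether --cxxflags and --ldflags alone suffice depends on the host application. The run parameter exists only so the logic can be exercised without a Spicy installation.

```python
import shlex
import subprocess


def spicy_config(option, run=subprocess.run):
    """Return the raw output of `spicy-config <option>` with whitespace trimmed."""
    result = run(["spicy-config", option],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def compile_command(sources, output, run=subprocess.run):
    """Assemble (but do not execute) a compile-and-link command for generated code."""
    cxx = spicy_config("--cxx", run)                         # compiler path, kept as one word
    cxxflags = shlex.split(spicy_config("--cxxflags", run))  # split flags like a shell would
    ldflags = shlex.split(spicy_config("--ldflags", run))
    return [cxx] + cxxflags + ["-o", output] + list(sources) + ldflags
```

On a machine with Spicy installed, compile_command(["my-parser.cc"], "my-parser") returns an argv list suitable for subprocess.run; the file names are placeholders.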

6.3. spicyc

spicyc compiles Spicy code into C++ output, optionally also executing it directly through JIT.

Usage: spicyc [options] <inputs>

Options controlling code generation:

  -c | --output-c++                 Print out C++ code generated for module (for debugging; use -x to generate code for external compilation).
  -d | --debug                      Include debug instrumentation into generated code.
  -e | --output-all-dependencies    Output list of dependencies for all compiled modules.
  -g | --disable-optimizations      Disable HILTI-side optimizations of the generated code.
  -j | --jit-code                   Fully compile all code, and then execute it unless --output-to gives a file to store it
  -l | --output-linker              Print out only generated HILTI linker glue code (for debugging; use -x to generate code for external compilation).
  -o | --output-to <path>           Path for saving output.
  -p | --output-hilti               Just output parsed HILTI code again.
  -v | --version                    Print version information.
  -x | --output-c++-files <prefix>  Output generated all C++ code into set of files for external compilation.
  -A | --abort-on-exceptions        When executing compiled code, abort() instead of throwing HILTI exceptions.
  -B | --show-backtraces            Include backtraces when reporting unhandled exceptions.
  -C | --dump-code                  Dump all generated code to disk for debugging.
  -D | --compiler-debug <streams>   Activate compile-time debugging output for given debug streams (comma-separated; 'help' for list).
  -E | --output-code-dependencies   Output list of dependencies for all compiled modules that require separate compilation of their own.
  -L | --library-path <path>        Add path to list of directories to search when importing modules.
  -P | --output-prototypes <prefix> Output C++ header with prototypes for public functionality.
  -R | --report-times               Report a break-down of compiler's execution time.
  -S | --skip-dependencies          Do not automatically compile dependencies during JIT.
  -T | --keep-tmps                  Do not delete any temporary files created.
  -V | --skip-validation            Don't validate ASTs (for debugging only).
  -X | --debug-addl <addl>          Implies -d and adds selected additional instrumentation (comma-separated; see 'help' for list).
  -Z | --enable-profiling           Report profiling statistics after execution.
       --cxx-link <lib>             Link specified static archive or shared library during JIT or to produced HLTO file. Can be given multiple times.
       --skip-standard-imports      Do not automatically import standard library modules (for debugging only).
       --strict-public-api          Skip optimizations that change the public C++ API of generated code.  [default in debug builds]
       --no-strict-public-api       Allow optimizations that change the public C++ API of generated code. [default in release builds]

  -Q | --include-offsets            Include stream offsets of parsed data in output.


Inputs can be .spicy, .hlt, .cc/.cxx, *.hlto.

spicyc also supports the following environment variables to control the compilation process:

SPICY_PATH

Replaces the built-in search path for *.spicy source files.

SPICY_CACHE

Location for storing precompiled C++ headers. Default is ~/.cache/spicy/<VERSION>.

HILTI_CXX

Specifies the path to the C++ compiler to use.

HILTI_CXX_COMPILER_LAUNCHER

Specifies a command to prefix compiler invocations with during JIT. This can be used, for example, to employ a compiler cache like ccache. If Spicy was configured with, e.g., --with-hilti-compiler-launcher=ccache (the equivalent CMake option is HILTI_COMPILER_LAUNCHER), ccache is used automatically during JIT. Setting this variable to an empty value disables the use of ccache in that case.

HILTI_CXX_FLAGS

Specifies additional flags to pass during C++ compilation. These flags are added after all implicit arguments. Use HILTI_CXX_INCLUDE_DIRS to specify additional include directories.

HILTI_CXX_INCLUDE_DIRS

Specifies additional, colon-separated C++ include directories to search for header files. Directories passed via HILTI_CXX_INCLUDE_DIRS will be searched for headers before any header search paths implicit in Spicy C++ compilation.

HILTI_DISABLE_OPTIMIZER_PASSES

Colon-separated list of optimizer passes to disable.

HILTI_JIT_PARALLELISM

Set to specify the maximum number of background compilation jobs to run during JIT. Defaults to number of cores.

HILTI_JIT_SEQUENTIAL

Set to prevent spawning multiple concurrent C++ compiler instances. This overrides any value set for HILTI_JIT_PARALLELISM and effectively sets it to one.

HILTI_PATH

Replaces the built-in search path for *.hlt source files.

HILTI_PRINT_SETTINGS

Set to see summary of compilation options.

HILTI_OPTIMIZER_OMIT_CFG_DATAFLOW

Set to 1 to omit dataflow facts from the control-flow graph debug streams.

HILTI_OPTIMIZER_STRICT_PUBLIC_API

Set to 1 to disallow any changes to the public C++ API of generated code by the optimizer. Set to 0 to allow such changes, which may, for example, remove storage for parsed fields that are never accessed by any Spicy code.

If not set, the default is 0 for release builds and 1 for debug builds. There are also command line options --(no-)strict-public-api that override this environment variable as well as any default.
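
To make the interplay of these variables more concrete, here is a minimal Python sketch that prepares an environment for a spicyc run. The helper jit_environment and its parameters are hypothetical. It only sets variables documented above and leaves everything else untouched.

```python
import os


def jit_environment(include_dirs=(), launcher=None, parallelism=None,
                    strict_public_api=None, base=None):
    """Return a copy of the environment with selected HILTI_* variables set."""
    env = dict(os.environ if base is None else base)
    if include_dirs:
        # Additional include directories are colon-separated.
        env["HILTI_CXX_INCLUDE_DIRS"] = ":".join(include_dirs)
    if launcher is not None:
        # An empty string disables a launcher that was configured at build time.
        env["HILTI_CXX_COMPILER_LAUNCHER"] = launcher
    if parallelism is not None:
        env["HILTI_JIT_PARALLELISM"] = str(parallelism)
    if strict_public_api is not None:
        env["HILTI_OPTIMIZER_STRICT_PUBLIC_API"] = "1" if strict_public_api else "0"
    return env
```

The result can be passed as the env argument of subprocess.run, for example together with the argv list ["spicyc", "-j", "my-parser.spicy"], where the file name is a placeholder.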

6.4. spicy-driver

spicy-driver is a standalone Spicy host application that compiles and executes Spicy parsers on the fly, and then feeds them data for parsing from standard input.

Usage: cat <data> | spicy-driver [options] <inputs> ...

Options:

  -c | --require-accept               Return failure exit code if parser did not call accept_input(), or called decline_input().
  -d | --debug                        Include debug instrumentation into generated code.
  -g | --disable-optimizations        Disable HILTI-side optimizations of the generated code.
  -i | --increment <i>                Feed data incrementally in chunks of size <i>.
  -f | --file <path>                  Read input from <path> instead of stdin.
  -l | --list-parsers                 List available parsers and exit; use twice to include aliases.
  -p | --parser <name>                Use parser <name> to process input. Only needed if more than one parser is available.
  -v | --version                      Print version information.
  -A | --abort-on-exceptions          When executing compiled code, abort() instead of throwing HILTI exceptions.
  -B | --show-backtraces              Include backtraces when reporting unhandled exceptions.
  -D | --compiler-debug <streams>     Activate compile-time debugging output for given debug streams (comma-separated; 'help' for list).
  -F | --batch-file <path>            Read Spicy batch input from <path>; see docs for description of format.
  -L | --library-path <path>          Add path to list of directories to search when importing modules.
  -P | --parser-alias <alias>=<name>  Add alias name for parser of existing name.
  -R | --report-times                 Report a break-down of compiler's execution time.
  -S | --skip-dependencies            Do not automatically compile dependencies during JIT.
  -U | --report-resource-usage        Print summary of runtime resource usage.
  -V | --skip-validation              Don't validate ASTs (for debugging only).
  -X | --debug-addl <addl>            Implies -d and adds selected additional instrumentation (comma-separated; see 'help' for list).
  -Z | --enable-profiling             Report profiling statistics after execution.
       --strict-public-api            Skip optimizations that change the public C++ API of generated code.  [default in debug builds]
       --no-strict-public-api         Allow optimizations that change the public C++ API of generated code. [default in release builds]

Environment variables:

  SPICY_PATH                      Colon-separated list of directories to search for modules. In contrast to --library-path, setting this variable overwrites the built-in paths.

Inputs can be .spicy, .hlt, .cc/.cxx, *.o, *.hlto.

spicy-driver supports the same environment variables as spicyc.

6.4.1. Specifying the parser to use

If there’s only a single public unit in the Spicy source code, spicy-driver will automatically use that for parsing its input. If there’s more than one public unit, you need to tell spicy-driver which one to use through its --parser (or -p) option. To see the parsers that are available, use --list-parsers (or -l).

In addition to the names shown by --list-parsers, you can also specify a parser through a port or MIME type if the corresponding unit defines them through properties. For example, if a unit defines %port = 80/tcp, you can use spicy-driver -p 80/tcp to select it. To specify a direction, add either %orig or %resp (e.g., -p 80/tcp%resp); then only units with a port tagged with an &originator or &responder attribute, respectively, will be considered. If a unit defines %mime-type = application/test, you can select it through spicy-driver -p application/test.

New in version 1.13: Verbose mode for list-parsers

Internally, these port-based arguments for -p are alias names for existing parsers. You can see all aliases by running spicy-driver with -ll (i.e., --list-parsers twice).
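
The selection options described in this section can be combined as in the following Python sketch, which only assembles a spicy-driver command line. The helper spicy_driver_command is hypothetical; the parser argument may be a unit name, a port such as 80/tcp%resp, or a MIME type.

```python
def spicy_driver_command(inputs, parser=None, data_file=None, list_parsers=0):
    """Assemble a spicy-driver command line as an argv list (hypothetical helper)."""
    argv = ["spicy-driver"]
    # -l once lists the parsers; given twice, it also lists their aliases.
    argv += ["-l"] * list_parsers
    if parser is not None:
        argv += ["-p", parser]      # unit name, port, or MIME type
    if data_file is not None:
        argv += ["-f", data_file]   # read input from a file instead of stdin
    return argv + list(inputs)


if __name__ == "__main__":
    # "http.spicy" and "response.dat" are placeholder file names.
    print(spicy_driver_command(["http.spicy"], parser="80/tcp%resp",
                               data_file="response.dat"))
    print(spicy_driver_command(["http.spicy"], list_parsers=2))
```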

6.4.2. Batch input

spicy-driver provides a batch input mode for processing multiple interleaved input flows in parallel, mimicking how host applications like Zeek would employ Spicy parsers for processing many sessions concurrently. The batch input must be prepared in a specific format (see below) that provides embedded meta information about the contained flows of input. If you have Zeek at hand, the easiest way to generate such a batch is to use a script that ships with Zeek. If you run Zeek with this script on a PCAP trace, it will record the contained TCP and UDP sessions into a Spicy batch file:

# zeek -b -r http/methods.trace policy/frameworks/spicy/record-spicy-batch
tracking [orig_h=128.2.6.136, orig_p=46562/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46563/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46564/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46565/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46566/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46567/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
[...]
tracking [orig_h=128.2.6.136, orig_p=46608/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46609/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46610/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
recorded 49 sessions total
output in batch.dat

You will now have a file batch.dat that you can use with spicy-driver -F batch.dat ....

By default, the batch created by the Zeek script will select parsers for the contained sessions through well-known ports. That means your units need to have a %port property matching the responder port of the sessions you want them to parse. So for the HTTP trace above, our Spicy source code would need to provide a public unit with property %port = 80/tcp;.

New in version 1.13: --parser-alias

Alternatively, you can run spicy-driver with --parser-alias PORT=PARSER to tell it explicitly which parsers to use for connections on a particular port. Here, PORT must be of the form <port>/<protocol> (e.g., 80/tcp), and PARSER is the name of the parser to use (as shown by spicy-driver --list-parsers). By default, the parser will be applied to both directions of all connections that are using that responder port. You can limit the direction by appending either %orig or %resp to PORT (e.g., 80/tcp%orig to attach the parser only to originator-side flows). --parser-alias can be used multiple times to specify further mappings.

In case you want to create batches yourself, we document the batch format in the following. A batch needs to start with a line !spicy-batch v2<NL>, followed by lines with commands of the form @<tag> <arguments><NL>.

There are two types of input that the batch format can represent: (1) individual, uni-directional flows; and (2) bi-directional connections consisting in turn of one flow per side. The type is determined through an initial command: @begin-flow starts a flow, and @begin-conn starts a connection. Either form introduces a unique, free-form ID that subsequent commands will then refer to. The following commands are supported:

@begin-flow FID TYPE PARSER<NL>

Initializes a new input flow for parsing, associating the unique ID FID with it. TYPE must be either stream for stream-based parsing (think: TCP), or block for parsing each data block independent of others (think: UDP). PARSER is the name of the Spicy parser to use for parsing this input flow, given in the same form as with spicy-driver’s --parser option (i.e., either as a unit name, a %port, or a %mime-type).

@begin-conn CID TYPE ORIG_FID ORIG_PARSER RESP_FID RESP_PARSER<NL>

Initializes a new input connection for parsing, associating the unique connection ID CID with it. TYPE must be either stream for stream-based parsing (think: TCP), or block for parsing each data block independent of others (think: UDP). ORIG_FID is a separate unique ID for the originator-side flow, and ORIG_PARSER is the name of the Spicy parser to use for parsing that flow. RESP_FID and RESP_PARSER work accordingly for the responder-side flow. The parsers can be given in the same form as with spicy-driver’s --parser option (i.e., either as a unit name, a %port, or a %mime-type).

@data FID SIZE<NL>

A block of data for the input flow FID. This command must be followed directly by binary data of length SIZE, plus a final newline character. The data represents the next chunk of input for the corresponding flow. @data can be used only inside corresponding @begin-* and @end-* commands bracketing the flow ID.

@gap FID SIZE<NL>

A gap of size SIZE. This inserts a gap into the input stream that will trigger a parse error once the parser reaches it. If the parser supports error recovery, it will then attempt to continue processing after the gap. @gap is similar to how a host application like Zeek would report TCP reassembly gaps caused by missing packets.

@end-flow FID<NL>

Finalizes parsing of the input flow associated with FID, releasing all state. This must come only after a corresponding @begin-flow command, and every @begin-flow must eventually be followed by an @end-flow.

@end-conn CID<NL>

Finalizes parsing the input connection associated with CID, releasing all state (including for its two flows). This must come only after a corresponding @begin-conn command, and every @begin-conn must eventually be followed by an @end-conn.
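
The batch commands described above can be produced with a few lines of code. The following Python sketch writes them to a binary stream. The class BatchWriter is hypothetical and not part of Spicy; it follows only the format as documented here and does not check that IDs are unique or that begin and end commands are balanced.

```python
import io

BATCH_TYPES = ("stream", "block")


class BatchWriter:
    """Write Spicy batch commands to a binary stream (hypothetical helper)."""

    def __init__(self, out):
        self._out = out
        self._out.write(b"!spicy-batch v2\n")   # mandatory first line

    def _command(self, tag, *args):
        line = "@%s %s\n" % (tag, " ".join(str(a) for a in args))
        self._out.write(line.encode("ascii"))

    @staticmethod
    def _check_type(type_):
        if type_ not in BATCH_TYPES:
            raise ValueError("TYPE must be 'stream' or 'block'")

    def begin_flow(self, fid, type_, parser):
        self._check_type(type_)
        self._command("begin-flow", fid, type_, parser)

    def begin_conn(self, cid, type_, orig_fid, orig_parser, resp_fid, resp_parser):
        self._check_type(type_)
        self._command("begin-conn", cid, type_,
                      orig_fid, orig_parser, resp_fid, resp_parser)

    def data(self, fid, payload):
        # SIZE counts only the payload; a final newline follows the binary data.
        self._command("data", fid, len(payload))
        self._out.write(payload + b"\n")

    def gap(self, fid, size):
        self._command("gap", fid, size)

    def end_flow(self, fid):
        self._command("end-flow", fid)

    def end_conn(self, cid):
        self._command("end-conn", cid)


def example_batch():
    """Build a one-flow batch in memory; the ID "f1" is an arbitrary placeholder."""
    buf = io.BytesIO()
    writer = BatchWriter(buf)
    writer.begin_flow("f1", "stream", "80/tcp")
    writer.data("f1", b"GET / HTTP/1.1\r\n\r\n")
    writer.gap("f1", 10)
    writer.end_flow("f1")
    return buf.getvalue()
```

Writing the bytes returned by example_batch() to a file yields input that can be passed to spicy-driver through -F, provided the Spicy source defines a public unit with a matching %port property.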

6.5. spicy-dump

spicy-dump is a standalone Spicy host application that compiles and executes Spicy parsers on the fly, feeds them data for processing, and then at the end prints out the parsed information in either a readable, custom ASCII format, or as JSON (--json or -J). By default, spicy-dump does not show the output of Spicy print statements; --enable-print or -P re-enables it.

Usage: cat <data> | spicy-dump [options] <inputs> ...

Options:

  -d | --debug                    Include debug instrumentation into generated code.
  -f | --file <path>              Read input from <path> instead of stdin.
  -l | --list-parsers             List available parsers and exit; use twice to include aliases.
  -p | --parser <name>            Use parser <name> to process input. Only needed if more than one parser is available.
  -v | --version                  Print version information.
  -A | --abort-on-exceptions      When executing compiled code, abort() instead of throwing HILTI exceptions.
  -B | --show-backtraces          Include backtraces when reporting unhandled exceptions.
  -D | --compiler-debug <streams> Activate compile-time debugging output for given debug streams (comma-separated; 'help' for list).
  -L | --library-path <path>      Add path to list of directories to search when importing modules.
  -J | --json                     Print JSON output.
  -P | --enable-print             Show output of Spicy 'print' statements (default: off).
  -Q | --include-offsets          Include stream offsets of parsed data in output.
  -R | --report-times             Report a break-down of compiler's execution time.
  -S | --skip-dependencies        Do not automatically compile dependencies during JIT.
  -X | --debug-addl <addl>        Implies -d and adds selected additional instrumentation (comma-separated; see 'help' for list).
  -Z | --enable-profiling         Report profiling statistics after execution.
       --strict-public-api        Skip optimizations that change the public C++ API of generated code.
       --no-strict-public-api     Allow optimizations that change the public C++ API of generated code.

Environment variables:

  SPICY_PATH                      Colon-separated list of directories to search for modules. In contrast to --library-path, setting this variable overwrites the built-in paths.

Inputs can be .spicy, .hlt, *.hlto.