9. Release Notes
The following summarizes the most important changes in recent Spicy releases. For an exhaustive list of all changes, see the CHANGES file that comes with the distribution.
9.1. Version 1.15
New Functionality
Control-flow-based optimizations enabled and expanded
We built out the control-flow-based optimizations first introduced as an experimental feature in spicy-1.14, and with this release these optimizer passes are enabled by default. To make this possible, we made the framework more efficient and cleaned up issues in the implementation.
We also introduced a new pass performing constant propagation (GH-2137, GH-2150) and reducing nested scopes to simplify data flow analysis (GH-1425).
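To give a feel for what constant propagation does, here is a minimal C++ sketch on a toy representation of straight-line code where every statement assigns a sum of literals and variables. It is only an illustration of the general technique, not HILTI's optimizer; Operand, Stmt, lit, var and propagate_constants are all hypothetical names.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: an operand is either an integer literal or a variable name.
struct Operand {
    std::optional<long> literal;  // set if the operand is a known constant
    std::string name;             // variable name, used if `literal` is unset
};

Operand lit(long v) { return {v, ""}; }
Operand var(std::string n) { return {std::nullopt, std::move(n)}; }

// A statement `lhs = addends[0] + addends[1] + ...`.
struct Stmt {
    std::string lhs;
    std::vector<Operand> addends;
};

// Forward constant propagation over straight-line code: uses of variables with
// a known constant value are replaced by that value, and statements whose
// operands are all constant are folded into a single literal, which in turn
// makes `lhs` a known constant for later statements.
std::vector<Stmt> propagate_constants(std::vector<Stmt> stmts) {
    std::map<std::string, long> known;  // variable -> known constant value

    for ( auto& s : stmts ) {
        bool all_constant = true;
        long sum = 0;

        for ( auto& op : s.addends ) {
            if ( ! op.literal ) {
                if ( auto it = known.find(op.name); it != known.end() )
                    op = lit(it->second);  // substitute the known constant
            }

            if ( op.literal )
                sum += *op.literal;
            else
                all_constant = false;
        }

        if ( all_constant ) {
            s.addends = {lit(sum)};  // fold to `lhs = sum`
            known[s.lhs] = sum;
        }
        else
            known.erase(s.lhs);  // lhs now depends on a non-constant value
    }

    return stmts;
}
```

For `x = 2 + 3; y = x + 1; z = y + input;` this folds `x` to 5 and `y` to 6, and rewrites `z` to `6 + input`. Once values are known constants, later passes such as dead-statement removal have more to work with.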
GH-2197: Support coercion from empty list to struct
Spicy struct fields can already be declared with a &default value to reduce data duplication. We now allow constructing struct values from empty lists [], which effectively allows default construction.
Changed Functionality
GH-2183: Use dedicated C++ types for representing optional and tuple in the runtime library
GH-2194: Reimplement hilti::rt::currentExecutable without external dependency
GH-2201: HILTI_OPTIMIZER_PASSES has been removed
GH-2204: Minimum required GCC version bumped to gcc-12
GH-2222: Properly resolve and validate capture groups
Previously, incorrect uses of regular expression capture groups (e.g., $1, $2) were emitted to C++ without any analysis, causing compilation failures or unexpected behavior. Now, we validate capture groups and reject invalid Spicy code that was previously accepted.
GH-2245: Extend our notion of reserved C++ identifiers
When generating C++ code from Spicy sources, we transform Spicy identifiers which are not valid in C++. We extended the set of identifiers we transform, which might be visible when directly working with the generated C++ code, e.g., in custom host applications.
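As an illustration of the kind of transformation meant here, the following C++ sketch renames identifiers that collide with C++ keywords. Both the keyword subset and the append-an-underscore scheme are hypothetical; the notes do not describe the actual set or renaming rule Spicy uses, and is_reserved_cxx_identifier and to_cxx_identifier are made-up names.

```cpp
#include <set>
#include <string>

// Hypothetical, incomplete list of names that cannot be emitted verbatim as
// C++ identifiers (all of these are C++ keywords). The real set used by the
// Spicy code generator is not spelled out in these notes.
bool is_reserved_cxx_identifier(const std::string& id) {
    static const std::set<std::string> reserved = {
        "auto",    "class",     "default", "delete",   "explicit", "friend", "new",     "operator",
        "private", "protected", "public",  "register", "template", "this",   "virtual",
    };
    return reserved.count(id) > 0;
}

// Hypothetical renaming scheme: append an underscore to reserved names and
// leave all other identifiers untouched.
std::string to_cxx_identifier(const std::string& id) {
    return is_reserved_cxx_identifier(id) ? id + "_" : id;
}
```

A production scheme would also have to avoid clashes with user identifiers that already carry the chosen suffix, which is one reason such renamings can show up unexpectedly in generated code.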
Bug fixes
GH-2144: Function parameter and local variable name clash leads to C++ error
GH-2154: Coercion from Null gets unhandled internal error
GH-2162: Fix ASAN false positive on ARM
GH-2175: Fix tuple assignment when coercing individual elements
GH-2177: Fix C++ code generation for struct parameters passed as references
GH-2184: FunctionParamVisitor fails to remove some redundant parameters
GH-2205: Uglify internal identifiers
Multiple fixes for issues reported by static analysis tools (clang-tidy, Coverity)
zeek/zeek#5008: Do not normalize ID names inside type information
Documentation
GH-2218: Add FAQ on code that optimizers can remove
9.2. Version 1.14 (in progress)
New Functionality
GH-2028: New interprocedural optimizations.
We added infrastructure for performing interprocedural optimizations, and as a first user added a pass which removes unused function parameters in GH-2030. While this works on any code, it is mainly intended to simplify generated parser code for better runtime performance.
GH-1697: Remove some dead statements based on control and data flow.
We now collect control and data flow information. We use this to detect and remove “dead statements”, i.e., statements whose effects are not observed by any other needed computation. Currently we handle two classes of dead statements:
assignments which are overwritten before being used
unreachable code, e.g., due to a preceding return, break or throw
The implementation for this does not yet cover all possible Spicy language constructs, so it is behind a feature flag and not enabled by default. To enable it, set the environment variable HILTI_OPTIMIZER_ENABLE_CFG=1 when compiling Spicy code with, e.g., spicyc.
We encourage users to test this compilation mode and, if possible, use the compiled parsers in production. If parsers compiled this way show the intended runtime behavior in tests, they should also be fine to use in production.
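The two classes of dead statements listed above can be illustrated on a single straight-line block. The following C++ sketch is a toy, not HILTI's control-flow analysis: Stmt and remove_dead_statements are hypothetical, and it conservatively treats the last assignment to each variable as needed, so that only stores overwritten before any read are dropped.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy statement: either `target = f(reads...)` or `return reads...`.
struct Stmt {
    enum class Kind { Assign, Return } kind;
    std::string target;              // assigned variable (Assign only)
    std::vector<std::string> reads;  // variables read by this statement
};

std::vector<Stmt> remove_dead_statements(const std::vector<Stmt>& block) {
    // Pass 1: drop unreachable statements following the first `return`.
    std::vector<Stmt> reachable;
    for ( const auto& s : block ) {
        reachable.push_back(s);
        if ( s.kind == Stmt::Kind::Return )
            break;
    }

    // Conservatively assume every assigned variable is still needed after the
    // block, so the final store to each variable is always kept.
    std::set<std::string> live;
    for ( const auto& s : reachable )
        if ( s.kind == Stmt::Kind::Assign )
            live.insert(s.target);

    // Pass 2: walk backwards; an assignment to a variable that is not live at
    // that point is overwritten later before being read, so it is dead.
    std::vector<Stmt> kept;
    for ( auto it = reachable.rbegin(); it != reachable.rend(); ++it ) {
        if ( it->kind == Stmt::Kind::Assign ) {
            if ( ! live.count(it->target) )
                continue;                // dead store: skip it
            live.erase(it->target);      // earlier values of target are overwritten here
        }
        for ( const auto& r : it->reads )
            live.insert(r);              // values read here must be computed earlier
        kept.push_back(*it);
    }

    std::reverse(kept.begin(), kept.end());
    return kept;
}
```

For the block `x = 1; x = a; return x; y = 2;` the sketch keeps only `x = a; return x;`: the first store to `x` is overwritten before use, and `y = 2` is unreachable.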
Changed Functionality
GH-2050: Prefer stdout over stderr for --help messages.
Spicy tools now emit --help output to stdout instead of stderr.
GH-2068: Allow disabling building of tests.
We added a new CMake option SPICY_ENABLE_TESTS which, if toggled on, forces building of test and benchmark binaries; it is ON by default. Projects building Spicy can use this flag to disable building of tests if they are not interested in them. We also provide a configure flag --disable-tests which has the effect of turning it off.
GH-1663: Speed up checking of iterator compatibility.
We were previously using a control block which held a weak_ptr to the protected data. This was pretty inefficient for a number of reasons:
access to the controlled data always required a weak_ptr::lock, which created a temporary shared_ptr copy and immediately destroyed it after access
to check whether the control block was expired, we used lock instead of expired, which introduced the same overhead
to check compatibility of iterators, we compared shared_ptrs to the control data, which again required full locks instead of using owner_before
This manifested in, e.g., loops often being less performant than possible. We have now changed how we hold data to make iterating collections cheaper.
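The bullets above contrast standard std::weak_ptr operations. The following C++ sketch shows the costlier and cheaper variants side by side on a hypothetical CheckedIterator type; it illustrates the standard-library trade-offs only and is not Spicy's runtime iterator, whose new data layout is not detailed here.

```cpp
#include <memory>

// Hypothetical iterator holding a weak reference to its container's data,
// similar in spirit to the previous control-block design described above.
template<typename T>
struct CheckedIterator {
    std::weak_ptr<T> control;

    // Costly: lock() creates a temporary shared_ptr, incrementing and then
    // decrementing the atomic reference count.
    bool is_expired_slow() const { return control.lock() == nullptr; }

    // Cheaper: expired() only inspects the use count.
    bool is_expired_fast() const { return control.expired(); }

    // Costly: materializes two temporary shared_ptrs just to compare them.
    bool compatible_slow(const CheckedIterator& other) const {
        return control.lock() == other.control.lock();
    }

    // Cheaper: owner_before() orders control blocks without locking; two weak
    // pointers share an owner iff neither orders before the other.
    bool compatible_fast(const CheckedIterator& other) const {
        return ! control.owner_before(other.control) && ! other.control.owner_before(control);
    }
};
```

One behavioral difference worth noting: once both referenced objects are gone, lock() returns null for both, so the lock-based comparison treats iterators into different, expired containers as compatible, while the owner-based comparison still tells them apart.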
GH-2086: Fix scope resolution of local variables.
If a use of a local comes before its declaration, we no longer resolve that use to this local. It will either be resolved to an ID of the same name from an enclosing scope (if there is one), or rejected if it is otherwise unknown.
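This matches the scoping rule of C++, where a local name's scope begins at its declaration, so an earlier use of the same name refers to an enclosing declaration. A minimal C++ illustration (plain C++, not Spicy code; f and the global x are made-up names):

```cpp
// Global `x`; the local `x` declared inside `f` only becomes visible after
// its declaration.
int x = 1;

int f() {
    int y = x;     // refers to the global `x` (value 1): the local is not yet declared
    int x = 2;     // from here on, `x` names the local
    return y + x;  // 1 + 2
}
```

Calling f() therefore yields 3, not 4.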
GH-2066: When C++ compilation fails, ask user for help.
We do expect C++ code generated by Spicy to be valid, so C++ compiler errors in generated code are likely bugs. We now record the output of the C++ compiler in a dedicated file hilti-jit-error.log and ask users to file a ticket in case C++ compilation failed.
GH-1660: When printing anonymous bitfields inside a struct, lift up the fields.
This now prints, e.g., [$fin=1, $rsv=0, $opcode=2, $remaining=255] instead of [$<anon>=(1, 0, 2, 255)].
In addition, we also prettify non-anonymous bitfields. They now print as, e.g., [$y=(a: 4, b: 8)] instead of [$y=(4, 8)].
GH-1085: Allow registering a module twice.
So far, if one compiled the same HILTI module twice, each into its own HLTO, then when loading the two HLTOs, the runtime system would skip the second instance. However, that’s not really what we want: a module could intentionally be part of multiple HLTOs, in which case each should get its own copy of that module state (i.e., its globals).
This change allows the same module to be registered multiple times, with the HLTO linker scope distinguishing between the instances at runtime, as usual. To make that work, we move computation of the scope from compile time to runtime, using the library’s absolute path as the scope.
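The registration scheme can be pictured as a registry keyed by module name plus runtime scope. The following C++ sketch uses a hypothetical ModuleRegistry class and ModuleState struct (not HILTI's runtime API): instead of skipping a second registration, each distinct library path gets its own copy of the module state.

```cpp
#include <cstddef>
#include <filesystem>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-instance module state (e.g., a module's globals).
struct ModuleState {
    int counter = 0;
};

// Hypothetical registry: each (module name, scope) pair gets its own state,
// with the scope derived at runtime from the library's absolute path.
class ModuleRegistry {
public:
    ModuleState& register_module(const std::string& module, const std::filesystem::path& library) {
        auto scope = std::filesystem::absolute(library).lexically_normal().string();
        return modules_[std::make_pair(module, scope)];  // fresh state on first registration
    }

    std::size_t size() const { return modules_.size(); }

private:
    std::map<std::pair<std::string, std::string>, ModuleState> modules_;
};
```

Registering module Foo from /tmp/a.hlto and from /tmp/b.hlto yields two independent states, while registering it again from /tmp/a.hlto returns the existing one.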
GH-1905: Fix operator precedence in Spicy grammar.
We fixed the precedence of a number of operators to be closer to what users would expect from other languages like C++ or Python.
We reduced the precedence of the in operator.
The pre- and postfix operators ++ and -- now have the same precedence and are right associative.
Unary negate was changed to match the precedence of other unary operators.
Switch compilation to C++20.
Like Zeek, Spicy now requires a C++20 compiler. As part of this change we cleaned up the implementation to take advantage of C++20 functionality in a number of places. We also moved from the external libraries linb::any to std::any, and ghc::filesystem to std::filesystem.
Update supported platforms.
We dropped support for the following platforms:
debian-11
fedora-40
We added support for
debian-13
fedora-42
GH-1660: Render all bitfield instances with included field names.
GH-2099: Fully implement iterator interface for set::Iterator.
GH-2052: Move calling convention from function to function type.
Bug fixes
GH-2057: Fix bytes iterator dereference operation.
GH-2065: Error for redefined locals from statement inits.
GH-2061: Fix cyclic usage of unit types inside other types.
GH-2074: Fix fiber abortion.
GH-2063: Fix C++ compilation issue with weak->strong refs.
GH-2064: Ensure generated typeinfos are declared before used.
GH-2044: Catch if methods are implemented multiple times.
GH-2078: Fix C++ output for constants of constant type.
GH-1988: Enforce that block-local declarations must be variables.
GH-1996: Catch exceptions in processInput gracefully.
GH-2091: Fix strong->value reference coercion in calls.
GH-2100: Add missing deref operations for struct try-member/has-member operators.
GH-2119: Fix missing inline functions in enum prototypes.
GH-2142, GH-2134: Complete information exposed for reflection in typeinfo.
GH-2135: Add &cxx-any-as-ptr attribute.
Documentation
GH-1905: Document operator precedence.
9.3. Version 1.13
New Functionality
GH-1788: We now support decoding and encoding to UTF16, in particular via the new UTF16LE and UTF16BE charsets for little and big endian encoding, respectively.
GH-1961: We now support creating type values in Spicy code. The primary use cases for this are passing type information to host applications, and debugging.
A type value is typically created from either typeinfo(TYPE) or typeinfo(value), or through coercion from an existing ID of a custom type, like global T: type = MyStruct;. The resulting value can be printed or stored in a variable of type type, e.g., global bool_t: type = typeinfo(bool);.
GH-1971: Extend unit switch based on look-ahead to support blocks of items.
In 1.12.0 we added support for grouping related unit fields in blocks; there, the primary use case was if blocks to group fields with identical dependencies. We now also support such blocks inside unit switch constructs with look-ahead, so one can write the following code:
# Parses either `a` followed by another `a`, or `b`.
type X = unit {
    switch {
        -> {
            : b"a";
            : b"a";
        }
        -> : b"b";
    };
};
GH-1538: Implement compound statements ({...}). This allows introducing local scopes, e.g., to group related code.
GH-1946: string's encode method gained an optional errors argument to influence error handling. The parameter defaults to DecodeErrorStrategy::REPLACE, reproducing the previous implicit behavior.
GH-2010: bytes and string gained ends_with methods.
GH-1965: Add support for case-insensitive matching to regular expressions.
By adding an i flag to a regular expression pattern, it will now be matched case-insensitively (e.g., /foobar/i).
GH-1962: Add spicy-dump option to enable profiling.
Changed Functionality
GH-1981, GH-1982, GH-1991: We now catch more user errors in defining function overloads. Previously these would likely (hopefully) have failed in C++ compilation down the line, but are now cleanly rejected.
GH-1977: We now reject function overloads which only differ in their return type.
GH-1991: We now reject function prototypes without &cxxname.
Since in Spicy global declarations can be in any order, there is no need to introduce a function with a prototype if it is declared later. The only valid use case for function prototypes was if the function was implemented in C++ and bound to the Spicy name with &cxxname.
We have cleaned up our implementation for runtime type information, primarily intended for custom host applications.
type_info::Value instances obtained through runtime type introspection can now be rendered to a user-facing representation with a new to_string method.
The runtime representation was changed to correctly encode that tuple elements can remain unset. A Spicy-side tuple tuple<T1, T2, T3> now gets turned into std::tuple<std::optional<T1>, std::optional<T2>, std::optional<T3>>, which captures the full semantics.
We added type information for types previously not exposed, namely Null, Nothing and List. We also fixed the exposed type information for result<void>.
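To see what the tuple representation described for this release means for host code, here is a small C++ sketch. It assumes plain standard types for the elements (std::uint64_t and std::string stand in for the runtime's own integer and string types), and SpicyTuple and count_set are hypothetical names; note that later releases switched to dedicated C++ types for tuples (GH-2183 under 1.15).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

// Assumed C++ shape of a Spicy `tuple<uint64, string, bool>` under the
// representation described above: every element may remain unset.
using SpicyTuple =
    std::tuple<std::optional<std::uint64_t>, std::optional<std::string>, std::optional<bool>>;

// Counts how many elements of such a tuple are set, using a fold expression
// over all tuple elements.
int count_set(const SpicyTuple& t) {
    return std::apply([](const auto&... elems) { return (0 + ... + (elems.has_value() ? 1 : 0)); }, t);
}
```

A default-constructed SpicyTuple has no elements set; assigning std::get<0>(t) = 42 and std::get<2>(t) = true leaves the string element unset while count_set(t) reports 2.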
GH-2011: We have optimized allocations for unit fields extracting vectors which should speed up extracting especially small and medium-size vectors.
GH-2035: We have dropped support for Ubuntu 20.04 (Focal Fossa) since it has reached end of standard support upstream.
GH-2026: Speed up matching of character classes in regexps
Bug fixes
GH-1580: Catch when functions aren’t called.
GH-1961: Fix generated C++ prototype header.
GH-1966: Reject anonymous units in variables and fields.
GH-1967: Fix inactive stack size check during module initialization.
GH-1968: Fix coercion of function call arguments.
GH-1976: Fix unit &max-size not returning to proper loc.
GH-2007: Fix using &try with &max-size, and potentially other cases.
GH-2016: Fix &size expressions evaluating multiple times.
GH-2038: Prevent escape of non-HILTI exception in lower-level driver functions.
GH-2047: Make sure bytes::to[U]Int returns runtime integers.
GH-2049: Add #include <cstdint> for fixed-width integers
Documentation
GH-1155: Document iteration over maps/set/vectors.
GH-1963: Document assert-exception.
GH-1964: Document use of $$ inside &{while,until,until-including}.
GH-1973: Remove documentation of unsupported &nosub.
&nosub.GH-1974: Add documentation on how to interpret stack traces involving fibers.
GH-1975: Fix possibly-incorrect custom host compile command
GH-2039: Touchup docs style section.
GH-1970, GH-2003: Fix minor typos in documentation.
9.4. Version 1.12
New Functionality
We now support if around a block of unit items:
type X = unit {
    x: uint8;

    if ( self.x == 1 ) {
        a1: bytes &size=2;
        a2: bytes &size=2;
    };
};
One can also add an else-block:
type X = unit {
    x: uint8;

    if ( self.x == 1 ) {
        a1: bytes &size=2;
        a2: bytes &size=2;
    }
    else {
        b1: bytes &size=2;
        b2: bytes &size=2;
    };
};
We now support attaching an %error handler to an individual field:
type Test = unit {
    a: b"A";
    b: b"B" %error { print "field B %error", self; }
    c: b"C";
};
With input AxC, that handler will trigger, whereas with ABx it won't. If the unit had a unit-wide %error handler as well, that one would trigger in both cases (i.e., for b, in addition to its field-local handler).
The handler can also be provided separately from the field:
on b %error { ... }
In that separate version, one can receive the error message as well by declaring a corresponding string parameter:
on b(msg: string) %error { ... }
This works externally, from outside the unit, as well:
on Test::b(msg: string) %error { ... }
GH-1856: We added support for specifying a dedicated error message for &requires failures.
This now allows creating custom error messages when a &requires condition fails. Example:
type Foo = unit {
    x: uint8 &requires=($$ == 1 : error"Deep trouble!");

    # or, shorter:
    y: uint8 &requires=($$ == 1 : "Deep trouble!");
};
This is powered by a new condition test expression COND : ERROR.
We reworked C++ code generation so that many parsers should now compile faster. This is accomplished both by improved dependency tracking when emitting C++ code for a module and by a couple of new peephole optimization passes which additionally reduce the emitted code.
Changed Functionality
Add CMAKE_CXX_FLAGS to HILTI_CONFIG_RUNTIME_LD_FLAGS.
Speed up compilation of many parsers by streamlining generated C++ code.
Add starts_with, split, split1, lower and upper methods to string.
GH-1874: Add new library function spicy::bytes_to_mac.
Optimize spicy::bytes_to_hexstring and spicy::bytes_to_mac.
Improve validation of attributes so incompatible or invalid attributes should be rejected more reliably.
Optimize parsing for bytes of fixed size as well as literals.
Add a couple of peephole optimizations to reduce emitted C++ code.
GH-1790: Provide proper error message when trying to access an unknown unit field.
GH-1792: Prioritize error message reporting unknown field.
GH-1803: Fix namespacing of hilti IDs in Spicy-side diagnostic output.
GH-1895: No longer escape backslashes when printing strings or bytes.
GH-1857: Support &requires for individual vector items.
GH-1859: Improve error message when a unit parameter is used as a field.
GH-1898: Disallow attributes on “type aliases”.
GH-1938: Deprecate &count attribute.
GH-1928: Deprecate &anchor with regular expression constructors.
GH-1935: Allow defining parser alias names when running spicy-driver.
Bug fixes
GH-1815: Disallow expanding limited Views again with limit.
Fix to_uint(ByteOrder) for empty byte ranges.
Fix undefined shifts of 32bit integer in toInt().
GH-1817: Prevent null ptr dereference when looking on nodes without Scope.
Fix use of move'd from variable.
GH-1823: Don’t qualify magic linker symbols with C++ namespace.
Fix diagnostics seen when compiling with GCC.
GH-1852: Fix skip with units.
GH-1832: Fail for vectors with bytes but no stop.
GH-1860: Fix parsing for vectors of literals.
GH-1847: Fix resynchronization issue with trimmed input.
GH-1844: Fix nested look-ahead parsing.
GH-1842: Fix when input redirection becomes visible.
GH-1846: Fix bug with capture groups.
GH-1875: Fix potential nullptr dereference when comparing streams.
GH-1867: Fix infinite loops with recursive types.
GH-1868: Associate source code locations with current fiber instead of current thread.
GH-1871: Fix &max-size on unit containing a switch.
GH-1791: Fix usage of &convert with units requiring parameters.
GH-1858: Fix the literals parsers not following coercions.
GH-1893: Encompass child node’s location in parent.
GH-1919: Validate that sets are sortable.
GH-1918: Fix potential segfault with stream iterators.
GH-1856: Disallow dereferencing a result<void> value.
Fix issue with type inference for result constructor.
GH-1933: Fix HILTI_CXX_FLAGS for when multiple flags are passed.
GH-1829: Catch integer shifts exceeding the width of the operand.
Documentation
Redo error handling docs
Document continue statements.
GH-1063: Document arguments to new operator.
Update <bytes>.to_int()/<bytes>.to_uint() documentation.
GH-1914: Make $$ documentation more precise.
Fix doc code snippet that won't compile.
9.5. Version 1.11
New Functionality
GH-3779: Add %sync_advance hook.
This adds support for a new unit hook:
on %sync_advance(offset: uint64) { ... }
This hook is called regularly during error recovery when synchronization skips over data or gaps while searching for a valid synchronization point. It can be used to check in on the synchronization to, e.g., abort further processing if it just keeps failing.
offset is the current position inside the input stream that synchronization just skipped to.
By default, “called regularly” means that it's called every 4KB of input skipped over while searching for a synchronization point. That value can be changed by setting a unit property %sync-advance-block-size = <number of bytes>.
As an additional minor tweak, this also renames what used to be the __gap__ profiler to __sync_advance, because it profiles the time spent skipping data, not just gaps.
Add unit method stream() to access the current input stream, and stream method statistics() to retrieve input statistics.
This returns a struct of the following type, reflecting the input seen so far:
type StreamStatistics = struct {
    num_data_bytes: uint64;     ## number of data bytes processed
    num_data_chunks: uint64;    ## number of data chunks processed, excluding empty chunks
    num_gap_bytes: uint64;      ## number of gap bytes processed
    num_gap_chunks: uint64;     ## number of gap chunks processed, excluding empty chunks
};
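To make the counting rules concrete, here is a small C++ sketch of how such statistics could be accumulated. The StreamStatistics struct mirrors the field names of the Spicy struct above, but it is not the runtime's actual type, and record_data and record_gap are hypothetical helpers.

```cpp
#include <cstdint>
#include <string_view>

// C++ mirror of the StreamStatistics fields listed above (illustrative only).
struct StreamStatistics {
    std::uint64_t num_data_bytes = 0;
    std::uint64_t num_data_chunks = 0;
    std::uint64_t num_gap_bytes = 0;
    std::uint64_t num_gap_chunks = 0;
};

// Hypothetical accounting for a chunk of data: empty chunks are not counted.
void record_data(StreamStatistics& s, std::string_view chunk) {
    if ( chunk.empty() )
        return;
    s.num_data_bytes += chunk.size();
    ++s.num_data_chunks;
}

// Hypothetical accounting for a gap of `len` bytes: empty gaps are not counted.
void record_gap(StreamStatistics& s, std::uint64_t len) {
    if ( len == 0 )
        return;
    s.num_gap_bytes += len;
    ++s.num_gap_chunks;
}
```

Feeding the data chunks "abc" and "" plus gaps of 10 and 0 bytes results in 3 data bytes in 1 data chunk and 10 gap bytes in 1 gap chunk.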
GH-1750: Add to_real method to bytes.
This interprets the data as representing an ASCII-encoded floating point number and converts that into a real. The data can be in either decimal or hexadecimal format. If it cannot be parsed as either, the method throws an InvalidValue exception.
GH-1608: Add get_optional method to maps.
This returns an optional value either containing the map's element for the given key if that entry exists, or an unset optional if it does not.
GH-90/GH-1733: Add result and spicy::Error types to Spicy to facilitate error handling.
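The semantics of the new to_real method on bytes and get_optional method on maps correspond closely to standard C++ facilities. The following sketch uses hypothetical free functions to_real and get_optional: std::strtod accepts both decimal and hexadecimal floating-point text, and std::invalid_argument stands in for Spicy's InvalidValue exception. It is not the Spicy runtime's implementation; in particular, strtod also skips leading whitespace and accepts inf/nan, which Spicy may handle differently.

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Sketch of to_real: parse ASCII decimal (e.g. "3.5") or hexadecimal
// (e.g. "0x1.8p1") floating-point text, rejecting anything left unparsed.
double to_real(const std::string& data) {
    const char* begin = data.c_str();
    char* end = nullptr;
    double value = std::strtod(begin, &end);

    // Fail if nothing was parsed or trailing (including embedded NUL) data remains.
    if ( end == begin || end != begin + data.size() )
        throw std::invalid_argument("cannot parse '" + data + "' as real");

    return value;
}

// Sketch of get_optional: the element if the key exists, otherwise unset.
template<typename K, typename V>
std::optional<V> get_optional(const std::map<K, V>& m, const K& key) {
    if ( auto it = m.find(key); it != m.end() )
        return it->second;
    return std::nullopt;
}
```

With these, to_real("0x1.8p1") yields 3.0, to_real("abc") throws, and get_optional on a missing key returns an unset optional instead of throwing.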
Changed Functionality
The Spicy compiler has become a bit more strict and is now rejecting some ill-defined code constructs that previous versions ended up letting through. Specifically, the following cases will need updating in existing code:
Identifiers from the (internal) hilti:: namespace are no longer accessible. Usually you can just scope them with spicy:: instead.
Previous versions did not always enforce constness as they should have. In particular, function parameters could end up being mutable even when they weren't declared as inout. Now inout is required for supporting any mutable operations on a parameter, so make sure to add it where needed.
When using unit parameters, the type of any inout parameter now must be a unit itself. To pass other types into a unit so that they can be modified by the unit, use a reference instead of inout. For example, use type Foo = unit(s: sink&) instead of type Foo = unit(inout s: sink). See https://docs.zeek.org/projects/spicy/en/latest/programming/parsing.html#unit-parameters for more.
The Spicy compiler now uses a more streamlined storage and access scheme to represent source code. This speeds up work up until C++ source translation (e.g., faster time to first error message during development).
spicyc options -c and -l no longer support compiling multiple Spicy source files to C++ code individually to then build them all together. This was a rarely used feature and actually already broken in some situations. Instead, use spicyc -x to produce the C++ code for all needed Spicy source files at once. -c and -l remain available for debugging purposes.
The spicyc option -P now requires a prefix argument that sets the C++ namespace, just like -x <prefix> does. This is so that the prototypes match the actual code generated by -x. To get the same identifiers as before, use an empty prefix (-P "").
GH-1763: Restrict initialization of const values to literals. This means that, e.g., const values cannot be initialized from other const values or function calls anymore.
result and network are now keywords and cannot be used anymore as user-specified identifiers.
GH-1661: Deprecate usage of &convert with &chunked.
GH-1657: Reduce data copying when passing data to the driver.
GH-1501: Improve some error messages for runtime parse errors.
GH-1655: Reject joint usage of filters and look-ahead.
GH-1675: Extend runtime profiling to measure parser input volume.
GH-1624: Enable optimizations when running spicy-build.
Bug fixes
GH-1759: Fix if-condition with switch parsing.
Fix Spicy's support for network type.
GH-1598: Enforce that the argument to new is either a type or a ctor.
GH-1742, GH-1760: Unroll constructors of big containers in generated code. We previously would generate code which would be expensive to compile for some compilers. We now generate more compiler-friendly code.
GH-1745: Fix C++ initialization of global constants through global functions.
GH-1743: Use a checked cast for map's in operator.
GH-1664: Fix &convert typing issue with bit ranges.
GH-1724: Fix skipping in size-constrained units. We previously could skip too much data if skip was used in a unit with a global &size.
Fix incremental skipping. We previously would incorrectly compute the amount of data to skip, which could potentially have led to the parser consuming more data than available.
GH-1586: Make skip productions behave like the production they are wrapping.
GH-1711: Fix forwarding of a reference unit parameter to a non-reference parameter.
GH-1599: Fix integer increment/decrement operators require mutable arguments.
GH-1493: Support/fix public type aliases to units.
Documentation
Add new section with guidelines and best practices. This focuses on performance for now, but may be extended with other areas later. Much of the content was contributed by Corelight Labs.
Fix documented type mapping for integers.
Document generic operators.
9.6. Version 1.10
New Functionality
Changed Functionality
Numerous improvements to improve throughput of generated parsers.
For this release we have revisited the code typically generated for parsers and the runtime libraries they use, with the goal of improving the throughput of parsers at runtime. Coarsely summarized, this work centered around:
reduction of allocations during parsing
reduction of data copies during parsing
use of dedicated, hand-checked implementations for automatically generated code to avoid overhead from safety checks in the runtime libraries
With these changes we see throughput improvements of some parsers in the range of 20-30%. This work consisted of numerous incremental changes; see CHANGES for the full list.
GH-1667: Always advance input before attempting resynchronization.
When we enter resynchronization after hitting a parse error we previously would have left the input alone, even though we know it fails to parse. We then relied fully on resynchronization to advance the input.
With this patch we always forcibly advance the input to the next non-gap position. This has no effect for synchronization on literals, but allows it to happen earlier for regular expressions.
GH-1659: Lift requirement that bytes forwarded from filter be mutable.
GH-1489: Deprecate &bit-order on bit ranges.
This had no effect and allowing it may be confusing to users. Deprecate it with the idea of eventual removal.
Extend location printing to include single-line ranges.
For a location of, e.g., “line 1, column 5 to 10”, we now print 1:5-1:10, whereas we used to print it as only 1:5, hence dropping information.
GH-1500: Add += operator for string.
This allows appending to a string without having to allocate a new string. This might perform better most of the time.
GH-1640: Implement skipping for any field with known size.
This patch adds skip support for fields with a &size attribute or of a builtin type with known size. If a unit has a known size and it is specified in a &size attribute, this also allows skipping over unit fields.
Bug fixes
GH-1605: Allow for unresolved types for set in operator.
GH-1617: Fix handling of %synchronize-* attributes for units in lists.
We previously would not detect %synchronize-at or %synchronize-from attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.
We now handle these attributes, regardless of how the unit appears.
GH-1585: Put closing of unit sinks behind feature guard.
This code gets emitted, regardless of whether a sink was actually connected or not. Put it behind a feature guard so it does not enable the feature on its own.
GH-1652: Fix filters consuming too much data.
We would previously assume that a filter would consume all available data. This only holds if the filter is attached to a top-level unit, but in general not if some sub-unit uses a filter. With this patch we explicitly compute how much data is consumed.
GH-1668: Fix incorrect data consumption for &max-size.
We would previously handle &size and &max-size almost identically, with the only difference that &max-size sets up a slightly larger view to accommodate a sentinel. In particular, we also used identical code to set up the position where parsing should resume after such a field.
This was incorrect, as it is in general impossible to tell where parsing continues after a field with &max-size since it does not signify a fixed view like &size. We now compute the next position for a &max-size field by inspecting the limited view to detect how much data was extracted.
GH-1522: Drop overzealous validator.
A validator was intended to reject a pattern of incorrect parsing of vectors, but instead ended up rejecting all vector parsing if the vector elements themselves produced vectors. We dropped this validation.
GH-1632: Fix regex processing using {n,m} repeat syntax being off by one
GH-1648: Provide meaningful unit __begin value when parsing starts.
We previously would not provide __begin when starting the initial parse. This meant that, e.g., offset() was not usable if nothing ever got parsed.
We now provide a meaningful value.
Fix skipping of literal fields with condition.
GH-1645: Fix &size check.
The current parsing offset could legitimately end up just beyond the &size amount.
GH-1634: Fix infinite loop in regular expression parsing.
Documentation
Update documentation of offset().
Fix docs namespace for symbols from filter module.
We previously would document these symbols to be in spicy even though they are in filter.
Add bitfield examples.
9.7. Version 1.9
New Functionality
GH-1468: Allow direct access to members of anonymous bitfields.
We now automatically map fields of anonymous bitfields into their containing unit.
type Foo = unit {
    : bitfield(8) {
        x: 0..3;
        y: 4..7;
    };

    on %done { print self.x, self.y; }
};
GH-1467: Support bitfield constants in Spicy for parsing.
One can now define bitfield “constants” for parsing by providing integer expressions with fields:
type Foo = unit {
    x: bitfield(8) {
        a: 0..3 = 2;
        b: 4..7;
        c: 7 = 1;
    };
};
This will first parse the bitfield as usual and then enforce that the two bit ranges that come with expressions (i.e., a and c) indeed contain the expected values. If they don't, that's a parse error.
We also support using such bitfield constants for look-ahead parsing:
type Foo = unit {
    x: uint8[];
    y: bitfield(8) {
        a: 0..3 = 4;
        b: 4..7;
    };
};
This will parse uint8s until a value is discovered that has its bits set as defined by the bitfield constant.
(We use the term “constant” loosely here: only the bits with values are actually enforced to be constant, all others are parsed as usual.)
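The constant check for the first example above can be pictured as plain bit extraction. This C++ sketch assumes the default least-significant-bit-first numbering (bit 0 is the lowest bit) and uses hypothetical helpers bits and matches_constant; it only mirrors the check that the ranges carrying expressions (a and c) have the expected values, not Spicy's generated parser.

```cpp
#include <cstdint>

// Extract bits [lo, hi] (inclusive, bit 0 = least significant) from `value`,
// mirroring a Spicy bit range such as `0..3`.
constexpr std::uint8_t bits(std::uint8_t value, unsigned lo, unsigned hi) {
    unsigned width = hi - lo + 1;
    unsigned mask = (1u << width) - 1u;
    return static_cast<std::uint8_t>((value >> lo) & mask);
}

// Check corresponding to `x: bitfield(8) { a: 0..3 = 2; b: 4..7; c: 7 = 1; };`:
// the byte is parsed normally, but only `a` and `c` are enforced. Returns
// false where Spicy would report a parse error.
constexpr bool matches_constant(std::uint8_t byte) {
    return bits(byte, 0, 3) == 2 && bits(byte, 7, 7) == 1;
}
```

For instance, 0x82 (binary 1000 0010) passes, while 0x02 (bit 7 clear) and 0x83 (bits 0..3 equal 3) do not. In the look-ahead example, an analogous predicate is what decides when the uint8 vector ends.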
GH-1089, GH-1421: Make offset() independent of random access functionality.
We now store the value returned by offset() directly in the unit instead of computing it on the fly from cur - begin when requested. With that, offset() can be used without enabling random access functionality on the unit.
Add support for passing arbitrary C++ compiler flags.
This adds a magic environment variable HILTI_CXX_FLAGS which, if set, specifies compiler flags which should be passed during C++ compilation after implicit flags. This could be used to, e.g., set defines, or set low-level compiler flags.
Even with this flag, for passing include directories one should still use HILTI_CXX_INCLUDE_DIRS since they are searched before any implicitly added paths.
GH-1435: Add bitwise operators &, |, and ^ for booleans.
GH-1465: Support skipping explicit %done in external hooks.
Assuming Foo::X is a unit type, these two are now equivalent:
on Foo::X::%done { }
on Foo::X { }
Changed Functionality
GH-1567: Speed up runtime calls to start profilers.
GH-1565: Disable capturing backtraces with HILTI exceptions in non-debug builds.
GH-1343: Include condition in &requires failure message.
GH-1466: Reject uses of self in unit &size and &max-size attribute.
Values in self are only available after parsing has started, while &size and &max-size are consumed before that. This means that any use of self and its members in these contexts would only ever see unset members, so it should not be the intended use.
GH-1485: Add validator rejecting unsupported multiple uses of attributes.
GH-1465: Produce better error message when hooks are used on a unit field.
GH-1503: Handle anonymous bitfields inside switch statements.
We now map items of anonymous bitfields inside switch cases into the unit namespace, just like we already do for top-level fields. We also catch if two anonymous bitfields inside those cases carry the same name, which would make accesses ambiguous.
So the following works now:
switch (self.n) {
    0 -> : bitfield(8) {
        A: 0..7;
    };
    * -> : bitfield(8) {
        B: 0..7;
    };
};
Whereas this does not work:
switch (self.n) {
    0 -> : bitfield(8) {
        A: 0..7;
    };
    * -> : bitfield(8) {
        A: 0..7;
    };
};
GH-1571: Remove trimming inside individual chunks.
Trimming a Chunk (always from the left) causes a lot of internal work with only limited benefit, since we manage visibility with a stream::View on top of a Chunk anyway.
Trimming now only removes a Chunk from a Chain, but no longer internally changes individual Chunks. This should benefit performance but might lead to slightly increased memory use; callers usually have that data in memory anyway.
Use find_package(Python) with version.
Zeek's configure sets Python_EXECUTABLE as a hint, but Spicy was using find_package(Python3) and would only use Python3_EXECUTABLE as a hint. This resulted in Spicy finding a different (the default) Python executable when configuring Zeek with --with-python=/opt/custom/bin/python3.
Switch Spicy over to use find_package(Python) and add the minimum version so it knows to look for Python3.
Bug fixes
GH-1520: Fix handling of spicy-dump --enable-print.
Fix spicy-build to correctly infer library directory.
GH-1446: Initialize generated struct members in constructor body.
GH-1464: Add special handling for potential advance failure in trial mode.
GH-1275: Add missing lowering of Spicy unit ctor to HILTI struct ctor.
Fix rendering in validation of %byte-order attribute.
GH-1384: Fix stringification of DecodeErrorStrategy.
Fix handling of --show-backtraces flag.
GH-1032: Allow using bitfields with type declarations.
GH-1484: Fix using of &convert on bitfields.
GH-1508: Fix returned value for <unit>.position().
GH-1504: Use user-inaccessible chars for encoding :: in feature variables.
GH-1550: Replace recursive deletion with explicit loop to avoid stack overflow.
GH-1549: Add feature guards to accesses of a unit's __position.
Documentation
Move Zeek-specific documentation into Zeek documentation.
Clarify error handling docs.
Mention unit switch statements in conditional parsing docs.
9.8. Version 1.8
New Functionality
Add new skip keyword to let unit items efficiently skip over uninteresting data.
For cases where your parser just needs to skip over some data, without needing access to its content, Spicy provides a skip keyword to prefix corresponding fields with:
    module Test;

    public type Foo = unit {
        x: int8;
        : skip bytes &size=5;
        y: int8;

        on %done { print self; }
    };
skip works for all kinds of fields but is particularly efficient with bytes fields, for which it will generate optimized code avoiding the overhead of storing any data.
skip fields may have conditions and hooks attached, like any other fields. However, they do not support $$ in expressions and hooks.
For readability, a skip field may be named (e.g., padding: skip bytes &size=3;), but even with a name, its value cannot be accessed.
skip fields generalize the support for void fields with attributes, which are now deprecated.
Add runtime profiling infrastructure.
This adds an option --enable-profiling to the HILTI and Spicy compilers. Using the option does two things: (1) it sets a flag enabling additional profiling instrumentation in the generated C++ code, and (2) it enables that instrumentation to record profiling information during execution of the compiled code, including dumping out a profiling report at the end. The profiling information collected includes time spent in HILTI functions as well as for parsing Spicy units and unit fields.
Changed Functionality
Optimizations for improved runtime performance.
This release contains a number of changes to improve the runtime performance of generated parsers. This includes tweaks for generating more performant code for parsers, low-level optimizations of types in the runtime support library, as well as fine-tuning of parser execution at runtime.
Do not force locale on users of libhilti.
Avoid expensive checked iterator for internal Bytes iteration.
GH-1089: Allow using offset() without enabling full random-access support.
GH-1394: Fix C++ normalization of generated enum values.
Disallow using $$ with anonymous containers.
Bug fixes
GH-1386: Prevent internal error when passed invalid context.
Fix potential use-after-move bug.
GH-1390: Initialize Bytes internal control block for all constructors.
GH-1396: Fix regex performance regression introduced by constant folding.
GH-1399: Guard access to unit _filters member with feature flag.
GH-1421: Store numerical offset in units instead of iterator for position.
GH-1436: Make sure Bytes::sub only throws HILTI exceptions.
GH-1447: Do not forcibly make strong_ref in function parameters immutable.
GH-1452: Allow resolving of unit parameters before self is fully resolved.
Make sure Spicy runtime config is initialized after spicy::rt::init.
Adjustments for building with GCC-13.
Documentation
Document how to check whether an optional value is set.
Preserve indentation when extracting comments in doc generation.
Fix docs for long form of -x flag to spicyc.
9.9. Version 1.7
New Functionality
Support Zeek-style documentation strings in Spicy source code.
Provide ability for host applications to initiate runtime’s module-pre-init phase manually.
Add DPD-style spicy::accept_input() and spicy::decline_input().
Add driver option to output full set of generated C++ files.
GH-1123: Support arbitrary expressions as arguments to type constructors, such as interval(...).
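A minimal sketch of what GH-1123 permits: passing a computed expression, rather than a literal, to interval(...). The sketch makes two assumptions. First, interval(...) accepts a real number of seconds. Second, intervals provide a seconds() method. The module and global names are hypothetical. A module like this can be run with spicyc -j.

```spicy
module IntervalDemo;

# Hypothetical configuration value held in a variable rather than a literal.
global timeout_secs: real = 1.5;

# The constructor argument is an arbitrary expression (assumes interval()
# takes a real number of seconds).
global timeout = interval(timeout_secs * 2.0);

print timeout;
```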
Changed Functionality
Search HILTI_CXX_INCLUDE_DIRS paths before default include paths.
Search user module paths before system paths.
Streamline runtime exception hierarchy.
Fix bug in cast from real to interval.
GH-1326: Generate proper runtime types for enums.
GH-1330: Reject uses of imported module IDs as expression.
Bug fixes
GH-1310: Fix ASAN false positive with GCC.
GH-1345: Improve runtime performance of stream iteration.
GH-1367: Use unique filename for all object files generated during JIT.
Remove potential race during JIT when using HILTI_CXX_COMPILER_LAUNCHER.
GH-1349: Fix incremental regexp matching for potentially empty results.
Documentation
9.10. Version 1.6
New Functionality
GH-1249: Allow combining &eod with &until or &until-including.
GH-1251: When decoding bytes into a string using a given character set, allow the caller to control error handling.
All methods taking a charset parameter now take an additional enum selecting one of three possible error handling strategies in case a character can't be decoded/represented: STRICT throws an error, IGNORE skips the problematic character and proceeds with the next, and REPLACE replaces the problematic character with a safe substitute. REPLACE is now the default everywhere, so that by default no errors are triggered.
This comes with an additional functional change for the ASCII encoding: we now consistently sanitize characters that ASCII can't represent when in REPLACE/IGNORE modes (and, hence, by default), and trigger errors in STRICT mode. Previously, we'd sometimes let them through, and never triggered any errors. This also fixes a bug with the ASCII encoding sometimes turning a non-printable character into multiple repeated substitutes.
GH-1294: Add library function to parse an address from string or bytes.
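The GH-1251 error handling strategies can be compared side by side in a small module. The sketch makes the following assumptions:

- The enums are exposed as spicy::Charset and spicy::DecodeErrorStrategy.
- bytes::decode() takes the strategy as its second argument.

It deliberately does not rely on which substitute character REPLACE emits.

```spicy
module DecodeDemo;

import spicy;

# 0xff cannot be represented in ASCII.
global data = b"a\xffz";

# IGNORE drops the offending byte and continues with the next one.
global ignored = data.decode(spicy::Charset::ASCII, spicy::DecodeErrorStrategy::IGNORE);

# REPLACE (the default) puts a substitute character in its place.
global replaced = data.decode(spicy::Charset::ASCII, spicy::DecodeErrorStrategy::REPLACE);

# STRICT would raise a runtime error for this input, so it is not evaluated:
#   data.decode(spicy::Charset::ASCII, spicy::DecodeErrorStrategy::STRICT)

print ignored, replaced;
```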
HLTO files now perform a version check when loaded.
We previously would potentially allow building an HLTO file against one version of the Spicy runtime and then loading it with a different version. If the exposed symbols matched, loading might have succeeded, but could still have led to subtle bugs at runtime.
We now embed a runtime version string in HLTO files and reject loading HLTO files into a different runtime version. We require an exact version match.
New pack and unpack operators.
These provide low-level primitives for transforming a value into, or out of, a binary representation; see the docs for details.
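A round trip through both operators might look like the following sketch. It assumes the integer forms pack(VALUE, BYTE_ORDER) and unpack<TYPE>(DATA, BYTE_ORDER). It also assumes unpack returns a tuple of the decoded value and any remaining bytes. The operator reference remains the authoritative source for the signatures.

```spicy
module PackDemo;

import spicy;

# 258 == 0x0102
global value: uint16 = 258;

# Binary big-endian representation of the integer.
global packed = pack(value, spicy::ByteOrder::Big);

# Decode it again from input carrying one trailing byte; assumed to yield
# a (value, remaining-bytes) tuple.
global unpacked = unpack<uint16>(packed + b"\xff", spicy::ByteOrder::Big);

print packed, unpacked;
```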
Changed Functionality
GH-1236: Add support for adding link dependencies via --cxx-link.
GH-1285: C++ identifiers referenced in &cxxname are now automatically interpreted to be in the global namespace.
Synchronization-related debug messages are now logged to the spicy-verbose stream. We added logging of successful synchronization.
Downgrade required Flex version. We previously required at least flex-2.6.0; we can now build against flex-2.5.37.
Improve C++ caching during JIT.
We improved caching behavior via HILTI_CXX_COMPILER_LAUNCHER for cases where the configuration of spicyc was changed without changing the C++ file produced during JIT.
hilti::rt::isDebugVersion has been removed.
The -O | --optimize flag has been removed from command line tools.
This was already a no-op without observable side effects.
GH-1311: Reject use of context() unit method if unit does not declare a context with %context.
GH-1319: Unsupported unit variable attributes are now rejected.
GH-1299: Add validator for bitfield field ranges.
We now reject uses of self as an ID.
GH-1233: Reject key types for maps that can't be sorted.
Fix validator for field &default expression types for constness.
When checking types of field &default expressions, we previously would also consider their constness. This broke, e.g., cases where the used expression is not an lvalue like the field the &default is defined for:
    type X = unit {
        var x: bytes = b"" + a;
    };
We no longer consider constness in the type check. Since fields are never const, this allows us to set a &default with constant expressions as well.
Bug fixes
GH-1231: Add special handling for potential advance failure in trial mode.
GH-1115, GH-1196: Explicitly type temporary value used by &max_size logic.
GH-1143, GH-1220: Add coercion on assignment for optionals that only differ in constness of their inner types.
GH-1230: Add coercion to default argument of map::get.
GH-1234, GH-1238: Fix assertions with anonymous struct constructor.
GH-1248: Fix stop for unbounded loop.
GH-1250: Fix internal errors when seeing unsupported character classes in regular expressions.
GH-1170: Fix contexts not being allowed to be passed inout.
GH-1266: Fix wrong type for Spicy-side self expression.
GH-1261: Fix inability to access unit fields through self in &convert expressions.
GH-1267: Install only needed headers from bundled SafeInt library.
GH-1227: Fix code generation when a module’s file could be imported through different means.
GH-1273: Remove bundled code licensed under CPOL license.
GH-1303: Fix potentially late synchronization when jumping over gaps during synchronization.
Do not force gold linker with user-provided linker flags or when built as a CMake subproject.
Improve efficiency of startsWith for long inputs.
Documentation
The documentation now reflects Zeek package manager Spicy feature templates.
The documentation for bitfields was clarified.
Documentation for casts from integers to boolean was added.
We added documentation for how to expose custom C++ code in Spicy.
Update doc link to commits mailing list.
Clarify that %context can only be used in top-level units.
Clarify that &until consumes the delimiter.
GH-1240: Clarify docs on SPICY_VERSION.
Add FAQ item on source locations.
Add example for use of the ?. operator.
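For illustration, here is a sketch of ?. applied to a struct with an &optional field. The operator yields a boolean telling whether the field is set, so it can guard an access that would otherwise fail at runtime. The Request type and its fields are hypothetical.

```spicy
module OptionalDemo;

type Request = struct {
    method: string;
    host: string &optional;
};

# Only `method` is initialized; `host` stays unset.
global req: Request = [$method = "GET"];

# `?.` tests whether the optional field has a value before accessing it.
if ( req?.host ) {
    print "host: %s" % req.host;
}
else {
    print "no host set";
}
```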
9.11. Version 1.5
New Functionality
GH-1179: Cap parallelism use for JIT background jobs.
During JIT, we would previously launch all compilation jobs in parallel. For projects using many modules, this could have led to resource contention, which often forced users to use sequential compilation with HILTI_JIT_SEQUENTIAL. We now by default cap the number of parallel background jobs at the number of logical cores. This can be parameterized with the environment variable HILTI_JIT_PARALLELISM; setting HILTI_JIT_PARALLELISM=1 reproduces HILTI_JIT_SEQUENTIAL.
GH-1134: Add support for synchronize-at and synchronize-after properties.
These unit properties allow specifying a literal which should be searched for during error recovery. If the respective unit is used as a synchronization point during error recovery, i.e., it is used as a field which is marked &synchronize, input resynchronization during error recovery will seek to the next position of this pattern in the input stream.
GH-1209: Provide error message to %error handler.
We now allow optionally providing a string parameter with %error that will receive the associated error message:
    on %error(msg: string) { print msg; }
Changed Functionality
GH-1184: Allow more cache hits if only a few modules are changed in multi-module compilation.
GH-1208: Incremental performance tweaks for JIT.
GH-1197: Make handling of sanitizer workarounds more granular.
Bug fixes
GH-1150: Preserve additional permissions from umask when generating HLTO files.
GH-1154: Add stringification of Map::value_type.
GH-1080: Reject constant declarations at non-global scope.
GH-1164: Make compiler plugin initialization explicit.
GH-1050: Update location when entering most parser methods.
GH-1187: Fix support for having multiple source modules of the same name.
GH-1197: Prevent too early integer overflow in pow.
GH-1201: Adjust removal of symlinks on install for DESTDIR.
GH-1203: Allow changing DESTDIR between configure and install time.
GH-1204: Remove potential use-after-move.
GH-1210: Prevent unnecessarily executable stack with GNU toolchain.
GH-1206: Fix detection of recursive dependencies.
GH-1217: Produce hilti::rt::Bool when casting to boolean.
GH-1224: Fix import segfault.
Documentation
GH-44: Update docs for spicy-plugin rename _Zeek::Spicy -> Zeek::Spicy.
GH-1183: Update docs for Discourse migration.
GH-1205: Update Spicy docs for now being built into Zeek.
9.12. Version 1.4
New Functionality
Add support for recovery from parse errors or incomplete input
This release adds support for recovering from parse errors or incomplete input (e.g., gaps or partial connections). Grammars can denote unit synchronization points with a &synchronize attribute. If an error is encountered while extracting a previous field, parsing will attempt to resynchronize the input at that point. The synchronization result needs to be checked and confirmed or rejected explicitly; a number of hooks are provided for that. See the docs for details.
Remove restriction that units used as sinks need to be public.
Use ccache for C++ compilation during JIT if Spicy itself was configured to use ccache
Spicy spends a considerable amount of JIT time compiling generated C++ code. This work can be cached if neither the inputs nor any of the used flags have changed, so that subsequent JIT runs can complete much faster.
We now automatically cache many C++ compilation artifacts with ccache if Spicy itself was configured with, e.g., --with-hilti-compiler-launcher=ccache. This behavior can be controlled or disabled via the HILTI_CXX_COMPILER_LAUNCHER environment variable.
GH-842: Add Spicy support for struct initialization.
GH-1036: Support unit initialization through a struct constructor expression.
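The following sketch of struct initialization builds a struct value from a constructor expression made of $field = value pairs. The Endpoint type is hypothetical. The example assumes that a field declared with a &default may be left out of the constructor.

```spicy
module StructInitDemo;

type Endpoint = struct {
    host: string;
    retries: uint8 &default=3;
};

# Initialize from a struct constructor; `retries` is omitted and is assumed
# to take its &default value.
global ep: Endpoint = [$host = "example.com"];

print ep;
```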
Changed Functionality
GH-1074: %random-access is now derived automatically from uses, and declaring it explicitly has been deprecated.
GH-1072: Disallow enum declarations with non-unique values.
It is unclear what code should be generated when requested to convert an integer value to the following enum:
type E = enum { A = 1, B = 2, C = 1, };
For 1 we could produce either E::A or E::C here.
Instead of allowing this ambiguity, we now disallow enums with non-unique values.
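When every label has a unique value, converting an integer to the enum is unambiguous. This sketch assumes that the conversion is written as a call of the enum type, E(<integer>). The module name is hypothetical.

```spicy
module EnumDemo;

# Each label maps to a distinct value, so integer conversion is well-defined.
type E = enum { A = 1, B = 2, C = 3 };

# Instantiate an enum value from an integer (assumed E(<integer>) form).
global e = E(3);

print e;
```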
Bug fixes
Prevent exception if cache directory is not readable.
Propagate failure from cmake up to ./configure.
GH-1030: Make sure types required for globals are declared before being used.
Fix potential use-after-free in stringification of stream::View.
GH-1087: Make offset return correct value even before parsing of field.
Documentation
9.13. Version 1.3
New Functionality
Add optimizer removing unused %random-access or %filter functionality
If a unit has, e.g., a %random-access attribute, Spicy emits additional code to track and update offsets. If the %random-access functionality is not used, this leads to unneeded code being emitted, which causes unneeded overhead both during JIT and during execution.
We now emit such feature-dependent code under a feature flag (effectively a global boolean constant) which is on by default. Additionally, we added an optimizer pass which detects whether a feature is used and can disable unused feature functionality (switching the feature flag to off), and can then remove unreachable code behind such disabled feature flags by performing basic constant folding.
Add optimizer pass removing unused sink functionality
By default, any unit declared public can be used as a sink. To support sink behavior, additional code is emitted and invoked at runtime, regardless of whether the unit is used as a sink or not.
We now detect unused sink functionality and avoid emitting it.
GH-934: Allow $$ in place of self in unit &convert attributes.
Changed Functionality
GH-941: Allow use of units with all defaulted parameters as entry points.
We added precompilation support for libspicy.h.
Drop support for end-of-life Fedora 32, and add support for Fedora 34.
Bug fixes
Correctly handle lookups for NULL library symbols.
Use safe integers for size functions in the runtime library.
Make it possible to build on ARM64.
Fix building with gcc-11.
Documentation
9.14. Version 1.2
New Functionality
GH-913: Add support for switch-level &parse-at and &parse-from attributes inside a unit.
Add optimizer pass removing unimplemented functions and methods.
This introduces a global pass triggered after all individual input ASTs have been finalized, but before we generate any C++ code. We then strip out any unimplemented member functions (typically Spicy hooks), both their definitions as well as their uses.
In order to correctly handle previously generated C++ files which might have been generated with different optimization settings, we disallow optimizations if we detect that a C++ input file was generated by us.
Changed Functionality
Add validation of unit switch attributes. We previously silently ignored unsupported attributes; now errors are raised.
Remove configure option --build-zeek-plugin. Spicy no longer supports building the Zeek plugin/analyzers in-tree. This used to be available primarily for development purposes, but became challenging to maintain.
Add environment variable HILTI_CXX_INCLUDE_DIRS to specify additional C++ include directories when compiling generated code.
GH-940: Add runtime check for parsing progress during loops.
Bug fixes
Fix computation of unset locations.
Fix accidental truncating conversion in integer code.
Documentation
9.15. Version 1.1
New Functionality
GH-844: Add support for &size attribute to unit switch statement.
GH-26: Add %skip, %skip-pre and %skip-post properties for skipping input matching a regular expression before any further input processing takes place.
Extend library functionality provided by the spicy module:
    crc32_init()/crc32_add() compute CRC32 checksums.
    mktime() creates a time value from individual components.
    zlib_init() initializes a ZlibStream with a given window bits argument.
    Zlib now accepts a window bits parameter.
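The new checksum functions support incremental computation, as in the following sketch. It assumes crc32_init() returns the initial checksum state. It also assumes crc32_add(crc, data) returns the updated state, with both values represented as integers. Feeding the data in chunks should then produce the same result as a single call over the concatenated input.

```spicy
module CrcDemo;

import spicy;

# Start a checksum and feed the input in two chunks.
global crc = spicy::crc32_add(spicy::crc32_init(), b"1234");
crc = spicy::crc32_add(crc, b"56789");

print crc;
```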
Add a new find() method to units that searches for a bytes sequence inside their input data, forward or backward from a given starting position.
Add support for &chunked when parsing bytes data with &until or &until_including.
Add encode() method to string for conversion to bytes.
Extend parsing of void fields:
    Add support for &eod to skip all data until the end of the current input is encountered.
    Add support for &until to skip all data until a delimiter is encountered. The delimiter will be extracted from the stream before continuing.
Port Spicy to Apple silicon.
Add Dockerfile for OpenSUSE 15.2.
Changed Functionality
Reject void fields with names.
Lower minimum required Python version to 3.2.
GH-882: Lower minimum required Bison version to 3.0.
Bug fixes
GH-872: Fix missing normalization of enum label IDs.
GH-878: Fix casting integers to enums.
GH-889: Fix hook handling for anonymous void fields.
GH-901: Fix type resolution bug in &convert.
Fix handling of &size attribute for anonymous void fields.
Fix missing update to input position before running %done hook.
Add validation rejecting $$ in hooks not supporting it.
Make sure container sizes are runtime integers.
Fix missing operator<< for enums when generating debug code.
GH-917: Default-initialize forwarding fields without type arguments.
GH-1774: Fix synchronization with symbol different from last lookahead token.
GH-1777: Fix interning of regexps for %skip*.
Documentation
GH-37: Add documentation on how to skip data with void fields.
9.16. Migrating from the old prototype
Below we summarize language changes in Spicy compared to the original research prototype. Note that some of the prototype’s more advanced functionality has not yet been ported to the new code base; see the corresponding list on GitHub for what’s still missing.
Changes:
Renamed export linkage to public.
Renamed %byteorder property to %byte-order.
Renamed &byteorder attribute to &byte-order.
Renamed &bitorder attribute to &bit-order.
All unit-level properties now need to conclude with a semicolon (e.g., %filter;).
Renamed &length attribute to &size.
Renamed &until_including attribute to &until-including.
Replaced &parse with separate &parse-from (taking a "bytes" instance) and &parse-at (taking a stream iterator) attributes.
Attributes no longer accept their arguments in parentheses; it now must be <attr>=expr. (Before, both versions were accepted.)
uint<N> and int<N> are no longer accepted; use uintN/intN instead (which already worked before as well).
list<T> is no longer supported; use vector<T> instead.
New syntax for parsing sequences: Use x: int8[5] instead of x: vector<int8> &length=5. For lists of unknown size, use x: int8[]. When parsing sequences of sub-units, use x: Item[]; or, if further arguments/attributes are required, x: (Item(1,2,3))[]. (The latter syntax isn't great, but the old syntax was ambiguous.)
New syntax for functions: function f(<params>) [: <result>] instead of <result> f(<params>).
Renamed runtime support module from Spicy to spicy (so use import spicy).
In units, variables are now initialized to default values by default. Previously, that was (inconsistently) happening only for variables of type sink. To revert to the old behaviour, add &optional to the variable.
Renamed type double to real.
Generally, types don't coerce implicitly to bool anymore except in specific language contexts, such as in statements with boolean conditions.
Filters can now be implemented in Spicy itself. The pre-built filter::Base64Decode and filter::Zlib provide the base64 and zlib functionality of the previously built-in filters.
{unit,sink}::add_filter are renamed to {unit,sink}::connect_filter.
Enums don't coerce to bool anymore; you need to compare them to Undef manually.
Coercion to bool now happens only in certain contexts, like if-conditions (similar to C++).
The sink method sequence has been renamed to sequence_number.
The effect of the sink method set_initial_sequence_number no longer persists when reconnecting a different unit to a sink.
&transient is no longer a supported unit field attribute. The same effect can now be achieved through an anonymous field (also see next point).
$$ can now be generally used in hooks to refer to the just parsed value. That's particularly useful inside hooks for anonymous fields, including fields that previously were &transient (see above). Previously, $$ worked only for container elements in foreach hooks (which still operate the same way).
Fields of type real are parsed with the &type attribute (e.g., &type=Spicy::RealType::IEEE754_Double). They used to use &precision attributes with a different enum type.
Assigning to unit fields and variables no longer triggers any hooks. That also means that hooks are generally no longer supported for variables. (This is tricky to implement, and it's not clear it's worth the effort.)
When importing modules, module names are now case-sensitive.
When parsing vectors/lists of integers of a given length, use &count instead of &length.
Zeek plugin:
    Bro::dpd_confirm() has been renamed to zeek::confirm_protocol(). There's also a corresponding zeek::reject_protocol().
    To auto-export enums to Zeek, they need to be declared public.
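A short sketch makes a few of the migration changes above concrete:

- the new function syntax,
- the real type, which replaces double,
- an explicit comparison against Undef, instead of relying on an enum-to-bool coercion.

The sketch assumes that an enum global without an initializer starts out as Undef. The Status type and the average() function are hypothetical.

```spicy
module MigrationDemo;

type Status = enum { OK, Failed };

# New function syntax: the result type follows the parameter list.
function average(a: real, b: real) : real {
    return (a + b) / 2.0;
}

# No initializer, so this is assumed to start out as Status::Undef.
global status: Status;

# Enums no longer coerce to bool, so compare against Undef explicitly.
if ( status == Status::Undef ) {
    print "status not set yet";
}

print average(1.0, 2.0);
```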