5.2.5. Types

5.2.5.1. Address

The address type stores both IPv4 and IPv6 addresses.

Type

  • addr

Constants

  • IPv4: 1.2.3.4
  • IPv6: [2001:db8:85a3:8d3:1319:8a2e:370:7348], [::1.2.3.4]

Methods

family() → hilti::AddressFamily

Returns the protocol family of the address, which can be IPv4 or IPv6.

Operators

addr == addr → bool

Compares two address values.

addr != addr → bool

Compares two address values.
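As a rough illustration of the constants, method, and operators above, the following sketch declares one address per family and compares them. The module and variable names are made up for this example, and the snippet has not been checked against a particular Spicy release.

```spicy
# Illustrative sketch; module and variable names are hypothetical.
module Example;

# One constant per address family; IPv6 constants go in square brackets.
global v4: addr = 1.2.3.4;
global v6: addr = [2001:db8:85a3:8d3:1319:8a2e:370:7348];

# family() reports which protocol family each address belongs to.
print v4.family();
print v6.family();

# Equality and inequality compare the address values.
print v4 == v6;
print v4 != v6;
```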

5.2.5.2. Bitfield

Bitfields provide access to individual bit ranges inside an unsigned integer. They can’t be instantiated directly, but must be defined and parsed inside a unit.

Type

  • bitfield(N) { RANGE_1; ...; RANGE_N }
  • Each RANGE has one of the forms LABEL: A or LABEL: A..B where A and B are bit numbers.

Operators

bitfield . <attribute> → <field type>

Retrieves the value of a bitfield’s attribute. This is the value of the corresponding bits inside the underlying integer value, shifted all the way to the right.
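Because a bitfield has to live inside a unit, a minimal sketch needs a small unit around it. The unit name Flags and the labels low, mid, and top are hypothetical; the sketch only shows the declaration form and attribute access described above.

```spicy
# Illustrative sketch; the unit and its labels are hypothetical.
module Example;

public type Flags = unit {
    # An 8-bit bitfield parsed from a single input byte.
    f: bitfield(8) {
        low: 0..3;    # bits 0 through 3
        mid: 4..6;    # bits 4 through 6
        top: 7;       # a single bit
    };

    on %done {
        # Each attribute yields its bits shifted down to start at bit 0.
        print self.f.low, self.f.mid, self.f.top;
    }
};
```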

5.2.5.3. Bool

Boolean values can be True or False.

Type

  • bool

Constants

  • True, False

Operators

bool == bool → bool

Compares two boolean values.

bool != bool → bool

Compares two boolean values.

5.2.5.4. Bytes

Bytes instances store raw, opaque data. They provide iterators to traverse their content.

Types

  • bytes
  • iterator<bytes>

Constants

  • b"Spicy", b""

Methods

at(i: uint<64>) → iterator<bytes>

Returns an iterator representing the offset i inside the bytes value.

decode(charset: enum = hilti::Charset::UTF8) → string

Interprets the bytes as representing a binary string encoded with the given character set, and converts it into a UTF-8 string.

find(needle: bytes) → tuple<bool, iterator<bytes>>

Searches for needle in the value’s content. Returns a tuple of a boolean and an iterator. If needle was found, the boolean will be true and the iterator will point to its first occurrence. If needle was not found, the boolean will be false and the iterator will point to the last position so that everything before it is guaranteed to not contain even a partial match of needle. Note that for a simple yes/no result, you should use the in operator instead of this method, as it’s more efficient.

join(inout parts: vector) → bytes

Returns the concatenation of all elements in the parts vector, rendered as printable strings. The elements will be separated by the bytes value on which this method is invoked.

lower(charset: enum = hilti::Charset::UTF8) → bytes

Returns a lower-case version of the bytes value, assuming it is encoded in character set charset.

match(regex: regexp, [ group: uint<64> ]) → result<bytes>

Matches the bytes object against the regular expression regex. Returns the matching part or, if group is given, then the corresponding subgroup. The expression is considered anchored to the beginning of the data.

split([ sep: bytes ]) → vector<bytes>

Splits the bytes value at each occurrence of sep and returns a vector containing the individual pieces, with all separators removed. If the separator is not found, the returned vector will have the whole bytes value as its single element. If the separator is not given, or empty, the split will take place at sequences of white spaces.

split1([ sep: bytes ]) → tuple<bytes, bytes>

Splits the bytes value at the first occurrence of sep and returns the two parts as a 2-tuple, with the separator removed. If the separator is not found, the returned tuple will have the whole bytes value as its first element and an empty value as its second element. If the separator is not given, or empty, the split will take place at the first sequence of white spaces.

starts_with(b: bytes) → bool

Returns true if the bytes value starts with b.

strip([ side: spicy::Side ], [ set: bytes ]) → bytes

Removes leading and/or trailing sequences of all characters in set from the bytes value. If set is not given, removes all white spaces. If side is given, it indicates which side of the value should be stripped; Side::Both is the default if not given.

sub(begin: uint<64>, end: uint<64>) → bytes

Returns the subsequence from offset begin to (but not including) offset end.

sub(inout begin: iterator<bytes>, inout end: iterator<bytes>) → bytes

Returns the subsequence from begin to (but not including) end.

sub(inout end: iterator<bytes>) → bytes

Returns the subsequence from the value’s beginning to (but not including) end.

to_int([ base: uint<64> ]) → int<64>

Interprets the data as representing an ASCII-encoded number and converts that into a signed integer, using a base of base. base must be between 2 and 36. If base is not given, the default is 10.

to_int(byte_order: enum) → int<64>

Interprets the bytes as representing a binary number encoded with the given byte order, and converts it into a signed integer.

to_time([ base: uint<64> ]) → time

Interprets the bytes as representing a number of seconds since the epoch in the form of an ASCII-encoded number, and converts it into a time value using a base of base. If base is not given, the default is 10.

to_time(byte_order: enum) → time

Interprets the bytes as representing a number of seconds since the epoch in the form of a binary number encoded with the given byte order, and converts it into a time value.

to_uint([ base: uint<64> ]) → uint<64>

Interprets the data as representing an ASCII-encoded number and converts that into an unsigned integer, using a base of base. base must be between 2 and 36. If base is not given, the default is 10.

to_uint(byte_order: enum) → uint<64>

Interprets the bytes as representing a binary number encoded with the given byte order, and converts it into an unsigned integer.

upper(charset: enum = hilti::Charset::UTF8) → bytes

Returns an upper-case version of the bytes value, assuming it is encoded in character set charset.
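The methods above combine naturally when picking apart a line of text. The following is a sketch only: the sample data and all variable names are invented, and the comments restate what the descriptions above promise rather than output from a specific Spicy version.

```spicy
# Illustrative sketch; sample data and names are hypothetical.
module Example;

global line = b"  Content-Length: 42  ";

# strip() without arguments removes white space on both sides.
global trimmed = line.strip();

# split1() cuts at the first separator only; the result is a 2-tuple.
global kv = trimmed.split1(b": ");
print kv[0], kv[1];

# ASCII digits to integers: base 10 by default, or an explicit base.
print kv[1].to_uint();
print b"ff".to_uint(16);

# split() returns every piece, with the separators dropped.
print b"a,b,c".split(b",");

# For join(), the value the method is called on acts as the separator.
global parts: vector<bytes> = [b"a", b"b", b"c"];
print b", ".join(parts);

# Case conversion and decoding assume UTF-8 unless a charset is given.
print b"Spicy".upper().decode();

# find() returns (found, position); element 0 is the boolean.
global hit = trimmed.find(b"Length");
print hit[0];
print trimmed.starts_with(b"Content");
```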

Operators

begin(<container>) → <iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>) → <iterator>

Returns an iterator to the end of the container’s content.

bytes == bytes → bool

Compares two bytes values lexicographically.

bytes > bytes → bool

Compares two bytes values lexicographically.

bytes >= bytes → bool

Compares two bytes values lexicographically.

bytes in bytes → bool

Returns true if the right-hand-side value contains the left-hand-side value as a subsequence.

bytes !in bytes → bool

Performs the inverse of the corresponding in operation.

bytes < bytes → bool

Compares two bytes values lexicographically.

bytes <= bytes → bool

Compares two bytes values lexicographically.

|bytes| → uint<64>

Returns the number of bytes the value contains.

bytes + bytes → const bytes

Returns the concatenation of two bytes values.

bytes += bytes → bytes

Appends one bytes value to another.

bytes += uint<8> → bytes

Appends a single byte to the data.

bytes += view<stream> → bytes

Appends a view of stream data to a bytes instance.

bytes != bytes → bool

Compares two bytes values lexicographically.

Iterator Operators

*iterator<bytes> → uint<8>

Returns the character the iterator is pointing to.

iterator<bytes> - iterator<bytes> → int<64>

Returns the number of bytes between the two iterators. The result will be negative if the second iterator points to a location before the first. The result is undefined if the iterators do not refer to the same bytes instance.

iterator<bytes> == iterator<bytes> → bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> > iterator<bytes> → bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> >= iterator<bytes> → bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes>++ → iterator<bytes>

Advances the iterator by one byte, returning the previous position.

++iterator<bytes> → iterator<bytes>

Advances the iterator by one byte, returning the new position.

iterator<bytes> < iterator<bytes> → bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> <= iterator<bytes> → bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> + uint<64> → iterator<bytes> (commutative)

Returns an iterator that points the given number of bytes beyond the one passed in.

iterator<bytes> += uint<64> → iterator<bytes>

Advances the iterator by the given number of bytes.

iterator<bytes> != iterator<bytes> → bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.
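The bytes operators and iterator operators can be tried together on a small value. This is a minimal sketch with invented variable names; it is meant to show the call forms, not to document exact print formatting.

```spicy
# Illustrative sketch; variable names are hypothetical.
module Example;

global data = b"Spicy";

# Size, concatenation, and subsequence test.
print |data|;
print data + b"!";
print b"pic" in data;

# begin()/end() bracket the content; at() jumps to an offset.
global first = begin(data);
global third = data.at(2);

print *third;          # the byte at offset 2, as a uint<8>
print third - first;   # distance in bytes, as an int<64>

# sub() with two iterators returns the bytes from first up to,
# but not including, third.
print data.sub(first, third);

# Pre-increment moves the iterator and returns the new position.
print *(++first);
```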

5.2.5.5. Enum

Enum types associate labels with numerical values.

Type

  • enum { LABEL_1, ..., LABEL_N }
  • Each label has the form ID [= VALUE]. If VALUE is skipped, one will be assigned automatically.
  • Each enum type comes with an implicitly defined Undef label with a value distinct from all other ones. When coerced into a boolean, an enum will be true iff it’s not Undef.

Note

An instance of an enum can assume a numerical value that does not map to any of its defined labels. If printed, such a value renders as <unknown-N>, with N being the decimal representation of its numeric value.

Constants

  • The individual labels represent constants of the corresponding type (e.g., MyEnum::MyFirstLabel is a constant of type MyEnum).

Methods

has_label() → bool

Returns true if the value corresponds to a known enum label (other than Undef), as defined by its type.

Operators

enum-type(int) → enum

Instantiates an enum instance initialized from a signed integer value. The value does not need to correspond to any of the type’s enumerator labels.

enum-type(uint) → enum

Instantiates an enum instance initialized from an unsigned integer value. The value does not need to correspond to any of the type’s enumerator labels. It must not be larger than the maximum that a signed 64-bit integer value can represent.

cast<int-type>(enum) → int

Casts an enum value into a signed integer. If the enum value is Undef, this will return -1.

cast<uint-type>(enum) → uint

Casts an enum value into an unsigned integer. This will throw an exception if the enum value is Undef.

enum == enum → bool

Compares two enum values.

enum != enum → bool

Compares two enum values.
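A short sketch may help tie labels, numeric values, and Undef together. The type Color and its labels are hypothetical. The labels are written comma-separated here, which is how enum declarations appear in Spicy code as far as we know; treat that as an assumption if it differs from your version.

```spicy
# Illustrative sketch; the enum type and its labels are hypothetical.
module Example;

type Color = enum { Red = 1, Green = 2, Blue = 3 };

global c = Color::Green;

# Constructing from an integer works even if no label has that value.
global odd = Color(7);

print c, c.has_label();
print odd, odd.has_label();

# Casting back to an integer yields the numeric value.
print cast<uint8>(c);

# An enum coerces to true unless it is Undef.
if ( c )
    print "c is set";
```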

5.2.5.6. Exception

Todo

This isn’t available in Spicy yet (#89).

5.2.5.7. Integer

Spicy distinguishes between signed and unsigned integers, and always requires specifying the bitwidth of a type.

Type

  • intN for signed integers, where N can be one of 8, 16, 32, 64.
  • uintN for unsigned integers, where N can be one of 8, 16, 32, 64.

Constants

  • Unsigned integer: 1234, +1234, uint8(42), uint16(42), uint32(42), uint64(42)
  • Signed integer: -1234, int8(42), int8(-42), int16(42), int32(42), int64(42)

Operators

uint & uint → uint

Computes the bit-wise ‘and’ of the two integers.

uint | uint → uint

Computes the bit-wise ‘or’ of the two integers.

uint ^ uint → uint

Computes the bit-wise ‘xor’ of the two integers.

cast<enum-type>(int) → enum

Converts the value into an enum instance. The value does not need to correspond to any of the target type’s enumerator labels.

cast<enum-type>(uint) → enum

Converts the value into an enum instance. The value does not need to correspond to any of the target type’s enumerator labels. It must not be larger than the maximum that a signed 64-bit integer value can represent.

cast<int-type>(int) → int

Converts the value into another signed integer type, accepting any loss of information.

cast<int-type>(uint) → int

Converts the value into a signed integer type, accepting any loss of information.

cast<interval-type>(int) → interval

Interprets the value as number of seconds.

cast<interval-type>(uint) → interval

Interprets the value as number of seconds.

cast<real-type>(int) → real

Converts the value into a real, accepting any loss of information.

cast<real-type>(uint) → real

Converts the value into a real, accepting any loss of information.

cast<time-type>(uint) → time

Interprets the value as number of seconds since the UNIX epoch.

cast<uint-type>(int) → uint

Converts the value into an unsigned integer type, accepting any loss of information.

cast<uint-type>(uint) → uint

Converts the value into another unsigned integer type, accepting any loss of information.

int-- → int

Decrements the value, returning the old value.

uint-- → uint

Decrements the value, returning the old value.

--int → int

Decrements the value, returning the new value.

--uint → uint

Decrements the value, returning the new value.

int - int → int

Computes the difference between the two integers.

uint - uint → uint

Computes the difference between the two integers.

int -= int → int

Decrements the first value by the second, assigning the new value.

uint -= uint → uint

Decrements the first value by the second, assigning the new value.

int / int → int

Divides the first integer by the second.

uint / uint → uint

Divides the first integer by the second.

int /= int → int

Divides the first value by the second, assigning the new value.

uint /= uint → uint

Divides the first value by the second, assigning the new value.

int == int → bool

Compares the two integers.

uint == uint → bool

Compares the two integers.

int > int → bool

Compares the two integers.

uint > uint → bool

Compares the two integers.

int >= int → bool

Compares the two integers.

uint >= uint → bool

Compares the two integers.

int++ → int

Increments the value, returning the old value.

uint++ → uint

Increments the value, returning the old value.

++int → int

Increments the value, returning the new value.

++uint → uint

Increments the value, returning the new value.

int < int → bool

Compares the two integers.

uint < uint → bool

Compares the two integers.

int <= int → bool

Compares the two integers.

uint <= uint → bool

Compares the two integers.

int % int → int

Computes the modulus of the first integer divided by the second.

uint % uint → uint

Computes the modulus of the first integer divided by the second.

int * int → int

Multiplies the first integer by the second.

uint * uint → uint

Multiplies the first integer by the second.

int *= int → int

Multiplies the first value by the second, assigning the new value.

uint *= uint → uint

Multiplies the first value by the second, assigning the new value.

~uint → uint

Computes the bit-wise negation of the integer.

int ** int → int

Computes the first integer raised to the power of the second.

uint ** uint → uint

Computes the first integer raised to the power of the second.

uint << uint → uint

Shifts the integer to the left by the given number of bits.

uint >> uint → uint

Shifts the integer to the right by the given number of bits.

-int → int

Inverts the sign of the integer.

int + int → int

Computes the sum of the integers.

uint + uint → uint

Computes the sum of the integers.

int += int → int

Increments the first integer by the second.

uint += uint → uint

Increments the first integer by the second.

int != int → bool

Compares the two integers.

uint != uint → bool

Compares the two integers.
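The split between signed and unsigned operators is easiest to see in a small example. The sketch below uses invented variable names and assumes that hexadecimal literals and the typed constant form int32(2) are accepted as written; it deliberately shows no expected output.

```spicy
# Illustrative sketch; variable names are hypothetical.
module Example;

global word: uint16 = 0x1234;

# Bit-wise operators and shifts are defined for unsigned integers.
global high = (word >> 8) & 0xff;
global low = word & 0xff;
print high, low;

global n: int32 = -7;

# Arithmetic is available for signed and unsigned operands alike.
print n + int32(2), n * int32(2), -n;

# Post-increment returns the old value, pre-increment the new one.
print n++;
print ++n;

# Explicit casts change signedness, width, or the target type.
print cast<uint8>(low);
print cast<int64>(word);
print cast<real>(n);
print cast<interval>(word);
```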

5.2.5.8. Interval

An interval value represents a period of time. Intervals are stored with nanosecond resolution, which is retained across all calculations.

Type

  • interval

Constants

  • interval(SECS) creates an interval from a signed integer or real value SECS specifying the period in seconds.
  • interval_ns(NSECS) creates an interval from a signed integer value NSECS specifying the period in nanoseconds.

Methods

nanoseconds() → uint<64>

Returns the interval as an integer value representing nanoseconds.

seconds() → real

Returns the interval as a real value representing seconds.
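The two constructor forms and the two accessor methods pair up as follows. This is a small sketch with made-up variable names.

```spicy
# Illustrative sketch; variable names are hypothetical.
module Example;

# interval() takes seconds, interval_ns() takes nanoseconds.
global timeout = interval(2.5);
global tick = interval_ns(500);

# Read the period back in either unit.
print timeout.seconds();
print tick.nanoseconds();
```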

Operators

time - time → interval

Returns the difference of the times.

time - interval → time

Subtracts the interval from the time.

time == time → bool

Compares two time values.

time > time → bool

Compares the times.

time >= time → bool

Compares the times.

time < time → bool

Compares the times.

time <= time → bool

Compares the times.

time + interval → time (commutative)

Adds the interval to the time.

time != time → bool

Compares two time values.

5.2.5.9. List

Spicy uses lists only in a limited form as temporary values, usually for initializing other containers. That means you can only create list constants, but you cannot declare variables or unit fields to have a list type (use vector instead).

Constants

  • [E_1, E_2, ..., E_N] creates a list of N elements. The values E_I must all have the same type. [] creates an empty list of unknown element type.
  • [EXPR for ID in ITERABLE] creates a list by evaluating EXPR for all elements in ITERABLE, assembling the individual results into the final list value. The extended form [EXPR for ID in ITERABLE if COND] includes in the result only those elements for which COND evaluates to True. Both EXPR and COND can use ID to refer to the current element.
  • list(E_1, E_2, ..., E_N) is the same as [E_1, E_2, ..., E_N], and list() is the same as [].
  • list<T>(E_1, E_2, ..., E_N) creates a list of type list<T>, initializing it with the N elements E_I. list<T>() creates an empty list.
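Since lists only appear as temporary values, the sketch below stores its input in a vector and uses list expressions directly inside print statements. The variable names are hypothetical.

```spicy
# Illustrative sketch; variable names are hypothetical.
module Example;

# A list constant initializes the vector.
global v: vector<uint64> = [1, 2, 3, 4];

# Plain comprehension: evaluate the expression for every element.
print [x * 2 for x in v];

# Extended form: keep only elements for which the condition holds.
print [x for x in v if x % 2 == 0];

# The size operator works on list constants as well.
print |[1, 2, 3]|;
```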

Operators

begin(<container>) → <iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>) → <iterator>

Returns an iterator to the end of the container’s content.

list == list → bool

Compares two lists element-wise.

|list| → uint<64>

Returns the number of elements a list contains.

list != list → bool

Compares two lists element-wise.

5.2.5.10. Map

Maps are containers holding key/value pairs of elements, with fast lookup for keys to retrieve the corresponding value. They provide iterators to traverse their content, with no particular ordering.

Types

  • map<K, V> specifies a map with key type K and value type V.
  • iterator<map<K, V>>

Constants

  • map(K_1: V_1, K_2: V_2, ..., K_N: V_N) creates a map of N elements, initializing it with the given key/value pairs. The keys K_I must all have the same type, and the values V_I must likewise all have the same type. map() creates an empty map of unknown key/value types; this cannot be used directly but must be coerced into a fully-defined map type first.
  • map<K, V>(K_1: V_1, K_2: V_2, ..., K_N: V_N) creates a map of type map<K, V>, initializing it with the given key/value pairs. map<K, V>() creates an empty map.

Methods

clear() → void

Removes all elements from the map.

get(key: <any>, [ default: <any> ]) → <type of element>

Returns the map’s element for the given key. If the key does not exist, returns the default value if provided; otherwise throws a runtime error.

Operators

begin(<container>) → <iterator>

Returns an iterator to the beginning of the container’s content.

delete map[element] → void

Removes an element from the map.

end(<container>) → <iterator>

Returns an iterator to the end of the container’s content.

map == map → bool

Compares two maps element-wise.

<any> in map → bool

Returns true if an element is part of the map.

<any> !in map → bool

Performs the inverse of the corresponding in operation.

map[<any>] → <type of element>

Returns the map’s element for the given key. The key must exist, otherwise the operation will throw a runtime error.

map[<any>]=<any> → void

Updates the map value for a given key. If the key does not exist a new element is inserted.

|map| → uint<64>

Returns the number of elements a map contains.

map != map → bool

Compares two maps element-wise.

Iterator Operators

*iterator<map> → <dereferenced type>

Returns the map element that the iterator refers to.

iterator<map> == iterator<map> → bool

Returns true if two map iterators refer to the same location.

iterator<map>++ → iterator<map>

Advances the iterator by one map element, returning the previous position.

++iterator<map> → iterator<map>

Advances the iterator by one map element, returning the new position.

iterator<map> != iterator<map> → bool

Returns true if two map iterators refer to different locations.
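The map methods and operators above can be sketched in a few lines. The map contents and the variable name ports are invented for illustration.

```spicy
# Illustrative sketch; the map contents and names are hypothetical.
module Example;

global ports: map<bytes, uint64> = map(b"http": 80, b"https": 443);

# Index assignment updates an existing key or inserts a new one.
ports[b"ssh"] = 22;

# Membership tests and size.
print b"ssh" in ports, b"ftp" !in ports, |ports|;

# Indexing requires the key to exist; get() can fall back to a default.
print ports[b"http"];
print ports.get(b"ftp", 21);

# Remove one element, then all of them.
delete ports[b"http"];
ports.clear();
print |ports|;
```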

5.2.5.11. Optional

An optional value may hold a value of another type, or can alternatively remain unset. A common use case for optional is the return value of a function that may fail.

Type

  • optional<TYPE>

Constants

  • optional(EXPR) creates an optional<T>, where T is the type of the expression EXPR and initializes it with the value of EXPR.

More commonly, however, optional values are initialized through assignment:

  • Assigning an instance of TYPE to an optional<TYPE> sets it to the instance’s value.
  • Assigning Null to an optional<TYPE> unsets it.

Operators

*optional → <dereferenced type>

Returns the element stored, or throws an exception if none.
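A minimal sketch of setting, reading, and unsetting an optional through assignment follows; the variable name is hypothetical.

```spicy
# Illustrative sketch; the variable name is hypothetical.
module Example;

# A freshly declared optional starts out unset.
global name: optional<bytes>;

# Assigning a value of the wrapped type sets the optional.
name = b"spicy";
print *name;

# Assigning Null unsets it again; dereferencing with * would now
# throw an exception.
name = Null;
```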

5.2.5.12. Port

Ports represent the combination of a numerical port number and an associated transport-layer protocol.

Type

  • port

Constants

  • 443/tcp, 53/udp
  • port(PORT, PROTOCOL) creates a port where PORT is a port number and PROTOCOL a spicy::Protocol.

Methods

protocol() → hilti::Protocol

Returns the protocol the port is using (such as UDP or TCP).

Operators

port == port → bool

Compares two port values.

port != port → bool

Compares two port values.
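Both ways of writing a port constant should yield equal values, which the sketch below compares. It assumes that the spicy library module is imported to make spicy::Protocol available and that the enum has a label named TCP; the variable names are hypothetical.

```spicy
# Illustrative sketch; assumes spicy::Protocol provides a TCP label.
module Example;

import spicy;

global https = 443/tcp;
global dns = 53/udp;

# protocol() returns the transport-layer protocol of the port.
print https.protocol();
print dns.protocol();

# The port(...) form builds the same kind of value as the literal.
print https == port(443, spicy::Protocol::TCP);
print https != dns;
```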

5.2.5.13. Real

“Real” values store floating-point numbers with double precision.

Type

  • real

Constants

  • 3.14, 10e9, 0x1.921fb78121fb8p+1

Operators

cast<int-type>(real) → int

Converts the value to a signed integer type, accepting any loss of information.

cast<interval-type>(real) → interval

Interprets the value as number of seconds.

cast<time-type>(real) → time

Interprets the value as number of seconds since the UNIX epoch.

cast<uint-type>(real) → uint

Converts the value to an unsigned integer type, accepting any loss of information.

real - real → real

Returns the difference between the two values.

real -= real → real

Subtracts the second value from the first, assigning the new value.

real / real → real

Divides the first value by the second.

real /= real → real

Divides the first value by the second, assigning the new value.

real == real → bool

Compares the two reals.

real > real → bool

Compares the two reals.

real >= real → bool

Compares the two reals.

real < real → bool

Compares the two reals.

real <= real → bool

Compares the two reals.

real % real → real

Computes the modulus of the first real divided by the second.

real * real → real

Multiplies the first real by the second.

real *= real → real

Multiplies the first value by the second, assigning the new value.

real ** real → real

Computes the first real raised to the power of the second.

-real → real

Inverts the sign of the real.

real + real → real

Returns the sum of the reals.

real += real → real

Adds the second real to the first, assigning the new value.

real != real → bool

Compares the two reals.

5.2.5.14. Regular Expression

Spicy provides POSIX-style regular expressions.

Type

  • regexp

Constants

  • /Foo*ba?r/, /X(..)(..)(..)Y/

Regular expressions use the extended POSIX syntax, with a few smaller differences and extensions:

  • Supported character classes are: [:lower:], [:upper:], [:digit:], [:blank:].
  • \b asserts a word boundary, \B asserts no word boundary.
  • \xXX matches a byte with the hexadecimal value XX (e.g., \xff matches a byte of decimal value 255).
  • {#<number>} associates a numerical ID with a regular expression (useful for set matching).

Regular expression constants support two optional attributes:

  • &anchor: Implicitly anchors the expression, meaning it must match at the beginning of the data.
  • &nosub: Compiles without support for capturing subexpressions, which makes matching more efficient.

Methods

find(data: bytes) → tuple<int<32>, bytes>

Searches for the regular expression in data and returns the matching part. Unlike match, this does not anchor the expression to the beginning of the data: it will find matches at arbitrary starting positions. Returns a 2-tuple with (1) an integer match indicator with the same semantics as that returned by match; and (2) if a match has been found, the data that matches the regular expression. (Note: Currently this function has a runtime that’s quadratic in the size of data; consider using match if performance is an issue.)

match(data: bytes) → int<32>

Matches the regular expression against data. If it matches, returns an integer that’s greater than zero. If multiple patterns have been compiled for parallel matching, that integer will be the ID of the matching pattern. Returns -1 if the regular expression does not match the data, but could still yield a match if more data were added. Returns 0 if the regular expression is not found and adding more data wouldn’t change anything. The expression is considered anchored, as though it starts with an implicit ^ regexp operator, to the beginning of the data.

match_groups(data: bytes) → vector<bytes>

Matches the regular expression against data. If it matches, returns a vector with one entry for each capture group defined by the regular expression, starting at index 1. Each of these entries is a view locating the matching bytes. In addition, index 0 always contains the data that matches the full regular expression. Returns an empty vector if the expression is not found. The expression is considered anchored, as though it starts with an implicit ^ regexp operator, to the beginning of the data. This method is not compatible with pattern sets and will throw a runtime exception if used with a regular expression compiled from a set.

token_matcher() → hilti::MatchState

Initializes state for matching regular expression incrementally against chunks of future input. The expression is considered anchored, as though it starts with an implicit ^ regexp operator, to the beginning of the data.
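To contrast the anchored and unanchored methods, the sketch below reuses the two example constants from above on made-up input. The variable names and sample data are hypothetical, and the comments paraphrase the method descriptions rather than recorded output.

```spicy
# Illustrative sketch; sample data and names are hypothetical.
module Example;

global re = /X(..)(..)(..)Y/;

# match() is anchored: a result greater than zero signals a match.
print re.match(b"XaabbccY");

# match_groups(): index 0 holds the full match, later indices the groups.
print re.match_groups(b"XaabbccY");

# find() is not anchored, so the pattern may start anywhere in the data.
global word = /Foo*ba?r/;
print word.find(b"say Foobar twice");

# The bytes-side match() can return a single capture group directly.
print b"XaabbccY".match(re, 2);
```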

5.2.5.15. Set

Sets are containers for unique elements with fast lookup. They provide iterators to traverse their content, with no particular ordering.

Types

  • set<T> specifies a set with unique elements of type T.
  • iterator<set<T>>

Constants

  • set(E_1, E_2, ..., E_N) creates a set of N elements. The values E_I must all have the same type. set() creates an empty set of unknown element type; this cannot be used directly but must be coerced into a fully-defined set type first.
  • set<T>(E_1, E_2, ..., E_N) creates a set of type set<T>, initializing it with the elements E_I. set<T>() creates an empty set.

Methods

clear() → void

Removes all elements from the set.

Operators

add set[element] → void

Adds an element to the set.

begin(<container>) → <iterator>

Returns an iterator to the beginning of the container’s content.

delete set[element] → void

Removes an element from the set.

end(<container>) → <iterator>

Returns an iterator to the end of the container’s content.

set == set → bool

Compares two sets element-wise.

<any> in set → bool

Returns true if an element is part of the set.

<any> !in set → bool

Performs the inverse of the corresponding in operation.

|set| → uint<64>

Returns the number of elements a set contains.

set != set → bool

Compares two sets element-wise.

Iterator Operators

*iterator<set> → <dereferenced type>

Returns the set element that the iterator refers to.

iterator<set> == iterator<set> → bool

Returns true if two set iterators refer to the same location.

iterator<set>++ → iterator<set>

Advances the iterator by one set element, returning the previous position.

++iterator<set> → iterator<set>

Advances the iterator by one set element, returning the new position.

iterator<set> != iterator<set> → bool

Returns true if two set iterators refer to different locations.
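Adding, removing, and testing set elements could look like the following sketch; the variable name seen and its contents are made up.

```spicy
# Illustrative sketch; the set contents and name are hypothetical.
module Example;

global seen: set<uint64> = set(1, 2, 3);

# add and delete use index-style syntax on the set.
add seen[4];
delete seen[1];

# Membership tests and size.
print 2 in seen, 1 !in seen, |seen|;

# Traversal order is unspecified.
for ( x in seen )
    print x;

seen.clear();
print |seen|;
```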

5.2.5.16. Sink

Sinks act as a connector between two units, facilitating feeding the output of one as input into the other. See Sinks for a full description.

Sinks are special in that they don’t represent a type that’s generally available for instantiation. Instead they need to be declared as a member of a unit using the special sink keyword. You can, however, maintain references to sinks by assigning the unit member to a variable of type Sink&.

Methods

close() → void

Closes a sink by disconnecting all parsing units. Afterwards the sink’s state is as if it had just been created (so new units can be connected). Note that a sink is automatically closed when the unit it is part of is done parsing. Also note that a previously connected parsing unit cannot be reconnected; trying to do so will still throw a UnitAlreadyConnected exception.

connect(u: strong_ref<unit>) → void

Connects a parsing unit to a sink. All subsequent write operations to the sink will pass their data on to this parsing unit. Each unit can only be connected to a single sink. If the unit is already connected, a UnitAlreadyConnected exception is thrown. However, a sink can have more than one unit connected to it.

connect_filter(filter: strong_ref<unit>) → void

Connects a filter unit to the sink that will transform its input transparently before forwarding it for parsing to other connected units.

Multiple filters can be added to a sink, in which case they will be chained into a pipeline and the data will be passed through them in the order they have been added. The parsing will then be carried out on the output of the last filter in the chain.

Filters must be added before the first data chunk is written into the sink. If data has already been written when a filter is added, an error is triggered.

connect_mime_type(inout mt: bytes) → void

Connects parsing units to a sink for all parsers that support a given MIME type. All subsequent write operations to the sink will pass their data on to these parsing units. The MIME type may have wildcards for type or subtype, and the method will then connect units for all matching parsers.

connect_mime_type(mt: string) → void

Connects parsing units to a sink for all parsers that support a given MIME type. All subsequent write operations to the sink will pass their data on to these parsing units. The MIME type may have wildcards for type or subtype, and the method will then connect units for all matching parsers.

gap(seq: uint<64>, len: uint<64>) → void

Reports a gap in the input stream. seq is the sequence number of the first byte missing, len is the length of the gap.

sequence_number() → uint<64>

Returns the current sequence number of the sink’s input stream, which is one beyond the index of the last byte that has been put in order and delivered so far.

set_auto_trim(enable: bool) → void

Enables or disables auto-trimming. If enabled (which is the default) sink input data is trimmed automatically once in-order and processed. See trim() for more information about trimming.

set_initial_sequence_number(seq: uint<64>) → void

Sets the sink’s initial sequence number. All sequence numbers given to other methods are then assumed to be absolute numbers beyond that initial number. If the initial number is not set, the sink implicitly uses zero instead.

set_policy(policy: enum) → void

Sets a sink’s reassembly policy for ambiguous input. As long as data hasn’t been trimmed, a sink will detect overlapping chunks. This policy decides how to handle ambiguous overlaps. The default (and currently only) policy is ReassemblerPolicy::First, which resolves ambiguities by taking the data from the chunk that came first.

skip(seq: uint<64>) → void

Skips ahead in the input stream. seq is the sequence number where to continue parsing. If there’s still data buffered before that position it will be ignored; if auto-trim is also active, it will be immediately deleted as well. If new data is passed in later that comes before seq, that will likewise be ignored. If the input stream is currently stuck inside a gap, and seq lies beyond that gap, the stream will resume processing at seq.

trim(seq: uint<64>) → void

Deletes all data that’s still buffered internally up to seq. If processing the input stream hasn’t reached seq yet, parsing will also skip ahead to seq.

Trimming the input stream releases the memory, but that means that the sink won’t be able to detect any further data mismatches.

Note that by default, auto-trimming is enabled, which means all data is trimmed automatically once in-order and processed.

write(inout data: bytes, [ seq: uint<64> ], [ len: uint<64> ]) → void

Passes data on to all connected parsing units. Multiple write calls act like passing input in incrementally: The units will parse the pieces as if they were a single stream of data. If no sequence number seq is provided, the data is assumed to represent a chunk to be appended to the current end of the input stream. If a sequence number is provided, out-of-order data will be buffered and reassembled before being passed on. If len is provided, the data is assumed to represent that many bytes inside the sequence space; if not provided, len defaults to the length of data.

If no units are connected, the call does not have any effect. If multiple units are connected and one parsing unit throws an exception, parsing of subsequent units does not proceed. Note that the order in which the data is passed to each unit is undefined.
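The buffering and reassembly behavior described for write() can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Spicy's implementation: the class SinkModel and its attributes are hypothetical, overlaps are resolved first-wins as described for ReassemblerPolicy::First, and a write without seq is assumed to append at the highest position seen so far.

```python
class SinkModel:
    """Toy model of a sink's reassembly (hypothetical, not Spicy's code)."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq   # next in-order sequence number expected
        self.pending = {}             # position -> byte, buffered out of order
        self.delivered = bytearray()  # what connected units would have seen

    def end(self):
        # Assumed "current end" of the input: highest known position + 1.
        return max([self.next_seq] + [p + 1 for p in self.pending])

    def write(self, data, seq=None):
        # Without a sequence number, data is appended at the current end.
        if seq is None:
            seq = self.end()
        for i, byte in enumerate(data):
            pos = seq + i
            # First-wins: skip positions already delivered or already buffered.
            if pos < self.next_seq or pos in self.pending:
                continue
            self.pending[pos] = byte
        # Deliver everything that is now contiguous with the in-order position.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


sink = SinkModel()
sink.write(b"World", seq=6)    # out of order: buffered, nothing delivered yet
sink.write(b"Hello ", seq=0)   # fills the hole, so all 11 bytes are delivered
sink.write(b"XXXXX", seq=6)    # overlap: the first version of the data wins
```

After these three calls, the model has delivered b"Hello World" exactly once, which mirrors how connected units see the pieces as a single in-order stream.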

Todo

The error semantics for multiple units aren’t great.

Operators

|sink|uint<64>

Returns the number of bytes written into the sink so far. If the sink has filters attached, this returns the value after filtering.

|strong_ref<sink>|uint<64>

Returns the number of bytes written into the referenced sink so far. If the sink has filters attached, this returns the value after filtering.

Sinks provide a set of dedicated unit hooks as callbacks for the reassembly process. These must be implemented on the reader side, i.e., the unit that’s connected to a sink.

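%on_gap(seq: uint64, len: uint64)

Triggered when reassembly encounters a gap in the input stream. seq is the sequence number where the gap starts, and len is its length.

%on_overlap(seq: uint64, old: data, new: data)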

Triggered when reassembly encounters a second version of data for sequence space already covered earlier. seq is the start of the overlap, and old/new the previous and the new data, respectively. This hook is just for informational purposes; the policy set with set_policy() determines how the reassembler handles the overlap.

%on_skipped(seq: uint64)

Any time skip() moves ahead in the input stream, this hook reports the new sequence number seq.

%on_undelivered(seq: uint64, data: bytes)

If data still buffered is skipped over through skip(), it will be passed to this hook, before adjusting the current position. seq is the starting sequence number of the data, data is the data itself.

5.2.5.17. Stream

A stream is a data structure that efficiently represents a potentially large, incrementally provided input stream of raw data. You can think of it as a bytes type that’s optimized for (1) efficiently appending new chunks of data at the end, and (2) trimming data no longer needed at the beginning. Other than those two operations, stream data cannot be modified; there’s no way to change the actual content of a stream once it has been added to it. Streams provide iterators for traversal, and views for limiting visibility to smaller windows into the total stream.

Streams are key to Spicy’s parsing process, although most of that happens behind the scenes. You will most likely encounter them when using Random access. They may also be useful for buffering larger volumes of data during processing.

Types

  • stream
  • iterator<stream>
  • view<stream>

Methods

at(i: uint<64>)iterator<stream>

Returns an iterator representing the offset i inside the stream value.

freeze()void

Freezes the stream value. Once frozen, no more data can be appended to the stream value (unless it gets unfrozen first). If the value is already frozen, the operation does not change anything.

is_frozen()bool

Returns true if the stream value has been frozen.

trim(inout i: iterator<stream>)void

Trims the stream value by removing all data from its beginning up to (but not including) the position i. The iterator i will remain valid afterwards and will still point to the same location, which will now be the beginning of the stream’s value. All existing iterators pointing to i or beyond will remain valid and keep their offsets as well. The effect of this operation is undefined if i does not actually refer to a location inside the stream value. Trimming is permitted even on frozen values.
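To illustrate why iterators keep their offsets across trim(), the following Python sketch models a stream as a buffer plus the absolute offset of its first remaining byte. The class StreamModel and its members are hypothetical names chosen for this example, and iterators are modeled simply as absolute offsets. Where Spicy leaves the behavior undefined, the model raises an error.

```python
class StreamModel:
    """Toy model of a stream whose offsets survive trimming (hypothetical)."""

    def __init__(self, data=b""):
        self.base = 0                 # absolute offset of the first stored byte
        self.data = bytearray(data)
        self.frozen = False

    def append(self, chunk):
        if self.frozen:
            raise RuntimeError("cannot append to a frozen stream")
        self.data += chunk

    def trim(self, offset):
        # Drop everything before the absolute offset; allowed even when frozen.
        if not self.base <= offset <= self.base + len(self.data):
            raise ValueError("offset outside of stream")  # undefined in Spicy
        del self.data[: offset - self.base]
        self.base = offset

    def byte_at(self, offset):
        # Absolute offsets stay meaningful because `base` moves with the trim.
        return self.data[offset - self.base]


s = StreamModel(b"0123456789")
it = 4                  # "iterator" pointing at the byte '4'
s.trim(it)              # removes b"0123"; `it` now marks the stream's beginning
first = s.byte_at(it)   # still the byte '4'
```

The design point is that trimming only advances the base offset, so any position at or beyond the trim point refers to the same byte as before.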

unfreeze()void

Unfreezes the stream value. An unfrozen stream value can be further modified. If the value is already unfrozen (which is the default), the operation does not change anything.

Operators

begin(<container>)<iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>)<iterator>

Returns an iterator to the end of the container’s content.

|stream|uint<64>

Returns the number of bytes the stream value contains.

stream += bytesstream

Concatenates data to the stream.

stream += view<stream>stream

Concatenates another stream’s view to the target stream.

stream != streambool

Compares two stream values lexicographically.

Iterator Methods

is_frozen()bool

Returns whether the stream value that the iterator refers to has been frozen.

offset()uint<64>

Returns the offset of the byte that the iterator refers to relative to the beginning of the underlying stream value.

Iterator Operators

*iterator<stream>uint<64>

Returns the byte the iterator is pointing to.

iterator<stream> - iterator<stream>int<64>

Returns the number of bytes between the two iterators. The result will be negative if the second iterator points to a location before the first. The result is undefined if the iterators do not refer to the same stream instance.

iterator<stream> == iterator<stream>bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> > iterator<stream>bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> >= iterator<stream>bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream>++iterator<stream>

Advances the iterator by one byte, returning the previous position.

++iterator<stream>iterator<stream>

Advances the iterator by one byte, returning the new position.

iterator<stream> < iterator<stream>bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> <= iterator<stream>bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> + uint<64>iterator<stream> (commutative)

Advances the iterator by the given number of bytes.

iterator<stream> += uint<64>iterator<stream>

Advances the iterator by the given number of bytes.

iterator<stream> != iterator<stream>bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

View Methods

advance(i: uint<64>)view<stream>

Advances the view’s starting position by i bytes, returning the new view.

advance(inout i: iterator<stream>)view<stream>

Advances the view’s starting position to a given iterator i, returning the new view. The iterator must refer to the same stream value as the view, and it must be equal to or ahead of the view’s starting position.

at(i: uint<64>)iterator<stream>

Returns an iterator representing the offset i inside the view.

find(needle: bytes)tuple<bool, iterator<stream>>

Searches needle inside the view’s content. Returns a tuple of a boolean and an iterator. If needle was found, the boolean will be true and the iterator will point to its first occurrence. If needle was not found, the boolean will be false and the iterator will point to the last position so that everything before that is guaranteed to not contain even a partial match of needle (in other words: one can trim until that position and then restart the search from there if more data gets appended to the underlying stream value). Note that for a simple yes/no result, you should use the in operator instead of this method, as it’s more efficient.
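The contract for the not-found case is easiest to see in code. The following Python function is a minimal sketch of the documented return values, not Spicy's actual search routine. It works on plain bytes and returns an integer position in place of an iterator.

```python
def find(data: bytes, needle: bytes):
    """Return (found, position), mimicking the documented contract."""
    pos = data.find(needle)
    if pos >= 0:
        return True, pos
    # Not found: look for the earliest suffix of data that is a prefix of
    # needle, because a match could still complete there once more data arrives.
    for start in range(max(0, len(data) - len(needle) + 1), len(data)):
        if needle.startswith(data[start:]):
            return False, start
    # No partial match at all: everything seen so far can be discarded.
    return False, len(data)


found, pos = find(b"abcXY", b"XYZ")      # (False, 3): "XY" may become "XYZ"
rest = b"abcXY"[pos:] + b"Z"             # trim to pos, then more data arrives
found_again, pos_again = find(rest, b"XYZ")   # (True, 0)
```

Restarting from the returned position is safe because no byte before it can be part of a later match.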

limit(i: uint<64>)view<stream>

Returns a new view that keeps the current start but ends i bytes after that start. The returned view will not be able to expand any further.

offset()uint<64>

Returns the offset of the view’s starting position within the associated stream value.

starts_with(b: bytes)bool

Returns true if the view starts with b.

sub(begin: uint<64>, end: uint<64>)view<stream>

Returns a new view of the subsequence from offset begin to (but not including) offset end. The offsets are relative to the beginning of the view.

sub(inout begin: iterator<stream>, inout end: iterator<stream>)view<stream>

Returns a new view of the subsequence from begin up to (but not including) end.

sub(inout end: iterator<stream>)view<stream>

Returns a new view of the subsequence from the beginning of the stream up to (but not including) end.
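How advance(), limit(), and sub() relate to each other can be sketched by modeling a view as a window [begin, end) over some data. The Python class below is a hypothetical illustration only. It uses a fixed bytes value, whereas a real view refers to a stream that may still grow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewModel:
    """Toy model of a view as a window [begin, end) into stream data."""
    data: bytes
    begin: int
    end: int

    def advance(self, i):
        # Move the start forward by i bytes; the end stays where it is.
        return ViewModel(self.data, self.begin + i, self.end)

    def limit(self, i):
        # Keep the start, but end the view i bytes after it.
        return ViewModel(self.data, self.begin, min(self.end, self.begin + i))

    def sub(self, begin, end):
        # Offsets are relative to the start of this view.
        return ViewModel(self.data, self.begin + begin,
                         min(self.end, self.begin + end))

    def offset(self):
        # Position of the view's start within the underlying data.
        return self.begin

    def content(self):
        return self.data[self.begin:self.end]


v = ViewModel(b"0123456789", 0, 10)
w = v.advance(2)          # content b"23456789", offset 2
head = w.limit(3)         # content b"234"
middle = w.sub(1, 4)      # content b"345"
```

Each method returns a new view and leaves the original untouched, matching the "returning the new view" wording used above.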

View Operators

view<stream> == bytesbool (commutative)

Compares a stream view and a bytes instance lexicographically.

view<stream> == view<stream>bool

Compares the views lexicographically.

bytes in view<stream>bool

Returns true if the right-hand-side view contains the left-hand-side bytes as a subsequence.

view<stream> in bytesbool

Returns true if the right-hand-side bytes contains the left-hand-side view as a subsequence.

bytes !in view<stream>bool

Performs the inverse of the corresponding in operation.

view<stream> !in bytesbool

Performs the inverse of the corresponding in operation.

|view<stream>|uint<64>

Returns the number of bytes the view contains.

view<stream> != bytesbool (commutative)

Compares a stream view and a bytes instance lexicographically.

view<stream> != view<stream>bool

Compares two views lexicographically.

5.2.5.18. String

Strings store readable text that’s associated with a given character set. Internally, Spicy stores them as UTF-8.

Type

  • string

Constants

  • "Spicy", ""
  • When specifying string constants, Spicy assumes them to be in UTF-8.

Methods

encode(charset: enum = hilti::Charset::UTF8)bytes

Converts the string into a binary representation encoded with the given character set.

Operators

string == stringbool

Compares two strings lexicographically.

string % <any>string

Renders a printf-style format string.

|string|uint<64>

Returns the number of characters the string contains.
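Because the size operator counts characters rather than bytes, the size of a string can differ from the size of its encoded form. The Python snippet below shows the same distinction by analogy. Python's str and bytes types are stand-ins here, and nothing in it is Spicy API.

```python
text = "Z\u00fcrich"               # "Zürich", written with an escape for portability
encoded = text.encode("utf-8")     # analogous to encoding with the UTF8 charset

num_chars = len(text)      # 6 characters
num_bytes = len(encoded)   # 7 bytes, because "ü" takes two bytes in UTF-8
```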

string + stringstring

Returns the concatenation of two strings.

string != stringbool

Compares two strings lexicographically.

5.2.5.19. Time

A time value refers to a specific, absolute point in time, specified as the interval from January 1, 1970 UTC (i.e., the Unix epoch). Times are stored with nanosecond resolution, which is retained across all calculations.

Type

  • time

Constants

  • time(SECS) creates a time from an unsigned integer or real value SECS specifying seconds since the epoch.
  • time_ns(NSECS) creates a time from an unsigned integer value NSECS specifying nanoseconds since the epoch.

Methods

nanoseconds()uint<64>

Returns the time as an integer value representing nanoseconds since the UNIX epoch.

seconds()real

Returns the time as a real value representing seconds since the UNIX epoch.
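The difference between nanoseconds() and seconds() matters when precision counts. The Python sketch below models a time as an integer number of nanoseconds and assumes that a real value behaves like an IEEE 754 double, which is an assumption of this example rather than something stated here. The function names mirror the methods above but are plain Python helpers.

```python
NS_PER_SEC = 1_000_000_000


def time_ns(nsecs):
    """Model of time_ns(NSECS): a time is an integer count of nanoseconds."""
    return nsecs


def nanoseconds(t):
    """Model of nanoseconds(): the time as an exact integer."""
    return t


def seconds(t):
    """Model of seconds(): the time as a floating-point value."""
    return t / NS_PER_SEC


t1 = time_ns(1_700_000_000 * NS_PER_SEC)
t2 = time_ns(1_700_000_000 * NS_PER_SEC + 1)   # one nanosecond later

interval_ns = nanoseconds(t2) - nanoseconds(t1)   # exact difference: 1
same_as_real = seconds(t1) == seconds(t2)         # True: a double cannot tell them apart
```

Under that assumption, comparisons and differences that need full resolution are better done on the time values themselves or on nanoseconds() than on seconds().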

Operators

time - timeinterval

Returns the difference of the times.

time - intervaltime

Subtracts the interval from the time.

time == timebool

Compares two time values.

time > timebool

Compares the times.

time >= timebool

Compares the times.

time < timebool

Compares the times.

time <= timebool

Compares the times.

time + intervaltime (commutative)

Adds the interval to the time.

time != timebool

Compares two time values.

5.2.5.20. Tuple

Tuples are heterogeneous containers of a fixed, ordered set of types. Tuple elements may optionally be declared and addressed with custom identifier names.

Type

  • tuple<[IDENTIFIER_1: ]TYPE_1, ..., [IDENTIFIER_N: ]TYPE_N>

Constants

  • (1, "string", True), (1, ), ()
  • tuple(1, "string", True), tuple(1), tuple()

Operators

tuple == tuplebool

Compares two tuples element-wise.

tuple[uint<64>]<type of element>

Extracts the tuple element at the given index. The index must be a constant unsigned integer.

tuple . <id><type of element>

Extracts the tuple element corresponding to the given ID.

tuple != tuplebool

Compares two tuples element-wise.

5.2.5.21. Unit

Type

  • unit { FIELD_1; ...; FIELD_N }
  • See Parsing for a full discussion of unit types.

Constants

  • Spicy doesn’t support unit constants, but you can initialize unit instances through coercion from a list expression: my_unit = [$FIELD_1 = X_1, ..., $FIELD_N = X_N] where FIELD_I is the label of a corresponding field in my_unit’s type.

Methods

backtrack()void

Aborts parsing at the current position and returns back to the most recent &try attribute. Turns into a parse error if there’s no &try in scope.

connect_filter(filter: strong_ref<unit>)void

Connects a separate filter unit to transform the unit’s input transparently before parsing. The filter unit will see the original input, and this unit will receive everything the filter passes on through forward().

Filters can be connected only before a unit’s parsing begins. The latest possible point is from inside the target unit’s %init hook.

context()<context>&

Returns a reference to the %context instance associated with the unit.

find(needle: bytes, [ dir: enum ], [ start: iterator<stream> ])optional<iterator<stream>>

Searches a needle pattern inside the input region defined by where the unit began parsing and its current parsing position. If executed from inside a field hook, the current parsing position will represent the first byte that the field has been parsed from. By default, the search will start at the beginning of that region and scan forward. If the direction is spicy::Direction::Backward, the search will start at the end of the region and scan backward. In either case, a starting position can also be explicitly given, but must lie inside the same region.

Usage of this method requires the unit to be declared with the %random-access property.

forward(inout data: bytes)void

If the unit is connected as a filter to another one, this method forwards transformed input over to that other one to parse. If the unit is not connected, this method will silently discard the data.

forward_eod()void

If the unit is connected as a filter to another one, this method signals that other one that end of its input has been reached. If the unit is not connected, this method will not do anything.
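The data flow between a filter and the unit it is connected to can be sketched as follows. This Python model is purely illustrative. FilterModel, ParserModel, and the choice of Base64 decoding as the transformation are assumptions made for the example, not part of Spicy.

```python
import base64


class ParserModel:
    """Stands in for the unit that parses the filtered input."""

    def __init__(self):
        self.received = bytearray()
        self.eod = False

    def feed(self, data):
        self.received += data

    def end_of_data(self):
        self.eod = True


class FilterModel:
    """Stands in for a filter unit that decodes Base64 input."""

    def __init__(self):
        self.target = None   # the unit this filter is connected to, if any

    def forward(self, data):
        # Pass transformed input on; silently discard it when unconnected.
        if self.target is not None:
            self.target.feed(data)

    def forward_eod(self):
        # Signal end of input; does nothing when unconnected.
        if self.target is not None:
            self.target.end_of_data()

    def process(self, raw):
        # The filter sees the original input and forwards the decoded form.
        self.forward(base64.b64decode(raw))


parser = ParserModel()
flt = FilterModel()
flt.process(b"SGVsbG8=")   # not connected yet: the decoded data is dropped
flt.target = parser        # models connecting the filter to the unit
flt.process(b"SGVsbG8=")   # parser now receives b"Hello"
flt.forward_eod()
```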

input()iterator<stream>

Returns an iterator referring to the input location where the current unit has begun parsing. If this method is called before the unit’s parsing has begun, it will throw a runtime exception. Once available, the input position will remain accessible for the unit’s entire lifetime.

Usage of this method requires the unit to be declared with the %random-access property.

offset()uint<64>

Returns the offset of the current location in the input stream relative to the unit’s start. If executed from inside a field hook, the offset will represent the first byte that the field has been parsed from. If this method is called before the unit’s parsing has begun, it will throw a runtime exception. Once parsing has started, the offset will remain available for the unit’s entire lifetime.

Usage of this method requires the unit to be declared with the %random-access property.

position()iterator<stream>

Returns an iterator to the current position in the unit’s input stream. If executed from inside a field hook, the position will represent the first byte that the field has been parsed from. If this method is called before the unit’s parsing has begun, it will throw a runtime exception.

Usage of this method requires the unit to be declared with the %random-access property.

set_input(i: iterator<stream>)void

Moves the current parsing position to i. The iterator i must point into the input of the current unit, or the method will throw a runtime exception.

Usage of this method requires the unit to be declared with the %random-access property.

Operators

unit ?. <field>bool

Returns true if the unit’s field has a value assigned (not counting any &default).

unit . <field><field type>

Retrieves the value of a unit’s field. If the field does not yet have a value assigned, it returns its &default expression if that has been defined; otherwise it triggers an exception.

unit .? <field><field type>

Retrieves the value of a unit’s field. If the field does not yet have a value assigned, it returns its &default expression if that has been defined. Otherwise it triggers an exception, unless used in a context that specifically allows for that situation (such as inside the Zeek plugin’s evt files).

unset unit.<field>void

Resets a field back to its original uninitialized state.
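The interplay between assigned values, &default, and unset can be summarized with a small Python model of a single field. The class FieldModel and its method names are hypothetical and only illustrate the rules stated for the ?. and . operators and for unset.

```python
_UNSET = object()   # sentinel meaning "no value assigned"


class FieldModel:
    """Toy model of one unit field with an optional &default (hypothetical)."""

    def __init__(self, default=_UNSET):
        self.value = _UNSET
        self.default = default

    def is_set(self):
        # Models `unit ?. field`: a &default does not count as assigned.
        return self.value is not _UNSET

    def get(self):
        # Models `unit . field`: value, else &default, else an exception.
        if self.value is not _UNSET:
            return self.value
        if self.default is not _UNSET:
            return self.default
        raise AttributeError("field has no value and no default")

    def unset(self):
        # Models `unset unit.field`: back to the uninitialized state.
        self.value = _UNSET


port = FieldModel(default=80)
before = (port.is_set(), port.get())   # (False, 80): only the default exists
port.value = 8080
after = (port.is_set(), port.get())    # (True, 8080)
port.unset()
reset = (port.is_set(), port.get())    # (False, 80) again
```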

5.2.5.22. Vector

Vectors are homogeneous containers, holding a set of elements of a given element type. They provide iterators to traverse their content.

Types

  • vector<T> specifies a vector with elements of type T.
  • iterator<vector<T>>

Constants

  • vector(E_1, E_2, ..., E_N) creates a vector of N elements. The values E_I must all have the same type. vector() creates an empty vector of unknown element type; this cannot be used directly but must be coerced into a fully-defined vector type first.
  • vector<T>(E_1, E_2, ..., E_N) creates a vector of type T, initializing it with the N elements E_I. vector<T>() creates an empty vector.
  • Vectors can be initialized through coercion from a list value: vector<string> I = ["A", "B", "C"].

Methods

assign(i: uint<64>, x: <any>)void

Assigns x to the i-th element of the vector. If the vector contains fewer than i elements, a sufficient number of default-initialized elements is added to carry out the assignment.

at(i: uint<64>)<iterator>

Returns an iterator referring to the element at vector index i.

back()<type of element>

Returns the last element of the vector. It throws an exception if the vector is empty.

front()<type of element>

Returns the first element of the vector. It throws an exception if the vector is empty.

pop_back()void

Removes the last element from the vector, which must be non-empty.

push_back(x: <any>)void

Appends x to the end of the vector.

reserve(n: uint<64>)void

Reserves space for at least n elements. This operation does not change the vector in any observable way but provides a hint about the size that will be needed.

resize(n: uint<64>)void

Resizes the vector to hold exactly n elements. If n is larger than the current size, the new slots are filled with default values. If n is smaller than the current size, the excess elements are removed.
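The growing and shrinking behavior of assign() and resize() can be mimicked with Python lists. The helper functions below are a sketch of the documented semantics, not Spicy code, and they assume 0 as the default value of the element type, as it would be for an integer vector.

```python
def assign(vec, i, x, default=0):
    """Model of assign(): grow with default values if index i is out of range."""
    if i >= len(vec):
        vec.extend([default] * (i + 1 - len(vec)))
    vec[i] = x


def resize(vec, n, default=0):
    """Model of resize(): pad with default values or drop excess elements."""
    if n > len(vec):
        vec.extend([default] * (n - len(vec)))
    else:
        del vec[n:]


v = [1, 2]
assign(v, 4, 9)   # v is now [1, 2, 0, 0, 9]
resize(v, 3)      # v is now [1, 2, 0]
resize(v, 5)      # v is now [1, 2, 0, 0, 0]
```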

sub(begin: uint<64>, end: uint<64>)vector

Extracts a subsequence of vector elements spanning from index begin to (but not including) index end.

sub(end: uint<64>)vector

Extracts a subsequence of vector elements spanning from the beginning to (but not including) the index end as a new vector.

Operators

begin(<container>)<iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>)<iterator>

Returns an iterator to the end of the container’s content.

vector == vectorbool

Compares two vectors element-wise.

vector[uint<64>]<type of element>

Returns the vector element at the given index.

|vector|uint<64>

Returns the number of elements a vector contains.

vector + vectorvector

Returns the concatenation of two vectors.

vector += vectorvector

Concatenates another vector to the vector.

vector != vectorbool

Compares two vectors element-wise.

Iterator Operators

*iterator<vector><dereferenced type>

Returns the vector element that the iterator refers to.

iterator<vector> == iterator<vector>bool

Returns true if two vector iterators refer to the same location.

iterator<vector>++iterator<vector>

Advances the iterator by one vector element, returning the previous position.

++iterator<vector>iterator<vector>

Advances the iterator by one vector element, returning the new position.

iterator<vector> != iterator<vector>bool

Returns true if two vector iterators refer to different locations.

5.2.5.23. Void

The void type is a placeholder for specifying “no type”, such as when a function doesn’t return anything.

Type

  • void