5.2.5. Types

5.2.5.1. Address

The address type stores both IPv4 and IPv6 addresses.

Type

  • addr

Constants

  • IPv4: 1.2.3.4

  • IPv6: [2001:db8:85a3:8d3:1319:8a2e:370:7348], [::1.2.3.4]

This type supports the pack/unpack operators.

Methods

family() spicy::AddressFamily

Returns the protocol family of the address, which can be IPv4 or IPv6.

Operators

addr == addr bool

Compares two address values.

addr != addr bool

Compares two address values.
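The constants, `family()`, and comparisons above can be combined as in the following sketch. It is a standalone module whose global statements run at start-up, for example when executed with `spicyc -j`. The module name is illustrative. The labels `spicy::AddressFamily::IPv4` and `IPv6` are assumed to be the members of `spicy::AddressFamily`.

```spicy
module AddressExample;

import spicy;

global a4: addr = 1.2.3.4;
global a6: addr = [2001:db8:85a3:8d3:1319:8a2e:370:7348];

# family() reports the protocol family of each address.
print a4.family();
print a6.family();

# Equality compares the complete address values.
assert a4 == 1.2.3.4;
assert a4 != a6;
```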

5.2.5.2. Bitfield

Bitfields provide access to individual bitranges inside an unsigned integer. They can’t be instantiated directly but must be defined and parsed inside a unit.

Type

  • bitfield(N) { RANGE_1; ...; RANGE_N }

  • Each RANGE has one of the forms LABEL: A or LABEL: A..B where A and B are bit numbers.

Constants

  • bitfield(N) { RANGE_1 [= VALUE_1]; ...; RANGE_N [= VALUE_N] }

A bitfield constant represents expected values for all or some of the individual bitranges. Such constants can be used only for parsing inside a unit field, not as values to operate on otherwise. To define such a constant with expected values, add = VALUE to the bitranges inside the type definition as suitable (with VALUE representing the final value after applying any &bit-order attribute, if present). See Bitfield for more information.

Operators

<bitfield> ?. <name> bool

Returns true if the bitfield’s element has a value.

<bitfield> . <name> <field type>

Retrieves the value of one of the bitfield’s ranges. This is the value of the corresponding bits inside the underlying integer value, shifted all the way to the right.

5.2.5.3. Bool

Boolean values can be True or False.

Type

  • bool

Constants

  • True, False

Operators

bool & bool bool

Computes the bit-wise ‘and’ of the two boolean values.

bool | bool bool

Computes the bit-wise ‘or’ of the two boolean values.

bool ^ bool bool

Computes the bit-wise ‘xor’ of the two boolean values.

bool == bool bool

Compares two boolean values.

bool != bool bool

Compares two boolean values.

5.2.5.4. Bytes

Bytes instances store raw, opaque data. They provide iterators to traverse their content.

Types

  • bytes

  • iterator<bytes>

Constants

  • b"Spicy", b""

Methods

at(i: uint<64>) iterator<bytes>

Returns an iterator representing the offset i inside the bytes value.

decode([ charset: spicy::Charset = hilti::Charset::UTF8 ], [ errors: spicy::DecodeErrorStrategy = hilti::DecodeErrorStrategy::REPLACE ]) string

Interprets the bytes as representing a binary string encoded with the given character set, and converts it into a UTF-8 string. If data is encountered that charset or UTF-8 cannot represent, it’s handled according to the errors strategy.

find(needle: bytes) tuple<bool, iterator<bytes>>

Searches for needle in the value’s content. Returns a tuple of a boolean and an iterator. If needle was found, the boolean will be true and the iterator will point to its first occurrence. If needle was not found, the boolean will be false and the iterator will point to the last position such that everything before it is guaranteed not to contain even a partial match of needle. For a simple yes/no result, use the in operator instead of this method, as it’s more efficient.
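The following sketch shows how the tuple returned by `find()` can feed the iterator-based `sub()` variants to split a value around the match. It is a standalone module run via `spicyc -j`, and the variable names are illustrative.

```spicy
module FindExample;

global data = b"key=value";

# find() returns (found, position); the iterator points at the first match.
global r = data.find(b"=");

# Everything before the match, then everything after it.
global lhs = data.sub(r[1]);
global rhs = data.sub(r[1] + 1, end(data));

assert r[0];
assert lhs == b"key";
assert rhs == b"value";

# For a plain yes/no check, the in operator is the cheaper choice.
assert b"=" in data;
```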

join(parts: vector) bytes

Returns the concatenation of all elements in the parts list, rendered as printable strings. The portions will be separated by the bytes value on which this method is invoked.

lower([ charset: spicy::Charset = hilti::Charset::UTF8 ], [ errors: spicy::DecodeErrorStrategy = hilti::DecodeErrorStrategy::REPLACE ]) bytes

Returns a lower-case version of the bytes value, assuming it is encoded in character set charset. If data is encountered that charset cannot represent, it’s handled according to the errors strategy.

match(regex: regexp, [ group: uint<64> ]) result<bytes>

Matches the bytes object against the regular expression regex. Returns the matching part or, if group is given, the corresponding subgroup. The expression is considered anchored to the beginning of the data.

split([ sep: bytes ]) vector<bytes>

Splits the bytes value at each occurrence of sep and returns a vector containing the individual pieces, with all separators removed. If the separator is not found, the returned vector will have the whole bytes value as its single element. If the separator is not given, or is empty, the split will take place at sequences of whitespace.

split1([ sep: bytes ]) tuple<bytes, bytes>

Splits the bytes value at the first occurrence of sep and returns the two parts as a 2-tuple, with the separator removed. If the separator is not found, the returned tuple will have the whole bytes value as its first element and an empty value as its second element. If the separator is not given, or is empty, the split will take place at the first sequence of whitespace.

starts_with(b: bytes) bool

Returns true if the bytes value starts with b.

strip([ side: spicy::Side ], [ set: bytes ]) bytes

Removes leading and/or trailing sequences of all characters in set from the bytes value. If set is not given, removes all whitespace. If side is given, it indicates which side of the value should be stripped; spicy::Side::Both is the default if not given.

sub(begin: iterator<bytes>, end: iterator<bytes>) bytes

Returns the subsequence from begin to (but not including) end.

sub(begin: uint<64>, end: uint<64>) bytes

Returns the subsequence from offset begin to (but not including) offset end.

sub(end: iterator<bytes>) bytes

Returns the subsequence from the value’s beginning to (but not including) end.

to_int([ base: uint<64> ]) int<64>

Interprets the data as representing an ASCII-encoded number and converts it into a signed integer, using the given base, which must be between 2 and 36. If base is not given, the default is 10.

to_int(byte_order: spicy::ByteOrder) int<64>

Interprets the bytes as representing a binary number encoded with the given byte order, and converts it into a signed integer.

to_time([ base: uint<64> ]) time

Interprets the bytes as representing a number of seconds since the epoch in the form of an ASCII-encoded number, and converts it into a time value using the given base. If base is not given, the default is 10.

to_time(byte_order: spicy::ByteOrder) time

Interprets the bytes as representing a number of seconds since the epoch in the form of a binary number encoded with the given byte order, and converts it into a time value.

to_uint([ base: uint<64> ]) uint<64>

Interprets the data as representing an ASCII-encoded number and converts it into an unsigned integer, using the given base, which must be between 2 and 36. If base is not given, the default is 10.

to_uint(byte_order: spicy::ByteOrder) uint<64>

Interprets the bytes as representing a binary number encoded with the given byte order, and converts it into an unsigned integer.

upper([ charset: spicy::Charset = hilti::Charset::UTF8 ], [ errors: spicy::DecodeErrorStrategy = hilti::DecodeErrorStrategy::REPLACE ]) bytes

Returns an upper-case version of the bytes value, assuming it is encoded in character set charset. If data is encountered that charset cannot represent, it’s handled according to the errors strategy.
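Several of the methods above are typically chained when parsing a text header. The following sketch splits at the first separator, converts ASCII digits in two bases, and decodes a raw big-endian value. It is a standalone module run via `spicyc -j`, and the header line and variable names are illustrative. `spicy::ByteOrder::Big` is assumed to be a label of `spicy::ByteOrder`.

```spicy
module ConvertExample;

import spicy;

global line = b"Content-Length: 1024";

# split1() cuts at the first separator only and drops the separator.
global kv = line.split1(b": ");
global hdr_name = kv[0];
global hdr_value = kv[1];

# ASCII digits to integers, with an optional base (default 10).
global n = hdr_value.to_uint();
global mask = b"ff".to_uint(16);

# Raw binary data to an integer, with an explicit byte order.
global word = b"\x01\x00".to_uint(spicy::ByteOrder::Big);

print hdr_name.lower(), n, mask, word;
```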

Operators

begin(<container>) <iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>) <iterator>

Returns an iterator to the end of the container’s content.

bytes == bytes bool

Compares two bytes values lexicographically.

bytes > bytes bool

Compares two bytes values lexicographically.

bytes >= bytes bool

Compares two bytes values lexicographically.

bytes in bytes bool

Returns true if the right-hand-side value contains the left-hand-side value as a subsequence.

bytes !in bytes bool

Performs the inverse of the corresponding in operation.

bytes < bytes bool

Compares two bytes values lexicographically.

bytes <= bytes bool

Compares two bytes values lexicographically.

|bytes| uint<64>

Returns the number of bytes the value contains.

bytes + bytes bytes

Returns the concatenation of two bytes values.

bytes += bytes bytes

Appends one bytes value to another.

bytes += uint<8> bytes

Appends a single byte to the data.

bytes += view<stream> bytes

Appends a view of stream data to a bytes instance.

bytes != bytes bool

Compares two bytes values lexicographically.

Iterator Operators

*iterator<bytes> uint<8>

Returns the byte the iterator is pointing to.

iterator<bytes> - iterator<bytes> int<64>

Returns the number of bytes between the two iterators. The result will be negative if the second iterator points to a location before the first. The result is undefined if the iterators do not refer to the same bytes instance.

iterator<bytes> == iterator<bytes> bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> > iterator<bytes> bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> >= iterator<bytes> bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes>++ iterator<bytes>

Advances the iterator by one byte, returning the previous position.

++iterator<bytes> iterator<bytes>

Advances the iterator by one byte, returning the new position.

iterator<bytes> < iterator<bytes> bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> <= iterator<bytes> bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

iterator<bytes> + uint<64> iterator<bytes> (commutative)

Returns an iterator which is pointing the given number of bytes beyond the one passed in.

iterator<bytes> += uint<64> iterator<bytes>

Advances the iterator by the given number of bytes.

iterator<bytes> != iterator<bytes> bool

Compares the two positions. The result is undefined if they are not referring to the same bytes value.

5.2.5.5. Enum

Enum types associate labels with numerical values.

Type

  • enum { LABEL_1, ..., LABEL_N }

  • Each label has the form ID [= VALUE]. If VALUE is omitted, one will be assigned automatically.

  • Each enum type comes with an implicitly defined Undef label with a value distinct from all other ones. When coerced into a boolean, an enum will be true iff it’s not Undef.

Note

An instance of an enum can assume a numerical value that does not map to any of its defined labels. If printed, such a value renders as <unknown-N>, with N being the decimal expression of its numeric value.

Constants

  • The individual labels represent constants of the corresponding type (e.g., MyEnum::MyFirstLabel is a constant of type MyEnum).

Methods

has_label() bool

Returns true if the enum value corresponds to a known label (other than Undef), as defined by its type.

Operators

enum(int) enum value

Instantiates an enum instance initialized from a signed integer value. The value does not need to correspond to any of the type’s enumerator labels.

enum(uint) enum value

Instantiates an enum instance initialized from an unsigned integer value. The value does not need to correspond to any of the type’s enumerator labels. It must not be larger than the maximum that a signed 64-bit integer value can represent.

cast<int>(enum) int

Casts an enum value into a signed integer. If the enum value is Undef, this will return -1.

cast<uint>(enum) uint

Casts an enum value into an unsigned integer. This will throw an exception if the enum value is Undef.

enum == enum bool

Compares two enum values.

enum != enum bool

Compares two enum values.
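The following sketch illustrates labels, unlabeled values, `Undef`, and the two casts. The type name `Color` is hypothetical. The module is meant to be run via `spicyc -j`. It assumes that an uninitialized enum variable starts out as `Undef`.

```spicy
module EnumExample;

type Color = enum { Red = 1, Green = 2, Blue = 4 };

global c = Color::Green;
global unlabeled = Color(3);   # 3 has no label, but is still a valid value
global fresh: Color;           # assumed to start out as Color::Undef

print c, unlabeled;
```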

5.2.5.6. Exception

Todo

This isn’t available in Spicy yet (#89).

5.2.5.7. Integer

Spicy distinguishes between signed and unsigned integers, and always requires specifying the bitwidth of a type.

Type

  • intN for signed integers, where N can be one of 8, 16, 32, 64.

  • uintN for unsigned integers, where N can be one of 8, 16, 32, 64.

Constants

  • Unsigned integer: 1234, +1234, uint8(42), uint16(42), uint32(42), uint64(42)

  • Signed integer: -1234, int8(42), int8(-42), int16(42), int32(42), int64(42)

This type supports the pack/unpack operators.

Operators

uint & uint uint

Computes the bit-wise ‘and’ of the two integers.

uint | uint uint

Computes the bit-wise ‘or’ of the two integers.

uint ^ uint uint

Computes the bit-wise ‘xor’ of the two integers.

int16(int) int<16>

Creates a 16-bit signed integer value.

int16(uint) int<16>

Creates a 16-bit signed integer value.

int32(int) int<32>

Creates a 32-bit signed integer value.

int32(uint) int<32>

Creates a 32-bit signed integer value.

int64(int) int<64>

Creates a 64-bit signed integer value.

int64(uint) int<64>

Creates a 64-bit signed integer value.

int8(int) int<8>

Creates an 8-bit signed integer value.

int8(uint) int<8>

Creates an 8-bit signed integer value.

uint16(int) uint<16>

Creates a 16-bit unsigned integer value.

uint16(uint) uint<16>

Creates a 16-bit unsigned integer value.

uint32(int) uint<32>

Creates a 32-bit unsigned integer value.

uint32(uint) uint<32>

Creates a 32-bit unsigned integer value.

uint64(int) uint<64>

Creates a 64-bit unsigned integer value.

uint64(uint) uint<64>

Creates a 64-bit unsigned integer value.

uint8(int) uint<8>

Creates an 8-bit unsigned integer value.

uint8(uint) uint<8>

Creates an 8-bit unsigned integer value.

cast<bool>(int) bool

Converts the value to a boolean by comparing against zero.

cast<bool>(uint) bool

Converts the value to a boolean by comparing against zero.

cast<enum>(int) enum

Converts the value into an enum instance. The value does not need to correspond to any of the target type’s enumerator labels.

cast<enum>(uint) enum

Converts the value into an enum instance. The value does not need to correspond to any of the target type’s enumerator labels.

cast<int>(int) int

Converts the value into a different signed integer type, accepting any loss of information.

cast<int>(uint) int

Converts the value into a signed integer type, accepting any loss of information.

cast<interval>(int) interval

Interprets the value as a number of seconds.

cast<interval>(uint) interval

Interprets the value as a number of seconds.

cast<real>(int) real

Converts the value into a real, accepting any loss of information.

cast<real>(uint) real

Converts the value into a real, accepting any loss of information.

cast<time>(uint) time

Interprets the value as a number of seconds since the UNIX epoch.

cast<uint>(int) uint

Converts the value into an unsigned integer type, accepting any loss of information.

cast<uint>(uint) uint

Converts the value into a different unsigned integer type, accepting any loss of information.

int-- int

Decrements the value, returning the old value.

uint-- uint

Decrements the value, returning the old value.

--int int

Decrements the value, returning the new value.

--uint uint

Decrements the value, returning the new value.

int - int int

Computes the difference between the two integers.

uint - uint uint

Computes the difference between the two integers.

int -= int int

Decrements the first value by the second, assigning the new value.

uint -= uint uint

Decrements the first value by the second, assigning the new value.

int / int int

Divides the first integer by the second.

uint / uint uint

Divides the first integer by the second.

int /= int int

Divides the first value by the second, assigning the new value.

uint /= uint uint

Divides the first value by the second, assigning the new value.

int == int bool

Compares the two integers.

uint == uint bool

Compares the two integers.

int > int bool

Compares the two integers.

uint > uint bool

Compares the two integers.

int >= int bool

Compares the two integers.

uint >= uint bool

Compares the two integers.

int++ int

Increments the value, returning the old value.

uint++ uint

Increments the value, returning the old value.

++int int

Increments the value, returning the new value.

++uint uint

Increments the value, returning the new value.

int < int bool

Compares the two integers.

uint < uint bool

Compares the two integers.

int <= int bool

Compares the two integers.

uint <= uint bool

Compares the two integers.

int % int int

Computes the modulus of the first integer divided by the second.

uint % uint uint

Computes the modulus of the first integer divided by the second.

int * int int

Multiplies the first integer by the second.

uint * uint uint

Multiplies the first integer by the second.

int *= int int

Multiplies the first value by the second, assigning the new value.

uint *= uint uint

Multiplies the first value by the second, assigning the new value.

~uint uint

Computes the bit-wise negation of the integer.

int ** int int

Computes the first integer raised to the power of the second.

uint ** uint uint

Computes the first integer raised to the power of the second.

uint << uint uint

Shifts the integer to the left by the given number of bits.

uint >> uint uint

Shifts the integer to the right by the given number of bits.

-int int

Inverts the sign of the integer.

-uint uint

Inverts the sign of the integer.

int + int int

Computes the sum of the integers.

uint + uint uint

Computes the sum of the integers.

int += int int

Increments the first integer by the second.

uint += uint uint

Increments the first integer by the second.

int != int bool

Compares the two integers.

uint != uint bool

Compares the two integers.
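The following sketch exercises a few of the operators above: shift and mask on an unsigned value, an explicit narrowing cast that happens to lose nothing, and sign inversion on a signed value. It is a standalone module run via `spicyc -j`, and the names and values are illustrative.

```spicy
module IntegerExample;

global x: uint16 = 0x1234;

# Extract the high and low byte with a shift and a mask.
global hi = (x >> 8) & 0xff;
global lo = x & 0xff;

# Narrow to 8 bits; the value fits, so nothing is lost here.
global lo8 = cast<uint8>(lo);

# Signed construction and sign inversion.
global n = int32(-7);
global m = -n;

print hi, lo8, m;
```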

5.2.5.8. Interval

An interval value represents a period of time. Intervals are stored with nanosecond resolution, which is retained across all calculations.

Type

  • interval

Constants

  • interval(SECS) creates an interval from a signed integer or real value SECS specifying the period in seconds.

  • interval_ns(NSECS) creates an interval from a signed integer value NSECS specifying the period in nanoseconds.

Methods

nanoseconds() int<64>

Returns the interval as an integer value representing nanoseconds.

seconds() real

Returns the interval as a real value representing seconds.

Operators

time(int) time

Creates a time value, interpreting the argument as a number of seconds.

time(real) time

Creates a time value, interpreting the argument as a number of seconds.

time(uint) time

Creates a time value, interpreting the argument as a number of seconds.

time_ns(int) time

Creates a time value, interpreting the argument as a number of nanoseconds.

time_ns(uint) time

Creates a time value, interpreting the argument as a number of nanoseconds.

time - time interval

Returns the difference of the times.

time - interval time

Subtracts the interval from the time.

time == time bool

Compares two time values.

time > time bool

Compares the times.

time >= time bool

Compares the times.

time < time bool

Compares the times.

time <= time bool

Compares the times.

time + interval time (commutative)

Adds the interval to the time.

time != time bool

Compares two time values.
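Time and interval values interact through the `+` and `-` operators above. The following sketch adds a fractional interval to a time and then recovers the interval by subtraction. It is a standalone module run via `spicyc -j`, and the timestamp is an arbitrary illustrative value.

```spicy
module TimeExample;

global start = time(1700000000);

# time + interval yields a later time.
global later = start + interval(1.5);

# Subtracting two times yields the interval between them.
global delta = later - start;

print start, later, delta;
```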

5.2.5.9. List

Spicy uses lists only in a limited form as temporary values, usually for initializing other containers. That means you can only create list constants, but you cannot declare variables or unit fields to have a list type (use vector instead).

Constants

  • [E_1, E_2, ..., E_N] creates a list of N elements. The values E_I must all have the same type. [] creates an empty list of unknown element type.

  • [EXPR for ID in ITERABLE] creates a list by evaluating EXPR for all elements in ITERABLE, assembling the individual results into the final list value. The extended form [EXPR for ID in ITERABLE if COND] includes in the result only those elements for which COND evaluates to True. Both EXPR and COND can use ID to refer to the current element.

  • list(E_1, E_2, ..., E_N) is the same as [E_1, E_2, ..., E_N], and list() is the same as [].

  • list<T>(E_1, E_2, ..., E_N) creates a list with elements of type T, initializing it with the N elements E_I. list<T>() creates an empty list.
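Lists cannot be stored in variables, so the following sketch uses comprehensions only as temporary values: printed directly and compared against list constants. The source container is a vector. It is a standalone module run via `spicyc -j`, and the names are illustrative.

```spicy
module ListExample;

global v: vector<uint64> = vector(1, 2, 3, 4);

# Comprehensions produce temporary list values.
print [x * x for x in v];
print [x for x in v if x % 2 == 0];
```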

Operators

begin(<container>) <iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>) <iterator>

Returns an iterator to the end of the container’s content.

list == list bool

Compares two lists element-wise.

|list| uint<64>

Returns the number of elements a list contains.

list != list bool

Compares two lists element-wise.

5.2.5.10. Map

Maps are containers holding key/value pairs of elements, with fast lookup for keys to retrieve the corresponding value. They provide iterators to traverse their content, with no particular ordering.

Types

  • map<K, V> specifies a map with key type K and value type V.

  • iterator<map<K, V>>

Constants

  • map(K_1: V_1, K_2: V_2, ..., K_N: V_N) creates a map of N elements, initializing it with the given key/value pairs. The keys K_I must all have the same type, and the values V_I must likewise all have the same type. map() creates an empty map of unknown key/value types; this cannot be used directly but must be coerced into a fully-defined map type first.

  • map<K, V>(K_1: V_1, K_2: V_2, ..., K_N: V_N) creates a map of type map<K, V>, initializing it with the given key/value pairs. map<K, V>() creates an empty map.

Methods

clear() void

Removes all elements from the map.

get(key: <any>, [ default: <any> ]) <type of element>

Returns the map’s element for the given key. If the key does not exist, returns the default value if provided; otherwise throws a runtime error.

Operators

begin(<container>) <iterator>

Returns an iterator to the beginning of the container’s content.

delete map[<any>] void

Removes an element from the map.

end(<container>) <iterator>

Returns an iterator to the end of the container’s content.

map == map bool

Compares two maps element-wise.

<any> in map bool

Returns true if the map contains an element with the given key.

<any> !in map bool

Performs the inverse of the corresponding in operation.

map[<any>] <type of element>

Returns the map’s element for the given key. The key must exist, otherwise the operation will throw a runtime error.

map[<any>]=<any> void

Updates the map value for a given key. If the key does not exist, a new element is inserted.

|map| uint<64>

Returns the number of elements a map contains.

map != map bool

Compares two maps element-wise.

Iterator Operators

*iterator<map> <dereferenced type>

Returns the map element that the iterator refers to.

iterator<map> == iterator<map> bool

Returns true if two map iterators refer to the same location.

iterator<map>++ iterator<map>

Advances the iterator by one map element, returning the previous position.

++iterator<map> iterator<map>

Advances the iterator by one map element, returning the new position.

iterator<map> != iterator<map> bool

Returns true if two map iterators refer to different locations.
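The following sketch covers insertion through index assignment, deletion, key membership, and `get()` with a default. It is a standalone module whose global statements run at start-up via `spicyc -j`, and the keys and values are illustrative.

```spicy
module MapExample;

global m: map<string, uint64> = map("a": 1, "b": 2);

m["c"] = 3;      # "c" is not present yet, so it gets inserted
delete m["a"];

print m;
```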

5.2.5.11. Optional

An optional value may hold a value of another type, or can alternatively remain unset. A common use case for optional is the return value of a function that may fail.

Type

  • optional<TYPE>

Constants

  • optional(EXPR) creates an optional<T>, where T is the type of the expression EXPR, and initializes it with the value of EXPR.

More commonly, however, optional values are initialized through assignment:

  • Assigning an instance of TYPE to an optional<TYPE> sets it to the instance’s value.

  • Assigning Null to an optional<TYPE> unsets it.

To check whether an optional value is set, it can implicitly or explicitly be converted to a bool.

global x: optional<uint64>;  # Unset.
global b1: bool = x;         # False.
global b2 = cast<bool>(x);   # False.

if ( x )
    print "'x' was set";     # Never runs.
if ( ! x )
    print "'x' was unset";   # Always runs.

Operators

*optional <dereferenced type>

Returns the element stored, or throws an exception if none.

5.2.5.12. Port

Ports represent the combination of a numerical port number and an associated transport-layer protocol.

Type

  • port

Constants

  • 443/tcp, 53/udp

  • port(PORT, PROTOCOL) creates a port where PORT is a port number and PROTOCOL a spicy::Protocol.

Methods

protocol() spicy::Protocol

Returns the protocol the port is using (such as UDP or TCP).

Operators

port(uint<16>,spicy::Protocol) port

Creates a port instance.

port == port bool

Compares two port values.

port != port bool

Compares two port values.
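The following sketch shows both ways of writing a port, the literal form and the `port()` constructor, along with `protocol()` and comparison. It is a standalone module run via `spicyc -j`. `spicy::Protocol::TCP` and `UDP` are assumed to be labels of `spicy::Protocol`.

```spicy
module PortExample;

import spicy;

global https = 443/tcp;
global dns = port(53, spicy::Protocol::UDP);

print https, dns;
```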

5.2.5.13. Real

“Real” values store floating-point numbers with double precision.

Type

  • real

Constants

  • 3.14, 10e9, 0x1.921fb78121fb8p+1

This type supports the pack/unpack operators.

Operators

cast<int>(real) int

Converts the value to a signed integer type, accepting any loss of information.

cast<interval>(real) interval

Interprets the value as number of seconds.

cast<time>(real) time

Interprets the value as number of seconds since the UNIX epoch.

cast<uint>(real) uint

Converts the value to an unsigned integer type, accepting any loss of information.

real - real real

Returns the difference between the two values.

real -= real real

Subtracts the second value from the first, assigning the new value.

real / real real

Divides the first value by the second.

real /= real real

Divides the first value by the second, assigning the new value.

real == real bool

Compares the two reals.

real > real bool

Compares the two reals.

real >= real bool

Compares the two reals.

real < real bool

Compares the two reals.

real <= real bool

Compares the two reals.

real % real real

Computes the modulus of the first real divided by the second.

real * real real

Multiplies the first real by the second.

real *= real real

Multiplies the first value by the second, assigning the new value.

real ** real real

Computes the first real raised to the power of the second.

-real real

Inverts the sign of the real.

real + real real

Returns the sum of the reals.

real += real real

Adds the second real to the first, assigning the new value.

real != real bool

Compares the two reals.
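The following sketch uses a value that is exactly representable in binary, so that the equality checks on the results are safe. It shows arithmetic and a cast to an interval, which interprets the real as seconds. It is a standalone module run via `spicyc -j`, and the names are illustrative.

```spicy
module RealExample;

global r = 2.5;

# Arithmetic on double-precision values.
global doubled = r * 2.0;
global squared = r ** 2.0;

# Casting to an interval interprets the value as a number of seconds.
global period = cast<interval>(r);

print doubled, squared, period;
```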

5.2.5.14. Regular Expression

Spicy provides POSIX-style regular expressions.

Type

  • regexp

Constants

  • /Foo*bar?/, /X(..)(..)(..)Y/

Regular expressions use the extended POSIX syntax, with a few minor differences and extensions:

  • Supported character classes are: [:lower:], [:upper:], [:digit:], [:blank:].

  • \b asserts a word boundary, \B asserts no word boundary.

  • \xXX matches a byte with the binary hex value XX (e.g., \xff matches a byte of decimal value 255).

  • {#<number>} associates a numerical ID with a regular expression (useful for set matching).

Regular expression constants support two optional attributes:

&anchor

Implicitly anchor the expression, meaning it must match at the beginning of the data.

&nosub

Compile without support for capturing subexpressions, which makes matching more efficient.

Methods

find(data: bytes) tuple<int<32>, bytes>

Searches for the regular expression in data and returns the matching part. Unlike match, this does not anchor the expression to the beginning of the data: it will find matches at arbitrary starting positions. Returns a 2-tuple with (1) an integer match indicator with the same semantics as that returned by match; and (2) if a match has been found, the data that matches the regular expression. (Note: Currently this function has a runtime that’s quadratic in the size of data; consider using match if performance is an issue.)

match(data: bytes) int<32>

Matches the regular expression against data. If it matches, returns an integer that’s greater than zero. If multiple patterns have been compiled for parallel matching, that integer will be the ID of the matching pattern. Returns -1 if the regular expression does not match the data, but could still yield a match if more data were added. Returns 0 if the regular expression is not found and adding more data wouldn’t change anything. The expression is considered anchored to the beginning of the data, as though it started with an implicit ^ operator.

match_groups(data: bytes) vector<bytes>

Matches the regular expression against data. If it matches, returns a vector with one entry for each capture group defined by the regular expression, starting at index 1. Each of these entries is a view locating the matching bytes. In addition, index 0 always contains the data that matches the full regular expression. Returns an empty vector if the expression is not found. The expression is considered anchored to the beginning of the data, as though it started with an implicit ^ operator. This method is not compatible with pattern sets and will throw a runtime exception if used with a regular expression compiled from a set.

token_matcher() spicy::MatchState

Initializes state for matching the regular expression incrementally against chunks of future input. The expression is considered anchored to the beginning of the data, as though it started with an implicit ^ operator.
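The following sketch walks through the three possible results of `match()` as described above, plus the vector returned by `match_groups()`. The pattern ends in a literal `;` so that a complete match is unambiguous. It is a standalone module run via `spicyc -j`, and the pattern and inputs are illustrative.

```spicy
module RegexpExample;

global re = /([a-z]+)=([0-9]+);/;
global data = b"port=8080;rest";

# Anchored matching: a positive result means a match at the start of data.
print re.match(data);

# "port=80" is a prefix of a possible match, so more data could still help (-1);
# "=8080;" can never match because it does not start with a letter (0).
print re.match(b"port=80"), re.match(b"=8080;");

# Index 0 holds the full match, followed by one entry per capture group.
global groups = re.match_groups(data);
print groups;
```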

5.2.5.15. Set

Sets are containers for unique elements with fast lookup. They provide iterators to traverse their content, with no particular ordering.

Types

  • set<T> specifies a set with unique elements of type T.

  • iterator<set<T>>

Constants

  • set(E_1, E_2, ..., E_N) creates a set of N elements. The values E_I must all have the same type. set() creates an empty set of unknown element type; this cannot be used directly but must be coerced into a fully-defined set type first.

  • set<T>(E_1, E_2, ..., E_N) creates a set with elements of type T, initializing it with the elements E_I. set<T>() creates an empty set.

Methods

clear() void

Removes all elements from the set.

Operators

add set[element] void

Adds an element to the set.

begin(<container>) <iterator>

Returns an iterator to the beginning of the container’s content.

delete set[element] void

Removes an element from the set.

end(<container>) <iterator>

Returns an iterator to the end of the container’s content.

set == set bool

Compares two sets element-wise.

<any> in set bool

Returns true if an element is part of the set.

<any> !in set bool

Performs the inverse of the corresponding in operation.

|set| uint<64>

Returns the number of elements a set contains.

set != set bool

Compares two sets element-wise.

Iterator Operators

*iterator<set> <dereferenced type>

Returns the set element that the iterator refers to.

iterator<set> == iterator<set> bool

Returns true if two set iterators refer to the same location.

iterator<set>++ iterator<set>

Advances the iterator by one set element, returning the previous position.

++iterator<set> iterator<set>

Advances the iterator by one set element, returning the new position.

iterator<set> != iterator<set> bool

Returns true if two set iterators refer to different locations.
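The following sketch uses the `add` and `delete` statements together with membership tests. It is a standalone module whose global statements run at start-up via `spicyc -j`, and the port numbers are illustrative.

```spicy
module SetExample;

global ports = set<uint16>(80, 443);

add ports[8080];
delete ports[80];

print ports;
```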

5.2.5.16. Sink

Sinks act as a connector between two units, facilitating feeding the output of one as input into the other. See Sinks for a full description.

Sinks are special in that they don’t represent a type that’s generally available for instantiation. Instead, they need to be declared as a member of a unit using the special sink keyword. You can, however, maintain references to sinks by assigning the unit member to a variable of type Sink&.

Methods

close() void

Closes a sink by disconnecting all parsing units. Afterwards, the sink’s state is as if it had just been created (so new units can be connected). Note that a sink is automatically closed when the unit it is part of is done parsing. Also note that a previously connected parsing unit cannot be reconnected; trying to do so will still throw a UnitAlreadyConnected exception.

connect(inout u: strong_ref<unit>) void

Connects a parsing unit to a sink. All subsequent write operations to the sink will pass their data on to this parsing unit. Each unit can only be connected to a single sink. If the unit is already connected, a UnitAlreadyConnected exception is thrown. However, a sink can have more than one unit connected to it.

connect_filter(inout filter: strong_ref<unit>) void

Connects a filter unit to the sink that will transform its input transparently before forwarding it for parsing to other connected units.

Multiple filters can be added to a sink, in which case they will be chained into a pipeline and the data will be passed through them in the order they have been added. The parsing will then be carried out on the output of the last filter in the chain.

Filters must be added before the first data chunk is written into the sink. If data has already been written when a filter is added, an error is triggered.

connect_mime_type(mt: bytes) void

Connects parsing units to a sink for all parsers that support a given MIME type. All subsequent write operations to the sink will pass their data on to these parsing units. The MIME type may have wildcards for type or subtype, and the method will then connect units for all matching parsers.

connect_mime_type(mt: string) void

Connects parsing units to a sink for all parsers that support a given MIME type. All subsequent write operations to the sink will pass their data on to these parsing units. The MIME type may have wildcards for type or subtype, and the method will then connect units for all matching parsers.

gap(seq: uint<64>, len: uint<64>) void

Reports a gap in the input stream. seq is the sequence number of the first byte missing, len is the length of the gap.

sequence_number() uint<64>

Returns the current sequence number of the sink’s input stream, which is one beyond the index of the last byte that has been put in order and delivered so far.

set_auto_trim(enable: bool) void

Enables or disables auto-trimming. If enabled (which is the default), sink input data is trimmed automatically once it is in order and has been processed. See trim() for more information about trimming.

set_initial_sequence_number(seq: uint<64>) void

Sets the sink’s initial sequence number. All sequence numbers given to other methods are then assumed to be absolute numbers beyond that initial number. If the initial number is not set, the sink implicitly uses zero instead.

set_policy(policy: spicy::ReassemblerPolicy) void

Sets a sink’s reassembly policy for ambiguous input. As long as data hasn’t been trimmed, a sink will detect overlapping chunks. This policy decides how to handle ambiguous overlaps. The default (and currently only) policy is ReassemblerPolicy::First, which resolves ambiguities by taking the data from the chunk that came first.

skip(seq: uint<64>) void

Skips ahead in the input stream. seq is the sequence number at which to continue parsing. If there’s still data buffered before that position, it will be ignored; if auto-trimming is also active, it will be immediately deleted as well. If new data is passed in later that comes before seq, that will likewise be ignored. If the input stream is currently stuck inside a gap, and seq lies beyond that gap, the stream will resume processing at seq.

trim(seq: uint<64>) void

Deletes all data that’s still buffered internally up to seq. If processing the input stream hasn’t reached seq yet, parsing will also skip ahead to seq.

Trimming the input stream releases the memory, but that means that the sink won’t be able to detect any further data mismatches.

Note that by default, auto-trimming is enabled, which means all data is trimmed automatically once in-order and processed.

write(data: bytes, [ seq: uint<64> ], [ len: uint<64> ]) void

Passes data on to all connected parsing units. Multiple write calls act like passing input in incrementally: The units will parse the pieces as if they were a single stream of data. If no sequence number seq is provided, the data is assumed to represent a chunk to be appended to the current end of the input stream. If a sequence number is provided, out-of-order data will be buffered and reassembled before being passed on. If len is provided, the data is assumed to represent that many bytes inside the sequence space; if not provided, len defaults to the length of data.

If no units are connected, the call does not have any effect. If multiple units are connected and one parsing unit throws an exception, parsing of subsequent units does not proceed. Note that the order in which the data is passed to each unit is undefined.
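As an illustration of sequence-numbered writes, the sketch below assumes a hypothetical input in which two three-byte chunks arrive in reverse order. Outer writes each chunk with its sequence number, so the sink buffers the later chunk until the earlier one arrives and then delivers both to Inner in order. Unit names and sizes are made up for this example.

```spicy
module ReassemblyDemo;

public type Outer = unit {
    sink data;

    on %init {
        self.data.connect(new Inner);
    }

    # The chunk covering sequence space 3..5 arrives first and gets buffered.
    second: bytes &size=3 { self.data.write(self.second, 3); }

    # The chunk covering sequence space 0..2 arrives next, closing the hole.
    first: bytes &size=3 { self.data.write(self.first, 0); }
};

type Inner = unit {
    # Inner only ever sees the reassembled, in-order data.
    x: bytes &size=6;

    on %done { print self.x; }
};
```

With input `defabc`, Inner would be expected to parse `abcdef`.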

Todo

The error semantics for multiple units aren’t great.

Operators

|sink| uint<64>

Returns the number of bytes written into the sink so far. If the sink has filters attached, this returns the value after filtering.

|strong_ref<sink>| uint<64>

Returns the number of bytes written into the referenced sink so far. If the sink has filters attached, this returns the value after filtering.

Sinks provide a set of dedicated unit hooks as callbacks for the reassembly process. These must be implemented on the reader side, i.e., the unit that’s connected to a sink.

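%on_gap(seq: uint64, len: uint64)

Triggered when reassembly encounters a gap in the input stream. seq is the sequence number of the first missing byte, and len is the length of the gap.

%on_overlap(seq: uint64, old: data, new: data)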

Triggered when reassembly encounters a second version of data for sequence space already covered earlier. seq is the start of the overlap, and old/new are the previous and the new data, respectively. This hook is purely informational; the policy set with set_policy() determines how the reassembler handles the overlap.

%on_skipped(seq: uint64)

Any time skip() moves ahead in the input stream, this hook reports the new sequence number seq.

%on_skipped(seq: uint64, data: bytes)

If skip() moves past data that is still buffered, that data is passed to this hook before the current position is adjusted. seq is the starting sequence number of the data, and data is the data itself.
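The sketch below shows the reassembly hooks implemented on the reader side. Outer writes one chunk, reports a hypothetical two-byte gap with gap(), and then calls skip() to move past it. The unit names, sizes, and sequence numbers are assumptions for illustration, and because the reassembler decides exactly when each hook fires, no particular output order is claimed.

```spicy
module GapDemo;

public type Outer = unit {
    sink data;

    on %init {
        self.data.connect(new Inner);
    }

    chunk: bytes &size=4 {
        self.data.write(self.chunk, 0);   # sequence space 0..3
        self.data.gap(4, 2);              # bytes 4..5 are missing
        self.data.skip(6);                # resume processing beyond the gap
    }
};

type Inner = unit {
    x: bytes &eod;

    # Reassembly hooks live in the unit connected to the sink.
    on %on_gap(seq: uint64, len: uint64) {
        print "gap at %d, length %d" % (seq, len);
    }

    on %on_skipped(seq: uint64) {
        print "skipped ahead to %d" % seq;
    }
};
```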

5.2.5.17. Stream

A stream is a data structure that efficiently represents a potentially large, incrementally provided input stream of raw data. You can think of it as a bytes type that’s optimized for (1) efficiently appending new chunks of data at the end, and (2) trimming data no longer needed at the beginning. Other than through those two operations, stream data cannot be modified: once data has been added to a stream, its content cannot be changed. Streams provide iterators for traversal, and views for limiting visibility to smaller windows into the total stream.

Streams are key to Spicy’s parsing process, although most of that happens behind the scenes. You will most likely encounter them when using Random access. They may also be useful for buffering larger volumes of data during processing.

Types

  • stream

  • iterator<stream>

  • view<stream>

Methods

at(i: uint<64>) iterator<stream>

Returns an iterator representing the offset i inside the stream value.

freeze() void

Freezes the stream value. Once frozen, no more data can be appended to the stream value (unless it gets unfrozen first). If the value is already frozen, the operation does not change anything.

is_frozen() bool

Returns true if the stream value has been frozen.

trim(i: iterator<stream>) void

Trims the stream value by removing all data from its beginning up to (but not including) the position i. The iterator i will remain valid afterwards and will still point to the same location, which will now be the beginning of the stream’s value. All existing iterators pointing to i or beyond will remain valid and keep their offsets as well. The effect of this operation is undefined if i does not actually refer to a location inside the stream value. Trimming is permitted even on frozen values.

unfreeze() void

Unfreezes the stream value. An unfrozen stream value can be further modified. If the value is already unfrozen (which is the default), the operation does not change anything.

Operators

begin(<container>) <iterator>

Returns an iterator to the beginning of the container’s content.

stream(bytes) stream

Creates a stream instance pre-initialized with the given data.

end(<container>) <iterator>

Returns an iterator to the end of the container’s content.

|stream| uint<64>

Returns the number of bytes the value contains.

stream += bytes stream

Concatenates data to the stream.

stream += view<stream> stream

Concatenates another stream’s view to the target stream.

stream != stream bool

Compares two stream values lexicographically.
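A short sketch combining the stream constructor, appending, the size operator, and freezing. The module and variable names are illustrative only.

```spicy
module StreamDemo;

# Create a stream pre-initialized with some bytes, then append more.
global s = stream(b"Hello");
s += b", world";

assert |s| == 12;           # 5 + 7 bytes

# While frozen, the stream does not accept further appends.
s.freeze();
assert s.is_frozen();

# Unfreezing makes the stream appendable again.
s.unfreeze();
assert ! s.is_frozen();
```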

Iterator Methods

is_frozen() bool

Returns whether the stream value that the iterator refers to has been frozen.

offset() uint<64>

Returns the offset of the byte that the iterator refers to relative to the beginning of the underlying stream value.

Iterator Operators

*iterator<stream> uint<64>

Returns the byte the iterator is pointing to.

iterator<stream> - iterator<stream> int<64>

Returns the number of bytes between the two iterators. The result will be negative if the second iterator points to a location before the first. The result is undefined if the iterators do not refer to the same stream instance.

iterator<stream> == iterator<stream> bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> > iterator<stream> bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> >= iterator<stream> bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream>++ iterator<stream>

Advances the iterator by one byte, returning the previous position.

++iterator<stream> iterator<stream>

Advances the iterator by one byte, returning the new position.

iterator<stream> < iterator<stream> bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> <= iterator<stream> bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.

iterator<stream> + uint<64> iterator<stream> (commutative)

Advances the iterator by the given number of bytes.

iterator<stream> += uint<64> iterator<stream>

Advances the iterator by the given number of bytes.

iterator<stream> != iterator<stream> bool

Compares the two positions. The result is undefined if they are not referring to the same stream value.
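The sketch below walks a couple of iterators over a small stream using the operators and methods above. All names are illustrative; the comparisons are only meaningful because both iterators refer to the same stream value.

```spicy
module StreamIterDemo;

global s = stream(b"abcdef");

global i = begin(s);        # iterator at offset 0
global j = s.at(4);         # iterator at offset 4

assert i < j;
assert j.offset() == 4;

i += 2;                     # move i forward by two bytes
assert i.offset() == 2;

i++;                        # and by one more
assert i.offset() == 3;
assert i != j;
```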

View Methods

advance(i: iterator<stream>) view<stream>

Advances the view’s starting position to a given iterator i, returning the new view. The iterator must refer to the same stream value as the view, and it must be equal to or ahead of the view’s starting position.

advance(i: uint<64>) view<stream>

Advances the view’s starting position by i bytes, returning the new view.

advance_to_next_data() view<stream>

Advances the view’s starting position to the next non-gap position. This always advances the input by at least one byte.

at(i: uint<64>) iterator<stream>

Returns an iterator representing the offset i inside the view.

find(needle: bytes) tuple<bool, iterator<stream>>

Searches for needle inside the view’s content. Returns a tuple of a boolean and an iterator. If needle was found, the boolean will be true and the iterator will point to its first occurrence. If needle was not found, the boolean will be false and the iterator will point to the last position such that everything before it is guaranteed not to contain even a partial match of needle (in other words, one can trim up to that position and then restart the search from there if more data gets appended to the underlying stream value). Note that for a simple yes/no result, you should use the in operator instead of this method, as it’s more efficient.

limit(i: uint<64>) view<stream>

Returns a new view that keeps the current start but ends i bytes after that start. The returned view will not be able to expand any further.

offset() uint<64>

Returns the offset of the view’s starting position within the associated stream value.

starts_with(b: bytes) bool

Returns true if the view starts with b.

sub(begin: iterator<stream>, end: iterator<stream>) view<stream>

Returns a new view of the subsequence from begin up to (but not including) end.

sub(begin: uint<64>, end: uint<64>) view<stream>

Returns a new view of the subsequence from offset begin to (but not including) offset end. The offsets are relative to the beginning of the view.

sub(end: iterator<stream>) view<stream>

Returns a new view of the subsequence from the beginning of the stream up to (but not including) end.

View Operators

view<stream> == bytes bool (commutative)

Compares a stream view and a bytes instance lexicographically.

view<stream> == view<stream> bool

Compares the views lexicographically.

bytes in view<stream> bool

Returns true if the right-hand-side view contains the left-hand-side bytes as a subsequence.

view<stream> in bytes bool

Returns true if the right-hand-side bytes contains the left-hand-side view as a subsequence.

bytes !in view<stream> bool

Performs the inverse of the corresponding in operation.

view<stream> !in bytes bool

Performs the inverse of the corresponding in operation.

|view<stream>| uint<64>

Returns the number of bytes the view contains.

view<stream> != bytes bool (commutative)

Compares a stream view and a bytes instance lexicographically.

view<stream> != view<stream> bool

Compares two views lexicographically.

5.2.5.18. String

Strings store readable text that’s associated with a given character set. Internally, Spicy stores them as UTF-8.

Type

  • string

Constants

  • "Spicy", ""

  • When specifying string constants, Spicy assumes them to be in UTF-8.

Methods

encode([ charset: spicy::Charset = hilti::Charset::UTF8 ]) bytes

Converts the string into a binary representation encoded with the given character set.

Operators

string == string bool

Compares two strings lexicographically.

string % <any> string

Renders a printf-style format string.

|string| uint<64>

Returns the number of characters the string contains.

string + string string

Returns the concatenation of two strings.

string += string string

Appends the second string to the first.

string != string bool

Compares two strings lexicographically.
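A brief sketch of the string operators and the encode() method. The values are arbitrary; encode() is called without arguments so that it uses its default character set.

```spicy
module StringDemo;

global greeting = "Hello, " + "Spicy";   # string + string
greeting += "!";                          # append in place

assert greeting == "Hello, Spicy!";
assert |greeting| == 13;                  # number of characters

# printf-style rendering with the % operator and a tuple of arguments
print "%s has %d characters" % (greeting, |greeting|);

# Convert the text into its binary representation.
global raw = greeting.encode();
assert raw == b"Hello, Spicy!";
```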

5.2.5.19. Struct

A struct is a heterogeneous container of an ordered set of named values similar to a Tuple. In contrast to tuple elements, struct fields are mutable.

Type

  • struct { IDENTIFIER_1: TYPE_1; ...; IDENTIFIER_N: TYPE_N;  }

Constants

  • Structs can be initialized with a struct initializer, local my_struct: MyStruct = [$FIELD_1 = X_1, ..., $FIELD_N = X_N] where FIELD_I is the label of the corresponding field in MyStruct’s type.

Operators

<struct> ?. <field> bool

Returns true if the struct’s field has a value assigned (not counting any &default).

<struct> . <field> <field type>

Retrieves the value of a struct’s field. If the field does not have a value assigned, it returns its &default expression if that has been defined; otherwise it triggers an exception.

<struct> .? <field> <field type>

Retrieves the value of a struct’s field. If the field does not have a value assigned, it returns its &default expression if that has been defined; otherwise it signals a special non-error exception to the host application (which will normally still lead to aborting execution, similar to the standard dereference operator, unless the host application specifically handles this exception differently).

unset <struct>.<field> void

Clears an optional field.
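The sketch below declares a small struct, initializes it with a struct initializer, and then uses ?., ., and unset. It assumes that struct fields accept the &optional attribute, as the description of unset suggests; the type Point and its fields are hypothetical.

```spicy
module StructDemo;

type Point = struct {
    x: uint64;
    y: uint64 &optional;
};

# Struct initializer; field labels are prefixed with $.
global p: Point = [$x = 1, $y = 2];

assert p?.y;                # y has a value assigned
assert p.x + p.y == 3;

unset p.y;                  # clear the optional field
assert ! (p?.y);
```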

5.2.5.20. Time

A time value refers to a specific, absolute point in time, specified as the interval from January 1, 1970 UT (i.e., the Unix epoch). Times are stored with nanosecond resolution, which is retained across all calculations.

Type

  • time

Constants

  • time(SECS) creates a time from an unsigned integer or real value SECS specifying seconds since the epoch.

  • time_ns(NSECS) creates a time from an unsigned integer value NSECS specifying nanoseconds since the epoch.

Methods

nanoseconds() uint<64>

Returns the time as an integer value representing nanoseconds since the UNIX epoch.

seconds() real

Returns the time as a real value representing seconds since the UNIX epoch.

Operators

time(int) time

Creates a time value, interpreting the argument as a number of seconds.

time(real) time

Creates a time value, interpreting the argument as a number of seconds.

time(uint) time

Creates a time value, interpreting the argument as a number of seconds.

time_ns(int) time

Creates a time value, interpreting the argument as a number of nanoseconds.

time_ns(uint) time

Creates a time value, interpreting the argument as a number of nanoseconds.

time - time interval

Returns the difference of the times.

time - interval time

Subtracts the interval from the time.

time == time bool

Compares two time values.

time > time bool

Compares the times.

time >= time bool

Compares the times.

time < time bool

Compares the times.

time <= time bool

Compares the times.

time + interval time (commutative)

Adds the interval to the time.

time != time bool

Compares two time values.
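A short sketch of time arithmetic: adding an interval to a time, subtracting two times, and reading the result back in nanoseconds and seconds. The timestamps are arbitrary example values.

```spicy
module TimeDemo;

global t1 = time(1700000000);           # seconds since the epoch
global t2 = t1 + interval(90);          # 90 seconds later

assert t2 > t1;
assert t2 - t1 == interval(90);         # time - time yields an interval
assert t2.nanoseconds() == 1700000090000000000;

# time_ns() takes nanoseconds: 1,500,000,000 ns is 1.5 s after the epoch.
assert time_ns(1500000000).seconds() == 1.5;
```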

5.2.5.21. Tuple

Tuples are heterogeneous containers of a fixed, ordered set of types. Tuple elements may optionally be declared and addressed with custom identifier names. Tuple elements are immutable.

Type

  • tuple<[IDENTIFIER_1: ]TYPE_1, ..., [IDENTIFIER_N: ]TYPE_N>

Constants

  • (1, "string", True), (1, ), ()

  • tuple(1, "string", True), tuple(1), tuple()

Operators

(x,...,y)=tuple <tuple>

Assigns element-wise to the left-hand-side tuple.

tuple == tuple bool

Compares two tuples element-wise.

tuple[uint<64>] <type of element>

Extracts the tuple element at the given index. The index must be a constant unsigned integer.

tuple . <id> <type of element>

Extracts the tuple element corresponding to the given ID.

tuple != tuple bool

Compares two tuples element-wise.
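The following sketch declares a tuple with named elements, accesses it by index and by name, and unpacks it through element-wise assignment. The element names code and reason are illustrative.

```spicy
module TupleDemo;

# A tuple type with named elements.
global status: tuple<code: uint64, reason: string> = (200, "OK");

assert status[0] == 200;            # access by constant index
assert status.reason == "OK";       # access by element name

# Element-wise assignment unpacks the tuple into separate variables.
global c: uint64;
global r: string;
(c, r) = status;

assert c == 200 && r == "OK";
```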

5.2.5.22. Unit

Type

  • unit { FIELD_1; ...; FIELD_N }

  • See Parsing for a full discussion of unit types.

Constants

  • Spicy doesn’t support unit constants, but you can initialize unit instances through coercion from a struct initializer, see Struct.

    Todo

    This initialization isn’t actually available in Spicy yet (#1036).

Methods

backtrack() void

Aborts parsing at the current position and returns to the most recent &try attribute. Turns into a parse error if there’s no &try in scope.

connect_filter(inout filter: strong_ref<unit>) void

Connects a separate filter unit to transform the unit’s input transparently before parsing. The filter unit will see the original input, and this unit will receive everything the filter passes on through forward().

Filters can be connected only before a unit’s parsing begins. The latest possible point is from inside the target unit’s %init hook.

context() <context type>&

Returns a reference to the %context instance associated with the unit.

find(needle: bytes, [ dir: spicy::Direction ], [ start: iterator<stream> ]) optional<iterator<stream>>

Searches for needle inside the input region between where the unit began parsing and its current parsing position. If executed from inside a field hook, the current parsing position will represent the first byte that the field has been parsed from. By default, the search will start at the beginning of that region and scan forward. If the direction is spicy::Direction::Backward, the search will start at the end of the region and scan backward. In either case, a starting position can also be explicitly given, but it must lie inside the same region.

forward(data: bytes) void

If the unit is connected as a filter to another one, this method forwards transformed input over to that other one to parse. If the unit is not connected, this method will silently discard the data.

forward_eod() void

If the unit is connected as a filter to another one, this method signals to that other unit that the end of its input has been reached. If the unit is not connected, this method does nothing.

input() iterator<stream>

Returns an iterator referring to the input location where the current unit began parsing. If this method is called before the unit’s parsing has begun, it will throw a runtime exception. Once available, the input position will remain accessible for the unit’s entire lifetime.

offset() uint<64>

Returns the offset of the current location in the input stream relative to the unit’s start. If executed from inside a field hook, the offset will represent the first byte that the field has been parsed from.

position() iterator<stream>

Returns an iterator to the current position in the unit’s input stream. If executed from inside a field hook, the position will represent the first byte that the field has been parsed from.

set_input(i: iterator<stream>) void

Moves the current parsing position to i. The iterator i must be into the input of the current unit, or the method will throw a runtime exception.

Operators

<unit> ?. <field> bool

Returns true if the unit’s field has a value assigned (not counting any &default).

<unit> . <field> <field type>

Retrieves the value of a unit’s field. If the field does not have a value assigned, it returns its &default expression if that has been defined; otherwise it triggers an exception.

<unit> .? <field> <field type>

Retrieves the value of a unit’s field. If the field does not have a value assigned, it returns its &default expression if that has been defined; otherwise it signals a special non-error exception to the host application (which will normally still lead to aborting execution, similar to the standard dereference operator, unless the host application specifically handles this exception differently).

unset <unit>.<field> void

Clears an optional field.
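The sketch below uses a hypothetical message format to show two of the unit features above: calling offset() from inside a field hook, where it refers to the first byte of that field, and using ?. to test whether a conditional field received a value. The format itself (a "MSG" magic, a length byte, a payload, and an optional trailer) is made up for illustration.

```spicy
module UnitDemo;

# Hypothetical format: "MSG", a length byte, the payload, and a trailer
# byte that is present only for payloads longer than two bytes.
public type Message = unit {
    magic: b"MSG";
    length: uint8;

    payload: bytes &size=self.length {
        # Inside a field hook, offset() refers to the first byte of this field.
        print "payload starts at offset %d" % self.offset();
    }

    trailer: uint8 if ( self.length > 2 );

    on %done {
        # ?. tests whether the conditional field received a value.
        if ( self?.trailer )
            print "trailer byte: %d" % self.trailer;
    }
};
```

For this layout, the payload always begins right after the three magic bytes and the length byte, so the field hook should report offset 4.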

5.2.5.23. Vector

Vectors are homogeneous containers, holding a set of elements of a given element type. They provide iterators to traverse their content.

Types

  • vector<T> specifies a vector with elements of type T.

  • iterator<vector<T>>

Constants

  • vector(E_1, E_2, ..., E_N) creates a vector of N elements. The values E_I must all have the same type. vector() creates an empty vector of unknown element type; this cannot be used directly but must be coerced into a fully-defined vector type first.

  • vector<T>(E_1, E_2, ..., E_N) creates a vector with elements of type T, initializing it with the N elements E_I. vector<T>() creates an empty vector.

  • Vectors can be initialized through coercion from a list value: local I: vector<string> = ["A", "B", "C"].

Methods

assign(i: uint<64>, x: <any>) void

Assigns x to the i-th element of the vector. If the vector does not yet have an element at index i, a sufficient number of default-initialized elements is added to carry out the assignment.

at(i: uint<64>) <iterator>

Returns an iterator referring to the element at vector index i.

back() <type of element>

Returns the last element of the vector. It throws an exception if the vector is empty.

front() <type of element>

Returns the first element of the vector. It throws an exception if the vector is empty.

pop_back() void

Removes the last element from the vector, which must be non-empty.

push_back(x: <any>) void

Appends x to the end of the vector.

reserve(n: uint<64>) void

Reserves space for at least n elements. This operation does not change the vector in any observable way but provides a hint about the size that will be needed.

resize(n: uint<64>) void

Resizes the vector to hold exactly n elements. If n is larger than the current size, the new slots are filled with default values. If n is smaller than the current size, the excess elements are removed.

sub(begin: uint<64>, end: uint<64>) vector

Extracts a subsequence of vector elements spanning from index begin to (but not including) index end.

sub(end: uint<64>) vector

Extracts a subsequence of vector elements spanning from the beginning of the vector up to (but not including) index end.

Operators

begin(<container>) <iterator>

Returns an iterator to the beginning of the container’s content.

end(<container>) <iterator>

Returns an iterator to the end of the container’s content.

vector == vector bool

Compares two vectors element-wise.

vector[uint<64>] <type of element>

Returns the vector element at the given index.

|vector| uint<64>

Returns the number of elements a vector contains.

vector + vector vector

Returns the concatenation of two vectors.

vector += vector vector

Concatenates another vector to the vector.

vector != vector bool

Compares two vectors element-wise.

Iterator Operators

*iterator<vector> <dereferenced type>

Returns the vector element that the iterator refers to.

iterator<vector> == iterator<vector> bool

Returns true if two vector iterators refer to the same location.

iterator<vector>++ iterator<vector>

Advances the iterator by one vector element, returning the previous position.

++iterator<vector> iterator<vector>

Advances the iterator by one vector element, returning the new position.

iterator<vector> != iterator<vector> bool

Returns true if two vector iterators refer to different locations.
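A short sketch that exercises several vector methods and operators: appending, element access, sub(), resize(), and iteration. The values are arbitrary; the final sum includes the two default-initialized slots added by resize().

```spicy
module VectorDemo;

global v = vector<uint64>(1, 2, 3);
v.push_back(4);

assert |v| == 4;
assert v.front() == 1 && v.back() == 4;
assert v[2] == 3;

# sub(begin, end) excludes the element at index end.
assert v.sub(1, 3) == vector<uint64>(2, 3);

# Growing the vector fills the new slots with default values (0 for uint64).
v.resize(6);
assert v[5] == 0;

# Sum all elements: 1 + 2 + 3 + 4 + 0 + 0.
global total: uint64 = 0;
for ( x in v )
    total += x;

assert total == 10;
```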

5.2.5.24. Void

The void type is a placeholder for specifying “no type”, such as when a function doesn’t return anything.

Type

  • void