Storage Framework
The storage framework provides a plugin-based system for short- and long-term storage of data, accessible from Zeek script-land. The stored data is not the packet data itself, but artifacts generated from the packet data. The framework has interchangeable asynchronous and synchronous modes. It provides a simple key-value store that uses Zeek values as the keys to store and look up data.
This chapter gives an overview of the storage framework, plus examples of using it. For
more examples, see the test cases in testing/btest/scripts/base/frameworks/storage and
an example storage plugin in testing/btest/plugin/storage-src.
Terminology
Zeek’s storage framework uses two main components:
- Backend
A backend plugin provides access to a storage system. Backends can be network-based storage systems such as Redis, on-disk database systems such as SQLite, etc. Backend plugins can define script-level records for configuring them when they’re opened. Zeek provides backends for Redis and SQLite by default, but others may be implemented as external packages.
- Serializer
A serializer plugin provides a mechanism for converting data from Zeek scripts into formats that backends can use. Serializers are intended to be agnostic to backends. They convert between Zeek values and opaque byte buffers, and backends should be able to handle the result of any individual serializer. Zeek provides a JSON serializer by default, but others may be implemented as external packages.
Asynchronous Mode vs Synchronous Mode
Storage backends support both asynchronous and synchronous modes. The difference between
the two modes is that asynchronous calls must be used as part of when statements, whereas
synchronous calls can be used either with when statements or called directly. Synchronous
functions block until the backend returns data. Otherwise, the arguments and return values
are the same in both modes. The functions are split between two script-level modules:
Storage::Async, loaded from base/frameworks/storage/async, and Storage::Sync, loaded from
base/frameworks/storage/sync.
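As a rough sketch of the difference, the script below reads the same key twice: once with a
direct synchronous call and once with an asynchronous call inside a when statement. The key
name, database path, table name, and 5sec timeout are illustrative assumptions. The capture
list ([backend]) assumes a Zeek version that supports when captures, and
exit_only_after_terminate is set only so that a standalone run waits for the asynchronous
result.

```zeek
@load base/frameworks/storage/sync
@load base/frameworks/storage/async
@load policy/frameworks/storage/backend/sqlite

# Keep Zeek running until terminate() is called, so the when body can run.
redef exit_only_after_terminate = T;

event zeek_init()
    {
    local opts: Storage::BackendOptions;
    # Illustrative path and table name.
    opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

    local open_res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
    if ( open_res$code != Storage::SUCCESS )
        {
        terminate();
        return;
        }

    local backend = open_res$value;

    # Synchronous: called directly, blocks until the backend returns.
    local sync_res = Storage::Sync::get(backend, "abc");
    print "sync get", sync_res$code;

    # Asynchronous: the call must be the condition of a when statement.
    when [backend] ( local async_res = Storage::Async::get(backend, "abc") )
        {
        print "async get", async_res$code;
        Storage::Sync::close_backend(backend);
        terminate();
        }
    timeout 5sec
        {
        print "async get timed out";
        Storage::Sync::close_backend(backend);
        terminate();
        }
    }
```

Both calls take the same arguments and return the same Storage::OperationResult record, so
switching modes mostly means moving the call into or out of a when condition.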
When reading pcap data via the -r Zeek argument, all backends operate synchronously
internally to ensure that Zeek’s timers run correctly. Even in this case, asynchronous
functions must still be used with the when statement, but they are effectively translated
to synchronous calls.
Using the Storage Framework
All of the examples below use the SQLite backend. Usage of other backends follows the same
model. Switching the examples to a different backend involves only using a different tag
and options record with the Storage::Async::open_backend / Storage::Sync::open_backend
functions.
Operation Return Values
All backend methods return a record of type Storage::OperationResult. This record contains
a code that indicates the result of the operation. For failures, backends may provide more
details in the optional error message. The record also contains data for operations that
return values, namely open_backend and get.
Storage::ReturnCode contains all of the codes that can be returned from the various
operations. Not all codes are valid for all operations. Storage::ReturnCode can be
redefined by backends to add new backend-specific status codes as needed.
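Since every operation returns the same record type, result checking can be factored into one
place. The following is a minimal sketch of that idea. The helper storage_ok is hypothetical
and not part of the framework; it relies only on the code and error_str fields that the
examples in this chapter use.

```zeek
@load base/frameworks/storage/sync

# Hypothetical helper, not part of the storage framework: reports a failed
# operation and returns T only when the operation succeeded.
function storage_ok(what: string, res: Storage::OperationResult): bool
    {
    if ( res$code == Storage::SUCCESS )
        return T;

    # The error message is optional, so test for it before reading it.
    if ( res?$error_str )
        print fmt("%s failed (%s): %s", what, res$code, res$error_str);
    else
        print fmt("%s failed (%s)", what, res$code);

    return F;
    }
```

A caller could then write a check such as if ( ! storage_ok("put", res) ) return; after each
operation instead of repeating the comparison against Storage::SUCCESS.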
Opening and Closing a Backend
Opening a backend starts with defining a set of options for that backend. The
Storage::BackendOptions record is defined with some fields by default, but loading a
policy for a specific backend type may add new fields to it. The example below loads the
SQLite policy, which adds a new sqlite field with additional options. These options denote
where to store the SQLite database file and which table to use. The table name allows users
to separate different instances of a backend from each other in a single database file.
The script then sets a serializer. The storage framework sets this to the JSON
(Storage::STORAGE_SERIALIZER_JSON) serializer by default, but setting it explicitly is
included below as an example.
Calling Storage::Sync::open_backend instantiates a backend connection. As described above,
open_backend returns a Storage::OperationResult. On success, it stores the handle to the
backend in the value field of the result record. We check the code field as well to make
sure the operation succeeded. Backend handles can be stored in global values just like any
other value. They can be opened during startup, such as in a zeek_init event handler, and
reused throughout the runtime of Zeek. When a backend is successfully opened, a
Storage::backend_opened event will be emitted.
The two type arguments to open_backend define the script-level types for keys and values.
Attempting to use other types with the backend results in Storage::KEY_TYPE_MISMATCH
errors.
Lastly, we call Storage::Sync::close_backend to close the backend before exiting. When a
backend is successfully closed, a Storage::backend_lost event will be emitted.
@load base/frameworks/storage/sync
@load policy/frameworks/storage/backend/sqlite

event zeek_init()
    {
    local opts: Storage::BackendOptions;

    # Loading the sqlite policy adds this field to the options record.
    opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

    # This is the default, but is shown here for how to set it.
    opts$serializer = Storage::STORAGE_SERIALIZER_JSON;

    local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
    if ( res$code != Storage::SUCCESS )
        {
        # Without a valid handle there is nothing to close.
        print "failed to open backend", res$code;
        return;
        }

    # On success, the value field holds the backend handle.
    local backend = res$value;

    res = Storage::Sync::close_backend(backend);
    }
Storing, Retrieving, and Erasing Data
The main purpose of the storage framework is to store and retrieve data. This example
makes synchronous calls to add a new key/value pair to a backend, retrieve it, and erase
the entry associated with the key. It assumes that the backend variable used below points
to an opened backend handle. The idea is that users do not need to worry about the
underlying backend implementation. In terms of Zeek’s script-layer API, SQLite, Redis, or
other backends should behave identically.
First, we make a call to Storage::Sync::put, passing a key and a value to be stored. These
must be of the same types that were passed in the arguments to open_backend, as described
in the earlier section.
The arguments passed into put are contained in a record of type Storage::PutArgs. See the
documentation for that type for descriptions of the fields available. In this case, we
specify a key and a value plus an expiration time. This expiration time indicates when the
data should be automatically removed from the backend. We check the result value, and print
the error string and return if the operation failed.
Next, we attempt to retrieve the same key from the backend. Assuming that the key hasn’t
been erased, either manually or via expiration, the value is returned in the value field of
the result record. If the key has been removed already, the backend should return a
Storage::KEY_NOT_FOUND code.
Finally, we manually attempt to erase the key. This will remove the key/value pair from
the store, assuming that it hasn’t already been removed manually or via expiration. Same
as with get, Storage::KEY_NOT_FOUND should be returned if the key doesn’t exist.
# Store the pair; it expires automatically after 45 seconds.
local res = Storage::Sync::put(backend, [$key="abc", $value="def", $expire_time=45sec]);
if ( res$code != Storage::SUCCESS )
    {
    print res$error_str;
    return;
    }

# Retrieve the value; on success it is available in res$value.
res = Storage::Sync::get(backend, "abc");
if ( res$code != Storage::SUCCESS )
    {
    print res$error_str;
    return;
    }

# Remove the pair from the store.
res = Storage::Sync::erase(backend, "abc");
if ( res$code != Storage::SUCCESS )
    {
    print res$error_str;
    return;
    }
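The example above treats every non-success code as an error. For lookups, a missing key is
often an expected outcome rather than a failure. The sketch below separates the
Storage::KEY_NOT_FOUND case from other failures. The database path, table name, key, and
printed messages are illustrative assumptions.

```zeek
@load base/frameworks/storage/sync
@load policy/frameworks/storage/backend/sqlite

event zeek_init()
    {
    local opts: Storage::BackendOptions;
    # Illustrative path and table name.
    opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

    local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
    if ( res$code != Storage::SUCCESS )
        return;

    local backend = res$value;

    # Store a pair and erase it again right away.
    Storage::Sync::put(backend, [$key="abc", $value="def"]);
    Storage::Sync::erase(backend, "abc");

    # The key has been erased, so this lookup is expected to miss.
    res = Storage::Sync::get(backend, "abc");
    if ( res$code == Storage::KEY_NOT_FOUND )
        print "key not present, treating as a miss";
    else if ( res$code == Storage::SUCCESS )
        print "found value", (res$value as string);
    else
        print "lookup failed", res$code;

    Storage::Sync::close_backend(backend);
    }
```

Handling the miss separately keeps genuine backend errors visible instead of mixing them
with ordinary cache misses.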
Events
Two events exist for the storage framework: Storage::backend_lost and
Storage::backend_opened. Both events were mentioned in the example of opening and closing a
backend, but an additional point needs to be made about the Storage::backend_lost event.
This event is also raised when a connection is lost unexpectedly. This gives users
information about connection failures, as well as an opportunity to handle those failures
by reconnecting.
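A minimal sketch of handlers for both events follows. This chapter does not list the event
parameters, so the signatures used here (a backend tag and the options for both events, plus
a reason string for backend_lost) are assumptions. Check them against the framework's event
documentation before relying on them.

```zeek
@load base/frameworks/storage/sync

# Assumed parameters: the backend tag and the options it was opened with.
event Storage::backend_opened(tag: Storage::Backend, options: any)
    {
    print "storage backend opened", tag;
    }

# Assumed parameters: as above, plus a reason describing why the backend was lost.
event Storage::backend_lost(tag: Storage::Backend, options: any, reason: string)
    {
    print "storage backend lost", tag, reason;
    # A reconnect attempt, such as another open_backend call, could go here.
    }
```

Because backend_lost is raised for both deliberate closes and unexpected disconnects, a
handler that reconnects would need its own way of telling the two cases apart.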