Storage Framework

The storage framework provides a plugin-based system for short- and long-term storage of data, accessible from Zeek script-land. The stored data is not the packet data itself, but artifacts generated from it. The framework offers interchangeable asynchronous and synchronous modes. It is a simple key-value store that uses Zeek values as the keys to store and look up data.

This chapter gives an overview of the storage framework, plus examples of using it. For more examples, see the test cases in testing/btest/scripts/base/frameworks/storage and an example storage plugin in testing/btest/plugin/storage-src.

Terminology

Zeek’s storage framework uses two main components:

Backend

A backend plugin provides access to a storage system. Backends can be network-based storage systems such as Redis, on-disk database systems such as SQLite, etc. Backend plugins can define script-level records for configuring them when they’re opened. Zeek provides backends for Redis and SQLite by default, but others may be implemented as external packages.

Serializer

A serializer plugin provides a mechanism for converting data from Zeek scripts into formats that backends can use. Serializers are intended to be agnostic to backends. They convert between Zeek values and opaque byte buffers, and backends should be able to handle the result of any individual serializer. Zeek provides a JSON serializer by default, but others may be implemented as external packages.

Asynchronous Mode vs Synchronous Mode

Storage backends support both asynchronous and synchronous modes. The difference between using the two modes is that asynchronous calls must be used as part of when statements, whereas synchronous calls can be used either with when statements or called directly. Synchronous functions will block until the backend returns data. Otherwise, all of the arguments and return values are the same between them. They are split between two script-level modules: Storage::Async loaded from base/frameworks/storage/async and Storage::Sync loaded from base/frameworks/storage/sync.

When reading pcap data via the -r Zeek argument, all backends operate in a synchronous manner internally to ensure that Zeek’s timers run correctly. Regardless of this behavior, asynchronous functions are required to be used with the when statement, but they’ll essentially be translated to synchronous calls.
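As a minimal sketch of the asynchronous mode, the function below wraps an asynchronous lookup in a when statement. The function name lookup_async, the 5sec timeout, and the string key/value types are illustrative assumptions, and the handle type is assumed to be spelled opaque of Storage::BackendHandle; the backend handle is expected to have been opened elsewhere.

```zeek
@load base/frameworks/storage/async

# Hypothetical helper: looks up a key without blocking the caller.
# Assumes "b" was opened with string keys and string values.
function lookup_async(b: opaque of Storage::BackendHandle, key: string)
  {
  # Asynchronous calls must appear inside a when statement. The body
  # runs once the backend has produced a result.
  when [b, key] ( local res = Storage::Async::get(b, key) )
    {
    if ( res$code == Storage::SUCCESS )
      print fmt("%s -> %s", key, res$value as string);
    else
      print fmt("lookup of %s failed: %s", key, res$code);
    }
  timeout 5sec
    {
    print fmt("lookup of %s timed out", key);
    }
  }
```

The synchronous counterpart would call Storage::Sync::get directly and inspect the returned record on the next line, at the cost of blocking until the backend responds.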

Using the Storage Framework

All of the examples below use the SQLite backend. Usage of other backends follows the same model. Switching the examples to a different backend only requires passing a different tag and options record to the Storage::Async::open_backend or Storage::Sync::open_backend functions.

Operation Return Values

All backend methods return a record of type Storage::OperationResult. This record contains a code that indicates the result of the operation. For failures, backends may provide more details in the optional error message. The record also contains data for operations that return values, namely open_backend and get. Storage::ReturnCode contains all of the codes that can be returned from the various operations. Not all codes are valid for all operations. Backends can redefine Storage::ReturnCode to add new backend-specific status codes as needed.
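One way to handle these result records consistently is a small checking function. The sketch below is illustrative only: the name storage_ok is hypothetical, and it relies on the code and error_str fields used in the examples in this chapter.

```zeek
@load base/frameworks/storage/sync

# Hypothetical helper: returns T if the operation succeeded. Otherwise it
# prints the return code plus any error message and returns F.
function storage_ok(op: string, res: Storage::OperationResult): bool
  {
  if ( res$code == Storage::SUCCESS )
    return T;

  # The error message is optional, so test for it before reading it.
  if ( res?$error_str )
    print fmt("%s failed with %s: %s", op, res$code, res$error_str);
  else
    print fmt("%s failed with %s", op, res$code);

  return F;
  }
```

A caller could then write a check such as storage_ok("put", res) after each operation instead of repeating the comparison against Storage::SUCCESS.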

Opening and Closing a Backend

Opening a backend starts with defining a set of options for that backend. The Storage::BackendOptions record is defined with some fields by default, but loading a policy for a specific backend type may add new fields to it. In the example below, we load the SQLite policy, which adds a new sqlite field with additional options. These options denote where to store the SQLite database file and what table to use. Using separate tables allows users to keep different instances of a backend apart within a single database file.

The script then sets a serializer. The storage framework sets this to the JSON (Storage::STORAGE_SERIALIZER_JSON) serializer by default, but setting it explicitly is included below as an example.

Calling Storage::Sync::open_backend instantiates a backend connection. As described above, open_backend returns a Storage::OperationResult. On success, it stores the handle to the backend in the value field of the result record. We check the code field as well to make sure the operation succeeded. Backend handles can be stored in global values just like any other value. They can be opened during startup, such as in a zeek_init event handler, and reused throughout the runtime of Zeek. When a backend is successfully opened, a Storage::backend_opened event will be emitted.

The two type arguments to open_backend define the script-level types for keys and values. Attempting to use other types with the backend results in Storage::KEY_TYPE_MISMATCH errors.

Lastly, we call Storage::Sync::close_backend to close the backend before exiting. When a backend is successfully closed, a Storage::backend_lost event will be emitted.

@load base/frameworks/storage/sync
@load policy/frameworks/storage/backend/sqlite

local opts: Storage::BackendOptions;
local backend: opaque of Storage::BackendHandle;

# Loading the sqlite policy adds this field to the options record.
opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

# This is the default, but is shown here for how to set it.
opts$serializer = Storage::STORAGE_SERIALIZER_JSON;

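local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
if ( res$code != Storage::SUCCESS )
  {
  # Without a valid handle there is nothing to close, so stop here.
  print(res$error_str);
  return;
  }

backend = res$value;

res = Storage::Sync::close_backend(backend);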
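The following sketch shows how the same steps might look when the handle is kept in a global and reused for the lifetime of Zeek, as described above. The global names backend and backend_ready are illustrative, and the handle type is assumed to be spelled opaque of Storage::BackendHandle.

```zeek
@load base/frameworks/storage/sync
@load policy/frameworks/storage/backend/sqlite

# Illustrative globals: the handle itself, plus a flag recording whether
# the open call succeeded.
global backend: opaque of Storage::BackendHandle;
global backend_ready = F;

event zeek_init()
  {
  local opts: Storage::BackendOptions;
  opts$sqlite = [$database_path="test.sqlite", $table_name="testing"];

  local res = Storage::Sync::open_backend(Storage::STORAGE_BACKEND_SQLITE, opts, string, string);
  if ( res$code != Storage::SUCCESS )
    {
    print fmt("failed to open backend: %s", res$code);
    return;
    }

  backend = res$value;
  backend_ready = T;
  }

event zeek_done()
  {
  # Only close a handle that was actually opened.
  if ( backend_ready )
    Storage::Sync::close_backend(backend);
  }
```

Other event handlers can test backend_ready before issuing put, get, or erase calls against the shared handle.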

Storing, Retrieving, and Erasing Data

The true point of the storage framework is to store and retrieve data. This example shows making synchronous calls to add a new key/value pair to a backend, retrieve it, and erase the entry associated with the key. This assumes that the backend variable used below points to an opened backend handle. The idea is that users do not need to worry about the underlying backend implementation. In terms of Zeek’s script-layer API, SQLite, Redis, or other backends should behave identically.

First, we make a call to Storage::Sync::put, passing a key and a value to be stored. These must be of the same types that were passed in the arguments to open_backend, as described in the earlier section. The arguments passed into put are contained in a record of type Storage::PutArgs. See the documentation for that type for descriptions of the fields available. In this case, we specify a key and a value plus an expiration time. This expiration time indicates when the data should be automatically removed from the backend. We check the result value, and print the error string and return if the operation failed.

Next, we attempt to retrieve the same key from the backend. Assuming that the key hasn’t been erased, either manually or via expiration, the value is returned in the value field of the result record. If the key has been removed already, the backend should return a Storage::KEY_NOT_FOUND code.

Finally, we manually attempt to erase the key. This will remove the key/value pair from the store, assuming that it hasn’t already been removed manually or via expiration. Same as with get, Storage::KEY_NOT_FOUND should be returned if the key doesn’t exist.

local res = Storage::Sync::put(backend, [$key="abc", $value="def", $expire_time=45sec]);
if ( res$code != Storage::SUCCESS )
  {
  print(res$error_str);
  return;
  }

res = Storage::Sync::get(backend, "abc");
if ( res$code != Storage::SUCCESS )
  {
  print(res$error_str);
  return;
  }

res = Storage::Sync::erase(backend, "abc");
if ( res$code != Storage::SUCCESS )
  {
  print(res$error_str);
  return;
  }
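Because a missing key is often an expected outcome rather than an error, it can be useful to treat Storage::KEY_NOT_FOUND separately from other failures. The sketch below is one possible way to do that. The function name get_or_default is hypothetical, and it assumes a backend opened with string keys and string values and a handle type spelled opaque of Storage::BackendHandle.

```zeek
@load base/frameworks/storage/sync

# Hypothetical helper: returns the stored value for "key", or "def" if
# the key is absent (never stored, erased, or expired).
function get_or_default(b: opaque of Storage::BackendHandle, key: string, def: string): string
  {
  local res = Storage::Sync::get(b, key);

  if ( res$code == Storage::SUCCESS )
    return res$value as string;

  # Anything other than a missing key is a real failure worth reporting.
  if ( res$code != Storage::KEY_NOT_FOUND )
    print fmt("get of %s failed: %s", key, res$code);

  return def;
  }
```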

Events

Two events exist for the storage framework: Storage::backend_lost and Storage::backend_opened. Both events were mentioned in the example of opening and closing a backend, but an additional point needs to be made about the Storage::backend_lost event. This event is also raised when a connection is lost unexpectedly. This gives users information about connection failures, as well as an opportunity to handle those failures by reconnecting.
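A minimal sketch of handlers for both events follows. The parameter lists shown here (a backend tag, the options that were used, and a reason string for backend_lost) are assumptions not confirmed by this chapter; check the framework's script reference for the exact signatures before relying on them.

```zeek
@load base/frameworks/storage/sync

# Assumed signature: the backend tag and the options it was opened with.
event Storage::backend_opened(tag: Storage::Backend, options: any)
  {
  print fmt("storage backend opened: %s", tag);
  }

# Assumed signature: as above, plus a reason describing why the backend
# was lost (a normal close or an unexpected disconnect).
event Storage::backend_lost(tag: Storage::Backend, options: any, reason: string)
  {
  print fmt("storage backend lost: %s (%s)", tag, reason);
  # A reconnect attempt, for example another open_backend call, could go here.
  }
```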