Supervisor Framework
The Supervisor framework enables an entirely new mode for Zeek, one that supervises a set of Zeek processes that are meant to be persistent. A Supervisor automatically revives any process that dies or exits prematurely and also arranges for an ordered shutdown of the entire process tree upon its own termination. This Supervisor mode for Zeek provides the basic foundation for process configuration/management that could be used to deploy a Zeek cluster similar to what ZeekControl does, but is also simpler to integrate as a standard system service.
Simple Example
A simple example of using the Supervisor to monitor one Zeek process sniffing packets from an interface looks like the following:
$ zeek -j simple-supervisor.zeek
event zeek_init()
    {
    if ( Supervisor::is_supervisor() )
        {
        local sn = Supervisor::NodeConfig($name="foo", $interface="en0");
        local res = Supervisor::create(sn);

        if ( res == "" )
            print "supervisor created a new node";
        else
            print "supervisor failed to create node", res;
        }
    else
        print fmt("supervised node '%s' zeek_init()", Supervisor::node()$name);
    }

event zeek_done()
    {
    if ( Supervisor::is_supervised() )
        print fmt("supervised node '%s' zeek_done()", Supervisor::node()$name);
    else
        print "supervisor zeek_done()";
    }
The command-line argument of -j
toggles Zeek to run in “Supervisor mode” to
allow for creation and management of child processes. If you’re going to test
this locally, be sure to change en0
to a real interface name you can sniff.
Notice that the simple-supervisor.zeek script is loaded and executed by
both the main Supervisor process and the child Zeek process that it spawns
via Supervisor::create. Use Supervisor::is_supervisor or
Supervisor::is_supervised to distinguish the Supervisor process from a
supervised child process. You can also distinguish between multiple
supervised child processes by inspecting the contents of Supervisor::node
(e.g. comparing node names).
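For instance, a supervised node could branch on its own name to run node-specific logic. The snippet below is a minimal sketch of that idea; the node names "foo" and "bar" are just placeholders for names you would have passed to Supervisor::create:
event zeek_init()
    {
    if ( ! Supervisor::is_supervised() )
        return;

    # Supervisor::node() returns this node's Supervisor::NodeConfig,
    # including the name it was given at creation time.
    local name = Supervisor::node()$name;

    if ( name == "foo" )
        print "running logic specific to node 'foo'";
    else if ( name == "bar" )
        print "running logic specific to node 'bar'";
    }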
If you happen to be running this locally on an interface with checksum
offloading and want Zeek to ignore checksums, simply run with the
-C
command-line argument like:
$ zeek -j -C simple-supervisor.zeek
Most command-line arguments to Zeek are automatically inherited by any
supervised child processes that get created. The notable ones that are not
inherited are the options to read pcap files and live interfaces, -r
and
-i
, respectively.
For node-specific configuration options, see Supervisor::NodeConfig
which gets passed as argument to Supervisor::create
.
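For illustration, here is a sketch of a node configuration that sets a few of those options. The field names used ($directory, $stdout_file, $stderr_file) come from Supervisor::NodeConfig, but check the API file mentioned below for the full and current set of fields in your Zeek version:
event zeek_init()
    {
    if ( ! Supervisor::is_supervisor() )
        return;

    # Sketch: give the node its own working directory and redirect its
    # stdout/stderr into files within that directory.
    local sn = Supervisor::NodeConfig($name="foo",
                                      $interface="en0",
                                      $directory="foo",
                                      $stdout_file="stdout.log",
                                      $stderr_file="stderr.log");

    local res = Supervisor::create(sn);

    if ( res != "" )
        print fmt("supervisor failed to create node: %s", res);
    }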
Supervised Cluster Example
To run a full Zeek cluster similar to what you may already know, try the following script:
$ zeek -j cluster-supervisor.zeek
event zeek_init()
    {
    if ( ! Supervisor::is_supervisor() )
        return;

    Broker::listen("127.0.0.1", 9999/tcp);

    local cluster: table[string] of Supervisor::ClusterEndpoint;
    cluster["manager"] = [$role=Supervisor::MANAGER, $host=127.0.0.1, $p=10000/tcp];
    cluster["logger"] = [$role=Supervisor::LOGGER, $host=127.0.0.1, $p=10001/tcp];
    cluster["proxy"] = [$role=Supervisor::PROXY, $host=127.0.0.1, $p=10002/tcp];
    cluster["worker"] = [$role=Supervisor::WORKER, $host=127.0.0.1, $p=10003/tcp, $interface="en0"];

    for ( n, ep in cluster )
        {
        local sn = Supervisor::NodeConfig($name=n);
        sn$cluster = cluster;
        sn$directory = n;

        if ( ep?$interface )
            sn$interface = ep$interface;

        local res = Supervisor::create(sn);

        if ( res != "" )
            print fmt("supervisor failed to create node '%s': %s", n, res);
        }
    }
This script now spawns four nodes: a cluster manager, logger, worker, and proxy. It also configures each node to use a separate working directory, named after the node, within the current working directory of the Supervisor process. Any stdout/stderr output of the nodes is automatically redirected through the Supervisor process and prefixed with relevant information, like the name of the node the output came from.
The Supervisor process also listens on a port of its own for further
instructions from other external/remote processes via
Broker::listen
. For example, you could use this other script to
tell the Supervisor to restart all processes, perhaps to re-load Zeek scripts
you’ve changed in the meantime:
$ zeek supervisor-control.zeek
event zeek_init()
    {
    Broker::peer("127.0.0.1", 9999/tcp, 1sec);
    }

event Broker::peer_added(endpoint: Broker::EndpointInfo, msg: string)
    {
    Broker::publish(SupervisorControl::topic_prefix, SupervisorControl::restart_request, "", "");
    }

event SupervisorControl::restart_response(reqid: string, result: bool)
    {
    print fmt("got result of supervisor restart request: %s", result);
    terminate();
    }
Any Supervisor instruction you can perform via an API call in a local script can also be triggered via an associated external event.
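As another example, the following sketch asks a running Supervisor for the status of its nodes via the SupervisorControl::status_request and SupervisorControl::status_response events. It assumes the Supervisor listens on 127.0.0.1:9999 as in the cluster example above; verify the event names and the Supervisor::Status record against the control.zeek API file for your Zeek version:
event zeek_init()
    {
    Broker::peer("127.0.0.1", 9999/tcp, 1sec);
    }

event Broker::peer_added(endpoint: Broker::EndpointInfo, msg: string)
    {
    # An empty node name requests status for all supervised nodes.
    Broker::publish(SupervisorControl::topic_prefix, SupervisorControl::status_request, "", "");
    }

event SupervisorControl::status_response(reqid: string, result: Supervisor::Status)
    {
    # Print the names of the nodes the Supervisor currently manages.
    for ( name in result$nodes )
        print fmt("supervised node: %s", name);

    terminate();
    }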
For further details, consult the Supervisor
API at
base/frameworks/supervisor/api.zeek and
SupervisorControl
API (for remote management) at
base/frameworks/supervisor/control.zeek.
Internal Architecture
The following details aren’t necessarily important for most users, but instead aim to give developers a high-level overview of how the process supervision framework is implemented. The process tree in “supervisor” mode looks like:
    zeek (Supervisor)
     └── zeek (Stem)
          ├── zeek (supervised node)
          ├── zeek (supervised node)
          └── ...
The top-level “Supervisor” process does not directly manage any of the supervised nodes that are created. Instead, it spawns an intermediate process, called the “Stem”, to manage the lifetime of supervised nodes. This is done for two reasons:
1. It avoids the need to exec() the supervised processes, which would require executing whatever version of the zeek binary happens to exist on the filesystem at the time of the call, and that binary may have changed in the meantime. This helps avoid potential incompatibility or race-condition pitfalls associated with system maintenance/upgrades. The one situation that does still require an exec() is if the Stem process dies prematurely, but that is expected to be a rare scenario.

2. Zeek run-time operation generally taints global state, so creating an early fork() for use as the Stem process provides a pure baseline image to use for supervised processes.
Ultimately, there are two tiers of process supervision happening: the Supervisor will revive the Stem process if needed and the Stem process will revive any of its children when needed.
Also, either the Stem or any of its supervised child processes will
automatically detect if they are orphaned from their parent process and
self-terminate. The Stem checks for orphaning simply by waking up every second
from its poll() loop to check whether its parent PID has changed. A supervised
node checks for orphaning similarly, but instead does so from a recurring Timer.
Other than the orphaning-check and how it establishes the desired
configuration from a combination of inheriting command-line arguments and
inspecting Supervisor-specific options, a supervised node does not operate
differently at run-time from a traditional Zeek process.
Node Revival
The Supervisor framework assumes that supervised nodes run until something asks the Supervisor to stop them. When a supervised node exits unexpectedly, the Stem attempts to revive it during its periodic polling routine. This revival procedure implements exponential delay, as follows: starting from a delay of one second, the Stem revives the node up to 3 times. At that point, it doubles the revival delay, and again tries up to 3 times. This continues indefinitely: the Stem never gives up on a node, while the revival delay keeps growing. Once a supervised node has remained up for at least 30 seconds, the revival state clears and will start from scratch as just described, should the node exit again. The Supervisor codebase currently hard-wires these thresholds and delays.
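To make the schedule concrete: the first three revival attempts use a one-second delay, the next three a two-second delay, then four seconds, and so on. The helper below is only an illustrative Zeek sketch of that schedule; the actual logic lives in the Supervisor's C++ code and is not exposed as a script-level function:
# Illustrative sketch only: compute the delay before the next revival attempt,
# given how many attempts have already failed since the last reset
# (3 attempts per step, starting at 1 second and doubling after each group of 3).
function revival_delay(failed_attempts: count): interval
    {
    local doublings = failed_attempts / 3;  # integer division
    local delay = 1sec;
    local i = 0;

    while ( i < doublings )
        {
        delay = delay + delay;  # double the delay
        ++i;
        }

    return delay;
    }

event zeek_init()
    {
    # Attempts 0-2 -> 1 sec, 3-5 -> 2 secs, 6-8 -> 4 secs, ...
    print revival_delay(0), revival_delay(3), revival_delay(6);
    }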