.. _devel-cluster-spawning-cluster:

==================
Spawning a Cluster
==================

Introduction
------------

A Zeek cluster is a collection of worker, logger and proxy processes as well as a single
manager process. See the :ref:`Cluster Architectures ` for a more detailed introduction.

This section gives a short, general overview of how to run these processes by hand in
order to spawn a Zeek cluster.

As of Zeek 8.1, the main ingredients needed to run a Zeek cluster are:

* the ``cluster-layout.zeek`` file
* a mechanism to spawn processes

For a production setup, you'd also include a monitoring component that restarts crashed
processes and monitors and reports on their health.

.. note::

   You are reading low-level background information for developers. As a Zeek user or
   operator, you usually use :ref:`ZeekControl ` or other high-level tools to operate a
   Zeek cluster.

Cluster Layout
--------------

All Zeek processes (also called nodes) that are part of a Zeek cluster are given unique
names. Conventionally, the name is the type of node (``worker``, ``proxy``, ``logger`` or
``manager``) suffixed with an incrementing number that starts at ``1``. The manager node is
the exception and has no number suffix. It's useful to always use a number suffix, even if
there's only a single instance of the process in a cluster. Don't zero-pad the numbers
(use ``worker-1``, not ``worker-01``), as padding complicates things.

When a single Zeek cluster spans multiple hosts or monitors multiple interfaces, the naming
scheme is conventionally extended with a second incrementing number to produce
``worker-1-1``, ``worker-1-2``, ``worker-2-1`` and ``worker-2-2``, where the first number is
the number of the host and the second is the number of the node. This is only a
convention, however; any naming scheme can be chosen.

These node names are used as keys in the :zeek:see:`Cluster::nodes` table.
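
To make the naming convention concrete, the following POSIX shell function prints the node
names for a cluster with one manager, one logger, one proxy and a given number of workers
on each host. It's a minimal sketch: ``node_names`` is a hypothetical helper for
illustration, not a Zeek tool, and real layouts may well use more loggers or proxies.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper (not part of Zeek): print conventional node names for
   # one manager, one logger, one proxy and WORKERS workers on each of HOSTS hosts.
   # Usage: node_names HOSTS WORKERS
   # The body uses ( ) so it runs in a subshell and its variables don't leak.
   node_names() (
       hosts=$1
       workers=$2

       # The manager has no number; other nodes start at 1, without zero-padding.
       echo manager
       echo logger-1
       echo proxy-1

       h=1
       while [ "$h" -le "$hosts" ]; do
           w=1
           while [ "$w" -le "$workers" ]; do
               if [ "$hosts" -eq 1 ]; then
                   # Single host: worker-1, worker-2, ...
                   echo "worker-$w"
               else
                   # Multiple hosts: worker-<host>-<node>
                   echo "worker-$h-$w"
               fi
               w=$((w + 1))
           done
           h=$((h + 1))
       done
   )

For example, ``node_names 2 2`` prints ``manager``, ``logger-1`` and ``proxy-1``, followed by
``worker-1-1``, ``worker-1-2``, ``worker-2-1`` and ``worker-2-2``, while ``node_names 1 2``
produces plain ``worker-1`` and ``worker-2``.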

The :zeek:see:`Cluster::nodes` table is conventionally populated via :zeek:keyword:`redef`
when Zeek loads the ``cluster-layout.zeek`` file, which has to be available somewhere in
the ``ZEEKPATH``. Note that the :ref:`Supervisor Framework ` implementation uses a custom
IPC mechanism to pass this information to the processes instead. We exclude it from the
following discussion.

Since Zeek 8.1, a small utility called ``zeek-cluster-layout-generator`` is available that
produces a basic ``cluster-layout.zeek`` file for a given number of processes. The
following listing shows its output for a Zeek cluster with 2 workers, 1 logger, 1 proxy
and a manager. As mentioned, the :zeek:keyword:`redef` of the :zeek:see:`Cluster::nodes`
variable is the crucial part here.

.. code-block::

   $ zeek-cluster-layout-generator -W 2
   # Auto-generated by zeek-cluster-layout-generator
   redef Cluster::manager_is_logger = F;

   redef Cluster::nodes += {
       ["manager"] = [$node_type=Cluster::MANAGER, $ip=127.0.0.1, $p=27760/tcp, $metrics_port=9991/tcp],
       ["logger-1"] = [$node_type=Cluster::LOGGER, $ip=127.0.0.1, $p=27761/tcp, $manager="manager", $metrics_port=9992/tcp],
       ["proxy-1"] = [$node_type=Cluster::PROXY, $ip=127.0.0.1, $p=27762/tcp, $manager="manager", $metrics_port=9993/tcp],
       ["worker-1"] = [$node_type=Cluster::WORKER, $ip=127.0.0.1, $manager="manager", $metrics_port=9994/tcp],
       ["worker-2"] = [$node_type=Cluster::WORKER, $ip=127.0.0.1, $manager="manager", $metrics_port=9995/tcp],
   };

   @load base/frameworks/telemetry/options
   redef Telemetry::metrics_address = "0.0.0.0";
   redef Telemetry::metrics_port = Cluster::local_node_metrics_port();

.. note::

   With the arrival of the ZeroMQ cluster backend, a number of fields in the cluster
   layout are no longer very important. In fact, there are ideas around
   `removing the static cluster-layout.zeek file `_ completely. For the time being,
   however, assume that you need to pre-render the full ``cluster-layout.zeek`` file.
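
A spawning script typically needs to know which nodes the layout defines and what type each
one has, for example to pass packet source arguments to workers only. Assuming the
one-entry-per-line format shown in the generator output above, a single ``sed`` expression
can extract both. This is a hypothetical helper (``layout_nodes`` is not part of Zeek) and a
sketch only: it doesn't understand Zeek script syntax and silently skips entries written in
any other style.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper (not part of Zeek): print "<name> <TYPE>" for every node
   # entry in a cluster-layout.zeek file. Assumes one entry per line, e.g.
   #   ["worker-1"] = [$node_type=Cluster::WORKER, $ip=127.0.0.1, ...],
   # Usage: layout_nodes FILE
   layout_nodes() {
       # Capture the quoted node name and the word after "Cluster::" in
       # $node_type; lines that don't match (redef, braces, ...) are not printed.
       sed -n 's/^[[:space:]]*\["\([^"]*\)"][[:space:]]*=[[:space:]]*\[\$node_type=Cluster::\([[:upper:]]*\).*/\1 \2/p' "$1"
   }

For the layout above, ``layout_nodes cluster-layout.zeek | awk '$2 == "WORKER" { print $1 }'``
prints ``worker-1`` and ``worker-2``.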

Spawning Processes
------------------

Once the ``cluster-layout.zeek`` file has been generated, spawn the individual cluster
processes as follows:

* Set and export the ``ZEEKPATH`` environment variable such that it contains the directory
  in which the ``cluster-layout.zeek`` file is located. Alternatively, copy the generated
  ``cluster-layout.zeek`` file into the working directory of each node, created in the
  next step (``.`` is in the default ``ZEEKPATH``).
* Create working directories for all processes to be spawned. Conventionally, these are
  named like the node itself, e.g., ``worker-1``, ``manager`` or ``logger-1``.
* Change into the working directory and set the ``CLUSTER_NODE`` environment variable to
  the name of the cluster process.
* Execute the ``zeek`` process, passing arguments as needed, and record its PID. By
  default, all processes should receive ``local`` to load the ``local.zeek`` file.
  Workers will generally also receive the ``-i `` argument, with the interface possibly
  prefixed by the packet source plugin to use. On Linux, using AF_PACKET and interface
  ``eth0``, this would then end up as ``-i af_packet::eth0``.
* To pin processes to CPUs, one common approach is to execute ``zeek`` through the
  `taskset `_ utility.

Minimal Shell-Based Supervisor
------------------------------

The following shell script implements the steps outlined above.

.. warning::

   Do not use this script in production! It's solely for documentation and demonstration
   purposes and contains only the bare minimum to get a Zeek cluster off the ground!

.. literalinclude:: ./supervisor.sh
   :caption: supervisor.sh
   :language: shell

Running this script and printing the process tree shows the usual Zeek cluster process
tree you may be familiar with from other setups.

.. code-block::

   $ ZEEK_WORKERS=2 ZEEK_INTERFACE=af_packet::lo ./supervisor.sh
   $ pstree -acT 44168
   bash
     └─supervisor.sh ./supervisor.sh
         ├─zeek local policy/frameworks/cluster/backend/zeromq
         ├─zeek local policy/frameworks/cluster/backend/zeromq
         ├─zeek local policy/frameworks/cluster/backend/zeromq
         ├─zeek local policy/frameworks/cluster/backend/zeromq
         ├─zeek -C -i af_packet::lo local policy/frameworks/cluster/backend/zeromq
         └─zeek -C -i af_packet::lo local policy/frameworks/cluster/backend/zeromq

Hopefully this removes some of the magic around what a Zeek cluster is and how it is
spawned. If you're now tempted to write systemd service units, take a look at the
`zeek-systemd-generator `_ first!
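
For reference, the per-node steps from the Spawning Processes section can be condensed into
a single POSIX shell function. This is a minimal sketch under the same caveats as
``supervisor.sh``: ``spawn_node``, ``ZEEK_BIN`` and the ``zeek.pid``, ``stdout.log`` and
``stderr.log`` file names are hypothetical choices for illustration, and there is no
monitoring or restart logic.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper (not part of Zeek): spawn one cluster node following the
   # steps above. Not for production use. ZEEK_BIN, zeek.pid, stdout.log and
   # stderr.log are illustrative names chosen for this sketch.
   ZEEK_BIN=${ZEEK_BIN:-zeek}

   # Usage: spawn_node NAME [ZEEK_ARGS...]
   spawn_node() {
       name=$1
       shift

       # One working directory per node, named like the node itself.
       mkdir -p "$name" || return 1

       # Start Zeek in the background from inside the node's directory with
       # CLUSTER_NODE set. Redirecting the subshell's output keeps the node from
       # writing to (or holding open) the caller's terminal or pipes.
       (
           cd "$name" || exit 1
           CLUSTER_NODE=$name
           export CLUSTER_NODE
           exec "$ZEEK_BIN" "$@"
       ) >"$name/stdout.log" 2>"$name/stderr.log" &

       # exec replaces the subshell in place, so $! is the Zeek process's own PID.
       echo "$!" >"$name/zeek.pid"
   }

With ``ZEEKPATH`` set up as described above, ``spawn_node worker-1 -i af_packet::eth0 local``
creates ``worker-1/`` and starts Zeek in it, and ``kill "$(cat worker-1/zeek.pid)"`` stops it
again. CPU pinning via ``taskset`` would need an extra wrapper, since ``ZEEK_BIN`` holds a
single command.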