base/frameworks/sumstats/main.zeek
- SumStats
The summary statistics framework provides a way to summarize large streams of data into simple reduced measurements.
- Namespace
SumStats
Summary
Types
SumStats::Calculation | Type to represent the calculations that are available. |
SumStats::Key | Represents a thing which is having summarization results collected for it. |
SumStats::Observation | Represents data being added for a single observation. |
SumStats::Reducer | Represents a reducer. |
SumStats::Result | Type to store a table of results for multiple reducers indexed by observation stream identifier. |
SumStats::ResultTable | Type to store a table of sumstats results indexed by keys. |
SumStats::ResultVal | Result calculated for an observation stream fed into a reducer. |
SumStats::SumStat | Represents a SumStat, which consists of an aggregation of reducers along with mechanisms to handle various situations like the epoch ending or thresholds being crossed. |
Redefinitions
|
Functions
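SumStats::create | Create a summary statistic. |
SumStats::key2str | Helper function to represent a SumStats::Key value as a simple string. |
SumStats::next_epoch | Manually end the current epoch for a sumstat. |
SumStats::observe | Add data into an observation stream. |
SumStats::request_key | Dynamically request a sumstat key. |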
Detailed Interface
Types
- SumStats::Calculation
- Type
- SumStats::PLACEHOLDER
- SumStats::AVERAGE
(present if base/frameworks/sumstats/plugins/average.zeek is loaded)
Calculate the average of the values.
- SumStats::HLL_UNIQUE
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
Calculate the number of unique values.
- SumStats::LAST
(present if base/frameworks/sumstats/plugins/last.zeek is loaded)
Keep last X observations in a queue.
- SumStats::MAX
(present if base/frameworks/sumstats/plugins/max.zeek is loaded)
Find the maximum value.
- SumStats::MIN
(present if base/frameworks/sumstats/plugins/min.zeek is loaded)
Find the minimum value.
- SumStats::SAMPLE
(present if base/frameworks/sumstats/plugins/sample.zeek is loaded)
Get uniquely distributed random samples from the observation stream.
- SumStats::VARIANCE
(present if base/frameworks/sumstats/plugins/variance.zeek is loaded)
Calculate the variance of the values.
- SumStats::STD_DEV
(present if base/frameworks/sumstats/plugins/std-dev.zeek is loaded)
Calculate the standard deviation of the values.
- SumStats::SUM
(present if base/frameworks/sumstats/plugins/sum.zeek is loaded)
Calculate the sum of the values. For string values, this will be the number of strings.
- SumStats::TOPK
(present if base/frameworks/sumstats/plugins/topk.zeek is loaded)
Keep a top-k list of values.
- SumStats::UNIQUE
(present if base/frameworks/sumstats/plugins/unique.zeek is loaded)
Calculate the number of unique values.
Type to represent the calculations that are available. The calculations are all defined as plugins.
- SumStats::Key
- Type
- str: string &optional
A non-address related summarization or a sub-key for an address based summarization. An example might be successful SSH connections by client IP address where the client string would be the key value. Another example might be number of HTTP requests to a particular value in a Host header. This is an example of a non-host based metric since multiple IP addresses could respond for the same Host header value.
- host: addr &optional
Host is the value to which this metric applies.
Represents a thing which is having summarization results collected for it.
- SumStats::Observation
- Type
Represents data being added for a single observation. Only supply a single field at a time!
- SumStats::Reducer
- Type
- stream: string
Observation stream identifier for the reducer to attach to.
- apply: set[SumStats::Calculation]
The calculations to perform on the data points.
- pred: function(key: SumStats::Key, obs: SumStats::Observation) : bool &optional
A predicate so that you can decide per key if you would like to accept the data being inserted.
- normalize_key: function(key: SumStats::Key) : SumStats::Key &optional
A function to normalize the key. This can be used to aggregate or normalize the entire key.
- calc_funcs: vector of SumStats::Calculation &optional
- hll_error_margin: double &default=0.01 &optional
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
The error margin for HLL.
- hll_confidence: double &default=0.95 &optional
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
The confidence for HLL.
- num_last_elements: count &default=0 &optional
(present if base/frameworks/sumstats/plugins/last.zeek is loaded)
Number of elements to keep.
- num_samples: count &default=0 &optional
(present if base/frameworks/sumstats/plugins/sample.zeek is loaded)
The number of sample Observations to collect.
- topk_size: count &default=500 &optional
(present if base/frameworks/sumstats/plugins/topk.zeek is loaded)
Number of elements to keep in the top-k list.
- unique_max: count &optional
(present if base/frameworks/sumstats/plugins/unique.zeek is loaded)
Maximum number of unique values to store.
Represents a reducer.
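The pred and normalize_key fields are easiest to follow with a small example. The following is a minimal sketch, not a definitive implementation: the stream name "http.host.requests", the sumstat name, and the helper functions has_key_string and lowercase_key are hypothetical names chosen for illustration, and the `$num` field of SumStats::Observation comes from the framework's record definition rather than from this section.

```zeek
@load base/frameworks/sumstats
@load base/protocols/http

# Hypothetical predicate: accept an observation only if the key
# carries a non-empty string.
function has_key_string(key: SumStats::Key, obs: SumStats::Observation): bool
    {
    return key?$str && |key$str| > 0;
    }

# Hypothetical normalizer: fold the key string to lower case so that
# "Example.COM" and "example.com" are summarized under one key.
function lowercase_key(key: SumStats::Key): SumStats::Key
    {
    local k = copy(key);
    if ( k?$str )
        k$str = to_lower(k$str);
    return k;
    }

event zeek_init()
    {
    local r = SumStats::Reducer($stream="http.host.requests",
                                $apply=set(SumStats::SUM),
                                $pred=has_key_string,
                                $normalize_key=lowercase_key);

    SumStats::create([$name="http-requests-by-host",
                      $epoch=5min,
                      $reducers=set(r),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          print fmt("%s: %.0f requests", key$str,
                                    result["http.host.requests"]$sum);
                          }]);
    }

event HTTP::log_http(rec: HTTP::Info)
    {
    # Feed one observation per logged HTTP request that has a Host header.
    if ( rec?$host )
        SumStats::observe("http.host.requests",
                          SumStats::Key($str=rec$host),
                          SumStats::Observation($num=1));
    }
```

Using named functions for the two callbacks keeps the reducer definition short and lets the same predicate be reused by other reducers.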
- SumStats::Result
- Type
Type to store a table of results for multiple reducers indexed by observation stream identifier.
- SumStats::ResultTable
- Type
Type to store a table of sumstats results indexed by keys.
- SumStats::ResultVal
- Type
- begin: time
The time when the first observation was added to this result value.
- end: time
The time when the last observation was added to this result value.
- num: count &default=0 &optional
The number of observations received.
- average: double &optional
(present if base/frameworks/sumstats/plugins/average.zeek is loaded)
For numeric data, this is the average of all values.
- hll_unique: count &default=0 &optional
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
If cardinality is being tracked, the number of unique items is tracked here.
- card: opaque of cardinality &optional
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
- hll_error_margin: double &optional
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
- hll_confidence: double &optional
(present if base/frameworks/sumstats/plugins/hll_unique.zeek is loaded)
- last_elements: Queue::Queue &optional
(present if base/frameworks/sumstats/plugins/last.zeek is loaded)
This is the queue where elements are maintained. Don’t access this value directly, instead use the SumStats::get_last function to get a vector of the current element values.
- max: double &optional
(present if base/frameworks/sumstats/plugins/max.zeek is loaded)
For numeric data, this tracks the maximum value.
- min: double &optional
(present if base/frameworks/sumstats/plugins/min.zeek is loaded)
For numeric data, this tracks the minimum value.
- samples: vector of SumStats::Observation &default=[] &optional
(present if base/frameworks/sumstats/plugins/sample.zeek is loaded)
This is the vector in which the samples are maintained.
- sample_elements: count &default=0 &optional
(present if base/frameworks/sumstats/plugins/sample.zeek is loaded)
Number of total observed elements.
- num_samples: count &default=0 &optional
(present if base/frameworks/sumstats/plugins/sample.zeek is loaded)
- variance: double &optional
(present if base/frameworks/sumstats/plugins/variance.zeek is loaded)
For numeric data, this is the variance.
- prev_avg: double &optional
(present if base/frameworks/sumstats/plugins/variance.zeek is loaded)
- var_s: double &default=0.0 &optional
(present if base/frameworks/sumstats/plugins/variance.zeek is loaded)
- std_dev: double &default=0.0 &optional
(present if base/frameworks/sumstats/plugins/std-dev.zeek is loaded)
For numeric data, this calculates the standard deviation.
- sum: double &default=0.0 &optional
(present if base/frameworks/sumstats/plugins/sum.zeek is loaded)
For numeric data, this tracks the sum of all values.
- topk: opaque of topk &optional
(present if base/frameworks/sumstats/plugins/topk.zeek is loaded)
A handle which can be passed to some built-in functions to get the top-k results.
- unique: count &default=0 &optional
(present if base/frameworks/sumstats/plugins/unique.zeek is loaded)
If cardinality is being tracked, the number of unique values is tracked here.
- unique_max: count &optional
(present if base/frameworks/sumstats/plugins/unique.zeek is loaded)
- unique_vals: set[SumStats::Observation] &optional
(present if base/frameworks/sumstats/plugins/unique.zeek is loaded)
Result calculated for an observation stream fed into a reducer. Most of the fields are added by plugins.
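Because most SumStats::ResultVal fields are optional and only filled in by the calculations a reducer applies, reading them usually involves indexing the result by stream identifier and checking for presence. A minimal sketch follows, assuming a hypothetical stream named "conn.orig_bytes" and a hypothetical sumstat name; the `$num` field of SumStats::Observation comes from the framework's record definition rather than from this section.

```zeek
@load base/frameworks/sumstats

event connection_state_remove(c: connection)
    {
    # One observation per finished connection: bytes sent by the originator.
    SumStats::observe("conn.orig_bytes",
                      SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($num=c$orig$size));
    }

event zeek_init()
    {
    local r = SumStats::Reducer($stream="conn.orig_bytes",
                                $apply=set(SumStats::SUM, SumStats::MAX, SumStats::AVERAGE));

    SumStats::create([$name="orig-bytes-per-host",
                      $epoch=10min,
                      $reducers=set(r),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          # The Result table is indexed by observation stream identifier.
                          local rv = result["conn.orig_bytes"];

                          print fmt("%s: %d observations between %s and %s",
                                    key$host, rv$num, rv$begin, rv$end);
                          print fmt("  sum=%.0f", rv$sum);

                          # Plugin fields without a default are guarded with ?$.
                          if ( rv?$max )
                              print fmt("  max=%.0f", rv$max);
                          if ( rv?$average )
                              print fmt("  average=%.2f", rv$average);
                          }]);
    }
```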
- SumStats::SumStat
- Type
- name: string
An arbitrary name for the sumstat so that it can be referred to later.
- epoch: interval
The interval at which this sumstat should be “broken” and the epoch_result callback called. The results are also reset at this time, so any threshold-based detection needs to be set to a value that should be expected to happen within this epoch.
Passing an epoch of zero (e.g. 0 secs) causes this sumstat to be set to manual epochs. You will have to manually end the epoch by calling SumStats::next_epoch.
- reducers: set[SumStats::Reducer]
The reducers for the SumStat.
- threshold_val: function(key: SumStats::Key, result: SumStats::Result) : double &optional
A function that will be called once for each observation in order to calculate a value from the SumStats::Result structure which will be used for thresholding. This function is required if a threshold value or a threshold_series is given.
- threshold: double &optional
The threshold value for calling the threshold_crossed callback. If you need more than one threshold value, then use threshold_series instead.
- threshold_series: vector of double &optional
A series of thresholds for calling the threshold_crossed callback. These thresholds must be listed in ascending order, because a threshold is not checked until the preceding one has been crossed.
- threshold_crossed: function(key: SumStats::Key, result: SumStats::Result) : void &optional
A callback that is called when a threshold is crossed. A threshold is crossed when the value returned from threshold_val is greater than or equal to the threshold value, but only the first time this happens within an epoch.
- epoch_result: function(ts: time, key: SumStats::Key, result: SumStats::Result) : void &optional
A callback that receives each of the results at the end of the analysis epoch. The function will be called once for each key.
- epoch_finished: function(ts: time) : void &optional
A callback that will be called when a single collection interval is completed. The ts value will be the time when the collection started.
Represents a SumStat, which consists of an aggregation of reducers along with mechanisms to handle various situations like the epoch ending or thresholds being crossed.
It’s best to not access any global state outside of the variables given to the callbacks because there is no assurance provided as to where the callbacks will be executed on clusters.
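The interaction between threshold_val, threshold_series, and threshold_crossed can be sketched as follows. This is an illustrative example under stated assumptions: the stream name "conn.attempted", the sumstat name, and the threshold values are hypothetical, and the `$num` field of SumStats::Observation comes from the framework's record definition rather than from this section.

```zeek
@load base/frameworks/sumstats

# connection_attempt is raised for unsuccessful connection attempts.
event connection_attempt(c: connection)
    {
    SumStats::observe("conn.attempted",
                      SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    local r = SumStats::Reducer($stream="conn.attempted",
                                $apply=set(SumStats::SUM));

    SumStats::create([$name="failed-attempts-by-host",
                      $epoch=5min,
                      $reducers=set(r),
                      # Thresholds are listed in ascending order.
                      $threshold_series=vector(5.0, 25.0, 100.0),
                      # Reduce the result to the single value that is
                      # compared against the thresholds.
                      $threshold_val(key: SumStats::Key, result: SumStats::Result) =
                          {
                          return result["conn.attempted"]$sum;
                          },
                      # Called when a threshold in the series is crossed.
                      $threshold_crossed(key: SumStats::Key, result: SumStats::Result) =
                          {
                          print fmt("%s reached %.0f failed attempts",
                                    key$host, result["conn.attempted"]$sum);
                          }]);
    }
```

Both callbacks use only the key and result passed to them, in line with the advice above about not relying on global state.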
Functions
- SumStats::create
- Type
function(ss: SumStats::SumStat) : void
Create a summary statistic.
- Parameters
ss – The SumStat to create.
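A minimal sketch of a complete sumstat, assuming a hypothetical stream name "conn.established" and sumstat name "established-connections". It counts established connections globally (empty key) and reports once per epoch; the `$num` field of SumStats::Observation comes from the framework's record definition rather than from this section.

```zeek
@load base/frameworks/sumstats

event connection_established(c: connection)
    {
    # The measurement is global, so the key is left empty.
    SumStats::observe("conn.established",
                      SumStats::Key(),
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    local r = SumStats::Reducer($stream="conn.established",
                                $apply=set(SumStats::SUM));

    SumStats::create([$name="established-connections",
                      $epoch=1min,
                      $reducers=set(r),
                      # Called once per key at the end of each epoch.
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          print fmt("connections established: %.0f",
                                    result["conn.established"]$sum);
                          },
                      # Called once when the whole epoch has been processed.
                      $epoch_finished(ts: time) =
                          {
                          print fmt("finished the epoch that started at %s", ts);
                          }]);
    }
```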
- SumStats::key2str
- Type
function(key: SumStats::Key) : string
Helper function to represent a SumStats::Key value as a simple string.
- Parameters
key – The metric key that is to be converted into a string.
- Returns
A string representation of the metric key.
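As a quick illustration, the following sketch builds two keys and prints their string form. The address and string values are placeholders, and the exact output format is left to the framework.

```zeek
@load base/frameworks/sumstats

event zeek_init()
    {
    # A key that identifies a host only.
    local by_host = SumStats::Key($host=192.0.2.10);

    # A key that combines a host with a sub-key string.
    local by_host_and_str = SumStats::Key($host=192.0.2.10, $str="example.com");

    print SumStats::key2str(by_host);
    print SumStats::key2str(by_host_and_str);
    }
```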
- SumStats::next_epoch
- Type
function(ss_name: string) : bool
Manually end the current epoch for a sumstat. Calling this function starts the end-of-epoch processing for the sumstat. Note that the epoch will not end immediately: especially in a cluster setting, a number of messages need to be exchanged between the cluster nodes.
Note that this function can only be called if the sumstat was created with an epoch time of zero (manual epochs).
In a cluster, this function must be called on the manager; it will not have any effect when called on workers.
- Parameters
ss_name – SumStat name.
- Returns
true on success, false on failure. Failures can be: sumstat not found, or sumstat not created for manual epochs.
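A minimal sketch of manual epochs, assuming a hypothetical sumstat named "manual-conn-count" and a hypothetical helper event flush_conn_stats that ends the epoch every 30 seconds. In a cluster the call would have to run on the manager, as noted above; this sketch does not check for that.

```zeek
@load base/frameworks/sumstats

# Hypothetical event used to end epochs by hand.
global flush_conn_stats: event();

event flush_conn_stats()
    {
    # next_epoch returns F if the sumstat is unknown or was not
    # created with an epoch of zero.
    if ( ! SumStats::next_epoch("manual-conn-count") )
        print "could not end the epoch for manual-conn-count";

    schedule 30secs { flush_conn_stats() };
    }

event connection_established(c: connection)
    {
    SumStats::observe("conn.established",
                      SumStats::Key(),
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    local r = SumStats::Reducer($stream="conn.established",
                                $apply=set(SumStats::SUM));

    SumStats::create([$name="manual-conn-count",
                      # An epoch of zero selects manual epochs.
                      $epoch=0secs,
                      $reducers=set(r),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          print fmt("connections since last flush: %.0f",
                                    result["conn.established"]$sum);
                          }]);

    schedule 30secs { flush_conn_stats() };
    }
```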
- SumStats::observe
- Type
function(id: string, orig_key: SumStats::Key, obs: SumStats::Observation) : void
Add data into an observation stream. This should be called when a script has measured some point value.
- Parameters
id – The observation stream identifier that the data point represents.
orig_key – The key that the value is related to.
obs – The data point to send into the stream.
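The split between key and observation can be sketched with string observations: the key says which host the measurement is about, and the observation carries the measured value. The stream name "dns.query" and the sumstat name are hypothetical, and the `$str` field of SumStats::Observation comes from the framework's record definition rather than from this section.

```zeek
@load base/frameworks/sumstats

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
    {
    # Key: the querying host. Observation: the name that was looked up.
    SumStats::observe("dns.query",
                      SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($str=query));
    }

event zeek_init()
    {
    # UNIQUE counts the distinct observation values seen per key.
    local r = SumStats::Reducer($stream="dns.query",
                                $apply=set(SumStats::UNIQUE));

    SumStats::create([$name="distinct-dns-queries-by-host",
                      $epoch=5min,
                      $reducers=set(r),
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          print fmt("%s looked up %d distinct names",
                                    key$host, result["dns.query"]$unique);
                          }]);
    }
```

If no reducer is attached to the stream identifier, the observation is simply not recorded, so scripts can call SumStats::observe without knowing whether anyone is summarizing the stream.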
- SumStats::request_key
- Type
function(ss_name: string, key: SumStats::Key) : SumStats::Result
Dynamically request a sumstat key. This function should be used sparingly and not as a replacement for the callbacks from the SumStats::SumStat record. The function is only available for use within “when” statements as an asynchronous function.
- Parameters
ss_name – SumStat name.
key – The SumStat key being requested.
- Returns
The result for the requested sumstat key.
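A minimal sketch of an on-demand lookup inside a "when" statement. The sumstat name "attempts-by-host", the stream name "conn.attempted", the constant watched_host, and the event check_watched_host are all hypothetical. The body refers only to a global constant; newer Zeek versions may require a capture list on the "when" statement if local variables are referenced inside it.

```zeek
@load base/frameworks/sumstats

# Hypothetical host whose current result is requested on demand.
const watched_host = 192.0.2.10 &redef;

# Hypothetical event that triggers the lookup once a minute.
global check_watched_host: event();

event check_watched_host()
    {
    when ( local result = SumStats::request_key("attempts-by-host",
                                                SumStats::Key($host=watched_host)) )
        {
        # The returned table has no entry if nothing has been observed
        # for this key in the current epoch.
        if ( "conn.attempted" in result )
            print fmt("%s has %.0f failed attempts so far in this epoch",
                      watched_host, result["conn.attempted"]$sum);
        }
    timeout 5secs
        {
        print "request_key timed out";
        }

    schedule 1min { check_watched_host() };
    }

event connection_attempt(c: connection)
    {
    SumStats::observe("conn.attempted",
                      SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($num=1));
    }

event zeek_init()
    {
    local r = SumStats::Reducer($stream="conn.attempted",
                                $apply=set(SumStats::SUM));

    SumStats::create([$name="attempts-by-host",
                      $epoch=10min,
                      $reducers=set(r)]);

    schedule 1min { check_watched_host() };
    }
```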