base/bif/strings.bif.zeek

GLOBAL

Definitions of built-in functions related to string processing and manipulation.

Namespace:GLOBAL

Summary

Functions

clean: function Replaces non-printable characters in a string with escaped sequences.
count_substr: function Returns the number of times a substring occurs within a string.
edit: function Returns an edited version of a string that applies a special “backspace character” (usually \x08 for backspace or \x7f for DEL).
ends_with: function Returns whether a string ends with a substring.
escape_string: function Replaces non-printable characters in a string with escaped sequences.
find_all: function Finds all occurrences of a pattern in a string.
find_all_ordered: function Finds all occurrences of a pattern in a string.
find_last: function Finds the last occurrence of a pattern in a string.
find_str: function Finds a string within another string, starting from the beginning.
gsub: function Substitutes a given replacement string for all occurrences of a pattern in a given string.
hexdump: function Returns a hex dump for given input data.
is_alnum: function Returns whether a string consists entirely of alphanumeric characters.
is_alpha: function Returns whether a string consists entirely of alphabetic characters.
is_ascii: function Determines whether a given string contains only ASCII characters.
is_num: function Returns whether an entire string consists only of digits.
join_string_vec: function Joins all values in the given vector of strings with a separator placed between each element.
levenshtein_distance: function Calculates the Levenshtein distance between the two strings.
ljust: function Returns a left-justified version of the string, padded to a specific length with a specified character.
lstrip: function Removes all combinations of characters in the chars argument starting at the beginning of the string until the first mismatch.
remove_prefix: function Similar to lstrip(), except that it removes a whole substring, repeating the removal as long as the substring recurs at the start of the string.
remove_suffix: function Similar to rstrip(), except that it removes a whole substring, repeating the removal as long as the substring recurs at the end of the string.
reverse: function Returns a reversed copy of the string.
rfind_str: function The same as find_str, but returns the highest index matching the substring instead of the smallest.
rjust: function Returns a right-justified version of the string, padded to a specific length with a specified character.
rstrip: function Removes all combinations of characters in the chars argument starting at the end of the string until the first mismatch.
safe_shell_quote: function Takes a string and escapes characters that would allow execution of commands at the shell level.
split_string: function Splits a string into an array of strings according to a pattern.
split_string1: function Splits a string once into a two-element array of strings according to a pattern.
split_string_all: function Splits a string into an array of strings according to a pattern.
split_string_n: function Splits a string a given number of times into an array of strings according to a pattern.
starts_with: function Returns whether a string starts with a substring.
str_smith_waterman: function Uses the Smith-Waterman algorithm to find similar/overlapping substrings.
str_split: function &deprecated = “Remove in v4.1. Use str_split_indices.” Splits a string into substrings with the help of an index vector of cutting points.
str_split_indices: function Splits a string into substrings with the help of an index vector of cutting points.
strcmp: function Lexicographically compares two strings.
string_cat: function Concatenates all arguments into a single string.
string_fill: function Generates a string of a given size and fills it with repetitions of a source string.
string_to_ascii_hex: function Returns an ASCII hexadecimal representation of a string.
strip: function Strips whitespace at both ends of a string.
strstr: function Locates the first occurrence of one string in another.
sub: function Substitutes a given replacement string for the first occurrence of a pattern in a given string.
sub_bytes: function Returns a substring of a string, given a starting position and length.
subst_string: function Substitutes each (non-overlapping) appearance of a string in another.
swap_case: function Swaps the case of every alphabetic character in a string.
to_lower: function Replaces all uppercase letters in a string with their lowercase counterpart.
to_string_literal: function Replaces non-printable characters in a string with escaped sequences.
to_title: function Converts a string to Title Case.
to_upper: function Replaces all lowercase letters in a string with their uppercase counterpart.
zfill: function Returns a copy of a string filled on the left side with zeroes.

Detailed Interface

Functions

clean
Type:function (str: string) : string

Replaces non-printable characters in a string with escaped sequences. The mappings are:

  • values not in [32, 126] to \xXX

If the string does not yet have a trailing NUL, one is added internally.

In contrast to escape_string, this encoding is not fully reversible, because backslashes are not escaped.

Str:The string to escape.
Returns:The escaped string.

See also: to_string_literal, escape_string

count_substr
Type:function (str: string, sub: string) : count

Returns the number of times a substring occurs within a string.

Str:The string to search in.
Sub:The string to search for.
Returns:The number of times the substring occurred.
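A minimal usage sketch for count_substr; the inputs are made up and the expected output is shown in comments. It runs as a standalone script, for example with `zeek -b example.zeek`.

```zeek
event zeek_init()
	{
	# Two commas separate the three fields.
	print count_substr("a,b,c", ",");  # 2
	# No occurrence at all.
	print count_substr("a,b,c", ";");  # 0
	}
```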
edit
Type:function (arg_s: string, arg_edit_char: string) : string

Returns an edited version of a string that applies a special “backspace character” (usually \x08 for backspace or \x7f for DEL). For example, edit("hello there", "e") returns "llo t".

Arg_s:The string to edit.
Arg_edit_char:A string of exactly one character that represents the “backspace character”. If it is longer than one character, Zeek generates a run-time error and uses the first character in the string.
Returns:An edited version of arg_s in which each occurrence of arg_edit_char deletes the most recent character not already deleted; the edit character itself is not kept.

See also: clean, to_string_literal, escape_string, strip
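This sketch reproduces the example given for edit and adds a case with the conventional \x08 backspace byte; the second input is illustrative.

```zeek
event zeek_init()
	{
	# "e" acts as the backspace character.
	print edit("hello there", "e");           # llo t
	# \x08 deletes the preceding "c".
	print edit("abc\x08d", "\x08") == "abd";  # T
	}
```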

ends_with
Type:function (str: string, sub: string) : bool

Returns whether a string ends with a substring.

escape_string
Type:function (s: string) : string

Replaces non-printable characters in a string with escaped sequences. The mappings are:

  • values not in [32, 126] to \xXX
  • \ to \\

In contrast to clean, this encoding is fully reversible.

S:The string to escape.
Returns:The escaped string.

See also: clean, to_string_literal
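A short sketch of why clean is lossy while escape_string is reversible, using a made-up Windows-style path that contains a backslash. Lengths and a comparison are printed instead of the strings themselves so the output is unambiguous.

```zeek
event zeek_init()
	{
	local path = "C:\\temp";  # 7 bytes: C : \ t e m p
	# clean() leaves the backslash as-is.
	print |clean(path)|;                        # 7
	# escape_string() doubles it, so the result can be decoded again.
	print |escape_string(path)|;                # 8
	print escape_string(path) == "C:\\\\temp";  # T
	}
```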

find_all
Type:function (str: string, re: pattern) : string_set

Finds all occurrences of a pattern in a string.

Str:The string to inspect.
Re:The pattern to look for in str.
Returns:The set of strings in str that match re, or the empty set.
find_all_ordered
Type:function (str: string, re: pattern) : string_vec

Finds all occurrences of a pattern in a string. The order in which occurrences are found is preserved and the return value may contain duplicate elements.

Str:The string to inspect.
Re:The pattern to look for in str.
Returns:All strings in str that match re, or an empty vector.
find_last
Type:function (str: string, re: pattern) : string

Finds the last occurrence of a pattern in a string. This function returns the match that starts at the largest index in the string, which is not necessarily the longest match. For example, a pattern of /.*/ will return the final character in the string.

Str:The string to inspect.
Re:The pattern to look for in str.
Returns:The last string in str that matches re, or the empty string.
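A sketch contrasting the three pattern-search functions on one illustrative input that contains a repeated digit:

```zeek
event zeek_init()
	{
	local s = "x1y1z2";
	# find_all() collapses duplicates into a set: {"1", "2"}.
	local hits = find_all(s, /[0-9]/);
	print |hits|, "1" in hits;                                 # 2, T
	# find_all_ordered() keeps the order and the duplicates.
	print join_string_vec(find_all_ordered(s, /[0-9]/), ",");  # 1,1,2
	# find_last() returns the match that starts furthest to the right.
	print find_last(s, /[0-9]/);                               # 2
	}
```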
find_str
Type:function (str: string, sub: string, start: count &default = 0 &optional, end: int &default = -1 &optional) : int

Finds a string within another string, starting from the beginning. This works by taking the substring between the provided indices and searching it for the sub argument. As a result, a range shorter than the sub argument always returns a failure.

Str:The string to search in.
Sub:The string to search for.
Start:An optional position for the start of the substring.
End:An optional position for the end of the substring. A value less than zero (such as the default -1) means a search until the end of the string.
Returns:The position of the substring. Returns -1 if the string wasn’t found. Prints an error if the starting position is after the ending position.
gsub
Type:function (str: string, re: pattern, repl: string) : string

Substitutes a given replacement string for all occurrences of a pattern in a given string.

Str:The string to perform the substitution in.
Re:The pattern being replaced with repl.
Repl:The string that replaces re.
Returns:A copy of str with all occurrences of re replaced with repl.

See also: sub, subst_string

hexdump
Type:function (data_str: string) : string

Returns a hex dump for given input data. The hex dump renders 16 bytes per line, with hex on the left and ASCII (where printable) on the right.

Data_str:The string to dump in hex format.
Returns:The hex dump of the given string.

See also: string_to_ascii_hex, bytestring_to_hexstr

Note

Based on Netdude’s hex editor code.

is_alnum
Type:function (str: string) : bool

Returns whether a string consists entirely of alphanumeric characters.

is_alpha
Type:function (str: string) : bool

Returns whether a string consists entirely of alphabetic characters.

is_ascii
Type:function (str: string) : bool

Determines whether a given string contains only ASCII characters.

Str:The string to examine.
Returns:False if any byte value of str is greater than 127, and true otherwise.

See also: to_upper, to_lower

is_num
Type:function (str: string) : bool

Returns whether an entire string consists only of digits.
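A few illustrative checks of the character-class predicates above; the inputs are arbitrary.

```zeek
event zeek_init()
	{
	print is_num("12345"), is_num("12a");     # T, F
	print is_alpha("abc"), is_alpha("abc1");  # T, F
	print is_alnum("abc123");                 # T
	# 0xe9 is greater than 127, so the string is not pure ASCII.
	print is_ascii("caf\xe9");                # F
	}
```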

join_string_vec
Type:function (vec: string_vec, sep: string) : string

Joins all values in the given vector of strings with a separator placed between each element.

Sep:The separator to place between each element.
Vec:The string_vec (vector of string).
Returns:The concatenation of all elements in vec, with sep placed between each element.

See also: cat, cat_sep, string_cat, fmt
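A minimal sketch that rebuilds a dotted string from its parts; the values are illustrative.

```zeek
event zeek_init()
	{
	local parts: string_vec = vector("10", "0", "0", "1");
	# The separator goes between elements only, not at either end.
	print join_string_vec(parts, ".");  # 10.0.0.1
	}
```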

levenshtein_distance
Type:function (s1: string, s2: string) : count

Calculates the Levenshtein distance between the two strings. See Wikipedia for more information.

S1:The first string.
S2:The second string.
Returns:The Levenshtein distance of two strings as a count.
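A sketch using the textbook kitten/sitting pair:

```zeek
event zeek_init()
	{
	# kitten -> sitten -> sittin -> sitting: two substitutions, one insertion.
	print levenshtein_distance("kitten", "sitting");  # 3
	print levenshtein_distance("zeek", "zeek");       # 0
	}
```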
ljust
Type:function (str: string, width: count, fill: string &default = " " &optional) : string

Returns a left-justified version of the string, padded to a specific length with a specified character.

Str:The string to left-justify.
Width:The length of the returned string. If this value is less than or equal to the length of str, a copy of str is returned.
Fill:The character used to fill in any extra characters in the resulting string. If a string longer than one character is passed, an error is reported. This defaults to the space character.
Returns:A left-justified version of a string, padded with characters to a specific length.
lstrip
Type:function (str: string, chars: string &default = " \x09\x0a\x0d\x0b\x0c" &optional) : string

Removes all combinations of characters in the chars argument starting at the beginning of the string until the first mismatch.

Str:The string to strip characters from.
Chars:A string consisting of the characters to be removed. Defaults to all whitespace characters.
Returns:A copy of str with the characters in chars removed from the beginning.

See also: sub, gsub, strip, rstrip

remove_prefix
Type:function (str: string, sub: string) : string

Similar to lstrip(), except that sub is treated as a whole substring rather than as a set of characters, and it is removed repeatedly for as long as str still starts with it.

remove_suffix
Type:function (str: string, sub: string) : string

Similar to rstrip(), except that sub is treated as a whole substring rather than as a set of characters, and it is removed repeatedly for as long as str still ends with it.
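This sketch, with made-up inputs, contrasts remove_prefix with lstrip, which treats its argument as a set of characters rather than as a substring. The first and last lines rely on the repeated removal described above.

```zeek
event zeek_init()
	{
	# "ab" is removed twice because it repeats at the start.
	print remove_prefix("ababc", "ab");          # c
	# lstrip() removes any run of the characters a and b ...
	print lstrip("abbac", "ab");                 # c
	# ... while remove_prefix() stops once "ab" no longer matches.
	print remove_prefix("abbac", "ab");          # bac
	print remove_suffix("log.bak.bak", ".bak");  # log
	}
```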

reverse
Type:function (str: string) : string

Returns a reversed copy of the string.

Str:The string to reverse.
Returns:A reversed copy of str.
rfind_str
Type:function (str: string, sub: string, start: count &default = 0 &optional, end: int &default = -1 &optional) : int

The same as find_str, but returns the highest index matching the substring instead of the smallest.

Str:The string to search in.
Sub:The string to search for.
Start:An optional position for the start of the substring.
End:An optional position for the end of the substring. A value less than zero (such as the default -1) means a search from the end of the string.
Returns:The position of the substring. Returns -1 if the string wasn’t found. Prints an error if the starting position is after the ending position.
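A sketch of find_str and rfind_str on an illustrative string. The positions in the comments assume zero-based indexing, which the use of -1 for "not found" suggests.

```zeek
event zeek_init()
	{
	local s = "hello world";
	print find_str(s, "o");   # 4  (first "o")
	print rfind_str(s, "o");  # 7  (last "o")
	print find_str(s, "z");   # -1 (no match)
	}
```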
rjust
Type:function (str: string, width: count, fill: string &default = " " &optional) : string

Returns a right-justified version of the string, padded to a specific length with a specified character.

Str:The string to right-justify.
Width:The length of the returned string. If this value is less than or equal to the length of str, a copy of str is returned.
Fill:The character used to fill in any extra characters in the resulting string. If a string longer than one character is passed, an error is reported. This defaults to the space character.
Returns:A right-justified version of a string, padded with characters to a specific length.
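A sketch of both justification functions with an illustrative fill character:

```zeek
event zeek_init()
	{
	print ljust("abc", 6, "."), rjust("abc", 6, ".");  # abc..., ...abc
	# A width no larger than the input returns the input unchanged.
	print rjust("abcdef", 3);                           # abcdef
	}
```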
rstrip
Type:function (str: string, chars: string &default = " \x09\x0a\x0d\x0b\x0c" &optional) : string

Removes all combinations of characters in the chars argument starting at the end of the string until the first mismatch.

Str:The string to strip characters from.
Chars:A string consisting of the characters to be removed. Defaults to all whitespace characters.
Returns:A copy of str with the characters in chars removed from the end.

See also: sub, gsub, strip, lstrip
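A sketch of the three stripping functions on illustrative inputs:

```zeek
event zeek_init()
	{
	print lstrip("xxabcxx", "x");     # abcxx
	print rstrip("xxabcxx", "x");     # xxabc
	# Any mix of the listed characters is removed, in any order.
	print lstrip("xyxabc", "xy");     # abc
	# strip() removes whitespace, including tabs, from both ends.
	print strip("  abc\t") == "abc";  # T
	}
```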

safe_shell_quote
Type:function (source: string) : string

Takes a string and escapes characters that would allow execution of commands at the shell level. Must be used before including strings in system or similar calls.

Source:The string to escape.
Returns:A shell-escaped version of source. Specifically, this backslash-escapes characters whose literal value is not otherwise preserved by enclosure in double-quotes (dollar-sign, backquote, backslash, and double-quote itself), and then encloses that backslash-escaped string in double-quotes to ultimately preserve the literal value of all input characters.

See also: system
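A minimal sketch of the escaping, using a made-up input that contains a dollar sign. The comparison prints T when the result matches the description above: the dollar sign is backslash-escaped and the whole string is wrapped in double quotes.

```zeek
event zeek_init()
	{
	local user_input = "a$b";
	# Expected result, written out: "a\$b" (including the quotes).
	print safe_shell_quote(user_input) == "\"a\\$b\"";  # T
	}
```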

split_string
Type:function (str: string, re: pattern) : string_vec

Splits a string into an array of strings according to a pattern.

Str:The string to split.
Re:The pattern describing the element separator in str.
Returns:An array of strings where each element corresponds to a substring in str separated by re.

See also: split_string1, split_string_all, split_string_n, str_split
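A minimal sketch of splitting comma-separated fields (illustrative input); Zeek vectors are indexed from 0.

```zeek
event zeek_init()
	{
	local fields = split_string("a,b,c", /,/);
	print |fields|;   # 3
	print fields[0];  # a
	print fields[2];  # c
	}
```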

split_string1
Type:function (str: string, re: pattern) : string_vec

Splits a string once into a two-element array of strings according to a pattern. This function is the same as split_string, but str is only split once (if possible) at the earliest position and an array of two strings is returned.

Str:The string to split.
Re:The pattern describing the separator to split str in two pieces.
Returns:An array of strings with two elements in which the first represents the substring in str up to the first occurrence of re, and the second everything after re. An array of one string is returned when str cannot be split.

See also: split_string, split_string_all, split_string_n, str_split

split_string_all
Type:function (str: string, re: pattern) : string_vec

Splits a string into an array of strings according to a pattern. This function is the same as split_string, except that the separators are returned as well. For example, split_string_all("a-b--cd", /(\-)+/) returns {"a", "-", "b", "--", "cd"}: odd-indexed elements do match the pattern and even-indexed ones do not.

Str:The string to split.
Re:The pattern describing the element separator in str.
Returns:An array of strings where each two successive elements correspond to a substring in str of the part not matching re (even-indexed) and the part that matches re (odd-indexed).

See also: split_string, split_string1, split_string_n, str_split

split_string_n
Type:function (str: string, re: pattern, incl_sep: bool, max_num_sep: count) : string_vec

Splits a string a given number of times into an array of strings according to a pattern. This function is similar to split_string1 and split_string_all, but with customizable behavior with respect to including separators in the result and the number of times to split.

Str:The string to split.
Re:The pattern describing the element separator in str.
Incl_sep:A flag indicating whether to include the separator matches in the result (as in split_string_all).
Max_num_sep:The maximum number of times to split str.
Returns:An array of strings where, if incl_sep is true, each two successive elements correspond to a substring in str of the part not matching re (even-indexed) and the part that matches re (odd-indexed).

See also: split_string, split_string1, split_string_all, str_split
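A sketch of how incl_sep and max_num_sep shape the result, using a made-up input; join_string_vec is used only to print each vector on one line.

```zeek
event zeek_init()
	{
	local s = "a=b=c=d";
	# At most two splits; the remainder stays in the last element.
	print join_string_vec(split_string_n(s, /=/, F, 2), "|");  # a|b|c=d
	# Same, but keep the separators (even/odd layout as in split_string_all).
	print join_string_vec(split_string_n(s, /=/, T, 2), "|");  # a|=|b|=|c=d
	# A single split without separators, as split_string1 does.
	print join_string_vec(split_string1(s, /=/), "|");         # a|b=c=d
	}
```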

starts_with
Type:function (str: string, sub: string) : bool

Returns whether a string starts with a substring.
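A minimal sketch of prefix and suffix checks with starts_with and ends_with, using an illustrative host name:

```zeek
event zeek_init()
	{
	local host = "www.example.com";
	print starts_with(host, "www."), ends_with(host, ".com");  # T, T
	print ends_with(host, "www.");                             # F
	}
```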

str_smith_waterman
Type:function (s1: string, s2: string, params: sw_params) : sw_substring_vec

Uses the Smith-Waterman algorithm to find similar/overlapping substrings. See Wikipedia.

S1:The first string.
S2:The second string.
Params:Parameters for the Smith-Waterman algorithm.
Returns:The result of the Smith-Waterman algorithm calculation.
str_split
Type:function (s: string, idx: index_vec) : string_vec
Attributes:&deprecated = “Remove in v4.1. Use str_split_indices.”

Splits a string into substrings with the help of an index vector of cutting points.

S:The string to split.
Idx:The index vector (vector of count) with the cutting points.
Returns:A one-indexed vector of strings.

See also: split_string, split_string1, split_string_all, split_string_n

str_split_indices
Type:function (s: string, idx: index_vec) : string_vec

Splits a string into substrings with the help of an index vector of cutting points. This differs from str_split() in that it does not return an empty element at the beginning of the result.

S:The string to split.
Idx:The index vector (vector of count) with the cutting points.
Returns:A zero-indexed vector of strings.

See also: split_string, split_string1, split_string_all, split_string_n

strcmp
Type:function (s1: string, s2: string) : int

Lexicographically compares two strings.

S1:The first string.
S2:The second string.
Returns:An integer greater than, equal to, or less than 0, depending on whether s1 is greater than, equal to, or less than s2.
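Because only the sign of the result is documented, this sketch compares it against 0 rather than printing exact values (except for equal strings). The inputs are illustrative.

```zeek
event zeek_init()
	{
	print strcmp("apple", "banana") < 0;  # T
	print strcmp("b", "a") > 0;           # T
	print strcmp("same", "same");         # 0
	}
```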
string_cat
Type:function (…) : string

Concatenates all arguments into a single string. The function takes a variable number of arguments of type string and stitches them together.

Returns:The concatenation of all (string) arguments.

See also: cat, cat_sep, fmt, join_string_vec

string_fill
Type:function (len: int, source: string) : string

Generates a string of a given size and fills it with repetitions of a source string.

Len:The length of the output string.
Source:The string to concatenate repeatedly until len has been reached.
Returns:A string of length len filled with source.
string_to_ascii_hex
Type:function (s: string) : string

Returns an ASCII hexadecimal representation of a string.

S:The string to convert to hex.
Returns:A copy of s where each byte is replaced with its two-digit hexadecimal value.
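A short sketch: every input byte becomes two hex digits, so the output is twice as long as the input.

```zeek
event zeek_init()
	{
	# "A" is 0x41 and "B" is 0x42.
	print string_to_ascii_hex("AB");       # 4142
	print |string_to_ascii_hex("hello")|;  # 10
	}
```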
strip
Type:function (str: string) : string

Strips whitespace at both ends of a string.

Str:The string to strip the whitespace from.
Returns:A copy of str with leading and trailing whitespace removed.

See also: sub, gsub, lstrip, rstrip

strstr
Type:function (big: string, little: string) : count

Locates the first occurrence of one string in another.

Big:The string to look in.
Little:The (smaller) string to find inside big.
Returns:The 1-based position of little in big, or 0 if little is not found in big.

See also: find_all, find_last
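A sketch that contrasts the 1-based result of strstr with find_str on the same illustrative input; the find_str value assumes zero-based positions.

```zeek
event zeek_init()
	{
	# strstr() counts from 1 and returns 0 for "not found".
	print strstr("hello world", "o");    # 5
	print strstr("hello world", "z");    # 0
	# find_str() returns -1 for "not found" instead.
	print find_str("hello world", "o");  # 4
	}
```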

sub
Type:function (str: string, re: pattern, repl: string) : string

Substitutes a given replacement string for the first occurrence of a pattern in a given string.

Str:The string to perform the substitution in.
Re:The pattern being replaced with repl.
Repl:The string that replaces re.
Returns:A copy of str with the first occurrence of re replaced with repl.

See also: gsub, subst_string
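A minimal sketch of the difference between sub and gsub on an illustrative input:

```zeek
event zeek_init()
	{
	print sub("a.b.c", /\./, "-");   # a-b.c  (first match only)
	print gsub("a.b.c", /\./, "-");  # a-b-c  (every match)
	}
```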

sub_bytes
Type:function (s: string, start: count, n: int) : string

Returns a substring of a string, given a starting position and length.

S:The string to obtain a substring from.
Start:The starting position of the substring in s, where 1 is the first character. As a special case, 0 also represents the first character.
N:The number of characters to extract, beginning at start.
Returns:A substring of s of length n from position start.
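A sketch with an illustrative input, showing that start counts from 1:

```zeek
event zeek_init()
	{
	print sub_bytes("hello", 2, 3);  # ell
	print sub_bytes("hello", 1, 2);  # he
	}
```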
subst_string
Type:function (s: string, from: string, to: string) : string

Substitutes each (non-overlapping) appearance of a string in another.

S:The string in which to perform the substitution.
From:The string to look for which is replaced with to.
To:The string that replaces all occurrences of from in s.
Returns:A copy of s where each occurrence of from is replaced with to.

See also: sub, gsub
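A sketch of the practical difference between subst_string, which matches its from argument literally, and a pattern-based gsub; the input is illustrative.

```zeek
event zeek_init()
	{
	# subst_string() treats "." as a literal dot ...
	print subst_string("a.b", ".", "-");  # a-b
	# ... while the pattern /./ matches any character.
	print gsub("a.b", /./, "-");          # ---
	}
```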

swap_case
Type:function (str: string) : string

Swaps the case of every alphabetic character in a string. For example, the string “aBc” is returned as “AbC”.

Str:The string to swap cases in.
Returns:A copy of str with the case of each alphabetic character swapped.
to_lower
Type:function (str: string) : string

Replaces all uppercase letters in a string with their lowercase counterpart.

Str:The string to convert to lowercase letters.
Returns:A copy of the given string with the uppercase letters (as indicated by isascii and isupper) folded to lowercase (via tolower).

See also: to_upper, is_ascii

to_string_literal
Type:function (str: string) : string

Replaces non-printable characters in a string with escaped sequences. The mappings are:

  • values not in [32, 126] to \xXX
  • \ to \\
  • ' and " to \' and \", respectively.

Str:The string to escape.
Returns:The escaped string.

See also: clean, escape_string
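A sketch with an illustrative input containing double quotes. Lengths and a comparison are printed so there is no ambiguity about how the escaped output is displayed.

```zeek
event zeek_init()
	{
	local s = "say \"hi\"";  # 8 bytes: say "hi"
	local lit = to_string_literal(s);
	# Each double quote gains a backslash, giving 10 bytes.
	print |lit|;                    # 10
	print lit == "say \\\"hi\\\"";  # T
	}
```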

to_title
Type:function (str: string) : string

Converts a string to Title Case. This changes the first character of each sequence of non-space characters in the string to be capitalized. See https://docs.python.org/2/library/stdtypes.html#str.title for more info.

Str:The string to convert.
Returns:A title-cased version of the string.
to_upper
Type:function (str: string) : string

Replaces all lowercase letters in a string with their uppercase counterpart.

Str:The string to convert to uppercase letters.
Returns:A copy of the given string with the lowercase letters (as indicated by isascii and islower) folded to uppercase (via toupper).

See also: to_lower, is_ascii
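A sketch of the case-conversion functions on illustrative inputs (the to_title input is all lowercase, so only the first letters change):

```zeek
event zeek_init()
	{
	print to_lower("HeLLo"), to_upper("HeLLo");  # hello, HELLO
	print swap_case("aBc");                      # AbC
	print to_title("hello big world");           # Hello Big World
	}
```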

zfill
Type:function (str: string, width: count) : string

Returns a copy of a string filled on the left side with zeroes. This is effectively rjust(str, width, "0").
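A minimal sketch with an illustrative input, confirming the rjust equivalence stated for zfill:

```zeek
event zeek_init()
	{
	print zfill("42", 5);                         # 00042
	print zfill("42", 5) == rjust("42", 5, "0");  # T
	}
```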