base/bif/strings.bif.zeek
- GLOBAL
Definitions of built-in functions related to string processing and manipulation.
- Namespace
GLOBAL
Summary
Functions
- clean – Replaces non-printable characters in a string with escaped sequences.
- count_substr – Returns the number of times a substring occurs within a string.
- edit – Returns an edited version of a string that applies a special “backspace character” (usually \x08 for backspace or \x7f for DEL).
- ends_with – Returns whether a string ends with a substring.
- escape_string – Replaces non-printable characters in a string with escaped sequences.
- find_all – Finds all occurrences of a pattern in a string.
- find_all_ordered – Finds all occurrences of a pattern in a string.
- find_last – Finds the last occurrence of a pattern in a string.
- find_str – Finds a string within another string, starting from the beginning.
- gsub – Substitutes a given replacement string for all occurrences of a pattern in a given string.
- hexdump – Returns a hex dump for given input data.
- is_alnum – Returns whether a string consists entirely of alphanumeric characters.
- is_alpha – Returns whether a string consists entirely of alphabetic characters.
- is_ascii – Determines whether a given string contains only ASCII characters.
- is_num – Returns whether a string consists entirely of digits.
- join_string_set – Joins all values in the given set of strings with a separator placed between each element.
- join_string_vec – Joins all values in the given vector of strings with a separator placed between each element.
- levenshtein_distance – Calculates the Levenshtein distance between the two strings.
- ljust – Returns a left-justified version of the string, padded to a specific length with a specified character.
- lstrip – Removes all combinations of characters in the chars argument starting at the beginning of the string until the first mismatch.
- remove_prefix – Similar to lstrip(), except that the removal is repeated as long as the pattern repeats at the start of the string.
- remove_suffix – Similar to rstrip(), except that the removal is repeated as long as the pattern repeats at the end of the string.
- reverse – Returns a reversed copy of the string.
- rfind_str – The same as find_str, but returns the highest index matching the substring instead of the smallest.
- rjust – Returns a right-justified version of the string, padded to a specific length with a specified character.
- rstrip – Removes all combinations of characters in the chars argument starting at the end of the string until the first mismatch.
- safe_shell_quote – Takes a string and escapes characters that would allow execution of commands at the shell level.
- split_string – Splits a string into an array of strings according to a pattern.
- split_string1 – Splits a string once into a two-element array of strings according to a pattern.
- split_string_all – Splits a string into an array of strings according to a pattern.
- split_string_n – Splits a string a given number of times into an array of strings according to a pattern.
- starts_with – Returns whether a string starts with a substring.
- str_smith_waterman – Uses the Smith-Waterman algorithm to find similar/overlapping substrings.
- str_split_indices – Splits a string into substrings with the help of an index vector of cutting points.
- strcmp – Lexicographically compares two strings.
- string_cat – Concatenates all arguments into a single string.
- string_fill – Generates a string of a given size and fills it with repetitions of a source string.
- string_to_ascii_hex – Returns an ASCII hexadecimal representation of a string.
- strip – Strips whitespace at both ends of a string.
- strstr – Locates the first occurrence of one string in another.
- sub – Substitutes a given replacement string for the first occurrence of a pattern in a given string.
- sub_bytes – Gets a substring from a string, given a starting position and length.
- subst_string – Substitutes each (non-overlapping) appearance of a string in another.
- swap_case – Swaps the case of every alphabetic character in a string.
- to_lower – Replaces all uppercase letters in a string with their lowercase counterparts.
- to_string_literal – Replaces non-printable characters in a string with escaped sequences.
- to_title – Converts a string to Title Case.
- to_upper – Replaces all lowercase letters in a string with their uppercase counterparts.
- zfill – Returns a copy of a string filled on the left side with zeroes.
Detailed Interface
Functions
- clean
Replaces non-printable characters in a string with escaped sequences. The mapping is: values not in [32, 126] to \xXX.
If the string does not yet have a trailing NUL, one is added internally.
In contrast to escape_string, this encoding is not fully reversible.
- Parameters
str – The string to escape.
- Returns
The escaped string.
See also: to_string_literal, escape_string
- count_substr
Returns the number of times a substring occurs within a string.
- Parameters
str – The string to search in.
substr – The string to search for.
- Returns
The number of times the substring occurred.
- edit
Returns an edited version of a string that applies a special “backspace character” (usually \x08 for backspace or \x7f for DEL). For example, edit("hello there", "e") returns "llo t".
- Parameters
arg_s – The string to edit.
arg_edit_char – A string of exactly one character that represents the “backspace character”. If it is longer than one character, Zeek generates a run-time error and uses the first character in the string.
- Returns
An edited version of arg_s in which each occurrence of arg_edit_char is dropped along with the closest preceding character that has not already been deleted.
See also: clean, to_string_literal, escape_string, strip
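The backspace semantics can be modeled in a few lines. The following is a Python sketch of the documented behavior, not Zeek's implementation. The function name mirrors the Zeek BIF only for readability. One behavior is an assumption: a backspace at the very start of the string, with nothing before it to delete, is simply ignored.

```python
def edit(arg_s: str, arg_edit_char: str) -> str:
    """Python model of Zeek's edit(): arg_edit_char acts as a backspace."""
    # Zeek reports a run-time error for a longer edit string and then uses
    # only its first character; this sketch applies that fallback silently.
    edit_char = arg_edit_char[:1]
    kept: list[str] = []
    for ch in arg_s:
        if ch == edit_char:
            if kept:            # assumption: a leading backspace is ignored
                kept.pop()      # drop the last character kept so far
        else:
            kept.append(ch)
    return "".join(kept)


print(edit("hello there", "e"))   # llo t
print(edit("ab\x08c", "\x08"))    # ac
```

Walking through the first call: "h" is erased by the first "e", and the later "e" characters erase "h" and "r" from "there", which leaves "llo t".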
- ends_with
Returns whether a string ends with a substring.
- escape_string
Replaces non-printable characters in a string with escaped sequences. The mappings are: values not in [32, 126] to \xXX, and \ to \\.
In contrast to clean, this encoding is fully reversible.
- Parameters
str – The string to escape.
- Returns
The escaped string.
See also: clean, to_string_literal
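The difference in reversibility between clean and escape_string comes down to whether the backslash itself is escaped. The Python sketch below models only the mappings listed above and ignores clean's internal trailing NUL. Two details are assumptions for illustration and not part of Zeek: the lowercase hex digits, and the helper unescape.

```python
def clean(data: bytes) -> str:
    """Model of clean: bytes outside [32, 126] become \\xXX; nothing else changes."""
    return "".join(chr(b) if 32 <= b <= 126 else f"\\x{b:02x}" for b in data)


def escape_string(data: bytes) -> str:
    """Model of escape_string: like clean, but a backslash also becomes \\\\."""
    out = []
    for b in data:
        if b == 0x5C:                    # a literal backslash
            out.append("\\\\")
        elif 32 <= b <= 126:             # printable ASCII passes through
            out.append(chr(b))
        else:                            # everything else is hex-escaped
            out.append(f"\\x{b:02x}")
    return "".join(out)


def unescape(text: str) -> bytes:
    """Hypothetical inverse of escape_string (not a Zeek function)."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\\\\", i):   # escaped backslash
            out.append(0x5C)
            i += 2
        elif text.startswith("\\x", i):  # \xXX hex escape
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


raw_byte, raw_text = b"\x01", b"\\x01"
print(clean(raw_byte) == clean(raw_text))                  # True: information lost
print(escape_string(raw_byte), escape_string(raw_text))    # \x01 \\x01
print(unescape(escape_string(raw_text)) == raw_text)       # True
```

Two different inputs produce the same output under clean: the single byte 0x01 and the four characters \x01 both become \x01, so the original cannot be recovered. escape_string keeps the two inputs apart.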
- find_all
Finds all occurrences of a pattern in a string.
- Parameters
str – The string to inspect.
re – The pattern to look for in str.
max_str_size – The maximum string size allowed as input. If set to -1, this uses the max_find_all_string_length global constant. If set to 0, this check is disabled. If the length of str is greater than this size, an empty set is returned.
- Returns
The set of strings in str that match re, or the empty set.
- find_all_ordered
Finds all occurrences of a pattern in a string. The order in which occurrences are found is preserved and the return value may contain duplicate elements.
- Parameters
str – The string to inspect.
re – The pattern to look for in str.
max_str_size – The maximum string size allowed as input. If set to -1, this uses the max_find_all_string_length global constant. If set to 0, this check is disabled. If the length of str is greater than this size, an empty vector is returned.
- Returns
All strings in str that match re, or an empty vector.
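In practice, find_all and find_all_ordered differ in their return type. A set drops both duplicates and order, while a vector keeps both. This Python sketch illustrates only that distinction. Python's re engine is not Zeek's pattern matcher, and the max_str_size check is left out.

```python
import re


def find_all(s: str, pattern: str) -> set[str]:
    """Set of matched substrings: duplicates collapse, order is lost."""
    return {m.group(0) for m in re.finditer(pattern, s)}


def find_all_ordered(s: str, pattern: str) -> list[str]:
    """Matched substrings in the order found, duplicates kept."""
    return [m.group(0) for m in re.finditer(pattern, s)]


text = "id=1 id=22 id=1"
print(sorted(find_all(text, r"[0-9]+")))   # ['1', '22']
print(find_all_ordered(text, r"[0-9]+"))   # ['1', '22', '1']
```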
- find_last
Finds the last occurrence of a pattern in a string. This function returns the match that starts at the largest index in the string, which is not necessarily the longest match. For example, a pattern of /.*/ will return the final character in the string.
- Parameters
str – The string to inspect.
re – The pattern to look for in str.
- Returns
The last string in str that matches re, or the empty string.
- find_str
- Type
function (str: string, sub: string, start: count &default = 0 &optional, end: int &default = -1 &optional, case_sensitive: bool &default = T &optional) : int
Finds a string within another string, starting from the beginning. This works by taking a substring within the provided indexes and searching for the sub argument. This means that ranges shorter than the string in the sub argument will always return a failure.
- Parameters
str – The string to search in.
sub – The string to search for.
start – An optional position for the start of the substring.
end – An optional position for the end of the substring. A value less than zero (such as the default -1) means a search until the end of the string.
case_sensitive – Set to false to perform a case-insensitive search (default: T). Note that case-insensitive searches use the tolower libc function, which is locale-sensitive.
- Returns
The position of the substring. Returns -1 if the string wasn’t found. Prints an error if the starting position is after the ending position.
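Python's str.find and str.rfind also search only the slice between start and end, so they can model the index arithmetic of find_str and rfind_str. This is a model, not Zeek's code, and it differs in three ways:
- It assumes the returned position counts from the beginning of str (0-based) rather than from start.
- It raises an exception where Zeek would print an error.
- It uses str.lower in place of the locale-sensitive tolower.

```python
def _find(s: str, sub: str, start: int, end: int, case_sensitive: bool, rfind: bool) -> int:
    if 0 <= end < start:
        # Zeek prints an error here; the sketch raises instead.
        raise ValueError("starting position is after the ending position")
    if end < 0:
        end = len(s)                     # negative end: search to the end of s
    if not case_sensitive:
        s, sub = s.lower(), sub.lower()
    # Only s[start:end] is searched, but the index refers to the whole of s.
    return s.rfind(sub, start, end) if rfind else s.find(sub, start, end)


def find_str(s, sub, start=0, end=-1, case_sensitive=True):
    return _find(s, sub, start, end, case_sensitive, rfind=False)


def rfind_str(s, sub, start=0, end=-1, case_sensitive=True):
    return _find(s, sub, start, end, case_sensitive, rfind=True)


print(find_str("hello world", "o"))                            # 4
print(rfind_str("hello world", "o"))                           # 7
print(find_str("hello world", "o", 5))                         # 7
print(find_str("hello world", "WORLD", case_sensitive=False))  # 6
print(find_str("abcdef", "cde", 2, 4))                         # -1: range "cd" is too short
```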
- gsub
Substitutes a given replacement string for all occurrences of a pattern in a given string.
- Parameters
str – The string to perform the substitution in.
re – The pattern being replaced with repl.
repl – The string that replaces re.
- Returns
A copy of str with all occurrences of re replaced with repl.
See also: sub, subst_string
- hexdump
Returns a hex dump for given input data. The hex dump renders 16 bytes per line, with hex on the left and ASCII (where printable) on the right.
- Parameters
data_str – The string to dump in hex format.
- Returns
The hex dump of the given string.
See also: string_to_ascii_hex, bytestring_to_hexstr
Note
Based on Netdude’s hex editor code.
- is_alnum
Returns whether a string consists entirely of alphanumeric characters. The empty string is not alphanumeric.
- is_alpha
Returns whether a string consists entirely of alphabetic characters. The empty string is not alphabetic.
- is_ascii
Determines whether a given string contains only ASCII characters. The empty string is ASCII.
- Parameters
str – The string to examine.
- Returns
False if any byte value of str is greater than 127, and true otherwise.
- is_num
Returns whether a string consists entirely of digits. The empty string is not numeric.
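The empty-string rules above separate is_ascii from the other three predicates. Python's bytes methods follow the same empty-string rules, so they make a convenient model for the documented behavior. It is an assumption that Zeek classifies non-empty input with exactly the same character tables (for example, under a non-C locale).

```python
def is_alnum(s: bytes) -> bool:
    return s.isalnum()     # False for b""


def is_alpha(s: bytes) -> bool:
    return s.isalpha()     # False for b""


def is_num(s: bytes) -> bool:
    return s.isdigit()     # False for b""


def is_ascii(s: bytes) -> bool:
    return s.isascii()     # True for b"": no byte exceeds 127


for value in (b"", b"abc", b"abc123", b"123", b"caf\xc3\xa9"):
    print(value, is_alnum(value), is_alpha(value), is_num(value), is_ascii(value))
```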
- join_string_set
- Type
function (ss: string_set, sep: string) : string
Joins all values in the given set of strings with a separator placed between each element.
- Parameters
ss – The string_set (set[string]).
sep – The separator to place between each element.
- Returns
The concatenation of all elements in ss, with sep placed between each element.
See also: cat, cat_sep, string_cat, fmt, join_string_vec
- join_string_vec
- Type
function (vec: string_vec, sep: string) : string
Joins all values in the given vector of strings with a separator placed between each element.
- Parameters
sep – The separator to place between each element.
vec – The string_vec (vector of string).
- Returns
The concatenation of all elements in vec, with sep placed between each element.
See also: cat, cat_sep, string_cat, fmt
- levenshtein_distance
Calculates the Levenshtein distance between the two strings. See Wikipedia for more information.
- Parameters
s1 – The first string.
s2 – The second string.
- Returns
The Levenshtein distance of two strings as a count.
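For reference, the distance can be computed with the standard dynamic-programming recurrence. This Python version is a textbook sketch, not necessarily how Zeek computes it. Because the distance itself is uniquely defined, only the result should agree.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(s2) + 1))          # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]                           # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # substitute (or keep) c1
        prev = curr
    return prev[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
print(levenshtein_distance("flaw", "lawn"))       # 2
```

Keeping only the previous row brings memory use down to O(len(s2)). The running time stays O(len(s1) × len(s2)).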
- ljust
Returns a left-justified version of the string, padded to a specific length with a specified character.
- Parameters
str – The string to left-justify.
count – The length of the returned string. If this value is less than or equal to the length of str, a copy of str is returned.
fill – The character used to fill in any extra characters in the resulting string. If a string longer than one character is passed, an error is reported. This defaults to the space character.
- Returns
A left-justified version of a string, padded with characters to a specific length.
- lstrip
Removes all combinations of characters in the chars argument starting at the beginning of the string until the first mismatch.
- Parameters
str – The string to strip characters from.
chars – A string consisting of the characters to be removed. Defaults to all whitespace characters.
- Returns
A copy of str with the characters in chars removed from the beginning.
- remove_prefix
Similar to lstrip(), except that the removal is repeated as long as the pattern repeats at the start of the string.
- remove_suffix
Similar to rstrip(), except that the removal is repeated as long as the pattern repeats at the end of the string.
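Stripping a character set and removing a repeated prefix or suffix behave quite differently, which is easiest to see side by side. Python's str.lstrip and str.rstrip treat their argument as a set of characters, like Zeek's lstrip and rstrip as documented above. The two helper functions below sketch the repeated removal. They assume the prefix or suffix is taken as a literal string, since this reference does not list the parameters of remove_prefix and remove_suffix.

```python
def remove_prefix(s: str, prefix: str) -> str:
    """Sketch: strip prefix from the start of s for as long as it repeats."""
    if not prefix:
        return s                      # guard against an endless loop
    while s.startswith(prefix):
        s = s[len(prefix):]
    return s


def remove_suffix(s: str, suffix: str) -> str:
    """Sketch: strip suffix from the end of s for as long as it repeats."""
    if not suffix:
        return s
    while s.endswith(suffix):
        s = s[:-len(suffix)]
    return s


print("baab-x".lstrip("ab"))               # -x      (any mix of 'a' and 'b')
print(remove_prefix("baab-x", "ab"))       # baab-x  (does not start with "ab")
print(remove_prefix("ababx", "ab"))        # x
print("log.gz".rstrip(".gz"))              # lo      (also eats the 'g' of "log")
print(remove_suffix("log.gz.gz", ".gz"))   # log
```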
- reverse
Returns a reversed copy of the string.
- Parameters
str – The string to reverse.
- Returns
A reversed copy of str.
- rfind_str
- Type
function (str: string, sub: string, start: count &default = 0 &optional, end: int &default = -1 &optional, case_sensitive: bool &default = T &optional) : int
The same as find_str, but returns the highest index matching the substring instead of the smallest.
- Parameters
str – The string to search in.
sub – The string to search for.
start – An optional position for the start of the substring.
end – An optional position for the end of the substring. A value less than zero (such as the default -1) means a search from the end of the string.
case_sensitive – Set to false to perform a case-insensitive search (default: T). Note that case-insensitive searches use the tolower libc function, which is locale-sensitive.
- Returns
The position of the substring. Returns -1 if the string wasn’t found. Prints an error if the starting position is after the ending position.
- rjust
Returns a right-justified version of the string, padded to a specific length with a specified character.
- Parameters
str – The string to right-justify.
count – The length of the returned string. If this value is less than or equal to the length of str, a copy of str is returned.
fill – The character used to fill in any extra characters in the resulting string. If a string longer than one character is passed, an error is reported. This defaults to the space character.
- Returns
A right-justified version of a string, padded with characters to a specific length.
- rstrip
Removes all combinations of characters in the chars argument starting at the end of the string until the first mismatch.
- Parameters
str – The string to strip characters from.
chars – A string consisting of the characters to be removed. Defaults to all whitespace characters.
- Returns
A copy of str with the characters in chars removed from the end.
- safe_shell_quote
Takes a string and escapes characters that would allow execution of commands at the shell level. Must be used before including strings in system or similar calls.
- Parameters
source – The string to escape.
- Returns
A shell-escaped version of source. Specifically, this backslash-escapes characters whose literal value is not otherwise preserved by enclosure in double quotes (dollar sign, backquote, backslash, and the double quote itself), and then encloses that backslash-escaped string in double quotes to ultimately preserve the literal value of all input characters.
See also: system, safe_shell_quote
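The quoting rule described above fits in a few lines. This Python sketch follows that description: it backslash-escapes $, `, \ and ", then wraps the result in double quotes. It is a model for illustration, not Zeek's implementation.

```python
def safe_shell_quote(source: str) -> str:
    """Backslash-escape the characters double quotes don't protect, then wrap in double quotes."""
    escaped = "".join("\\" + ch if ch in '$`\\"' else ch for ch in source)
    return '"' + escaped + '"'


print(safe_shell_quote('a"b$c'))                # "a\"b\$c"
print(safe_shell_quote("x; echo `id` $HOME"))   # "x; echo \`id\` \$HOME"
```

Inside POSIX double quotes, the semicolon and spaces are already literal. Escaping $ and ` is what prevents the command substitution and parameter expansion in the second example.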
- split_string
- Type
function (str: string, re: pattern) : string_vec
Splits a string into an array of strings according to a pattern.
- Parameters
str – The string to split.
re – The pattern describing the element separator in str.
- Returns
An array of strings where each element corresponds to a substring in str separated by re.
See also: split_string1, split_string_all, split_string_n
- split_string1
- Type
function (str: string, re: pattern) : string_vec
Splits a string once into a two-element array of strings according to a pattern. This function is the same as split_string, but str is only split once (if possible) at the earliest position and an array of two strings is returned.
- Parameters
str – The string to split.
re – The pattern describing the separator to split str in two pieces.
- Returns
An array of strings with two elements in which the first represents the substring in str up to the first occurrence of re, and the second everything after re. An array of one string is returned when str cannot be split.
See also: split_string, split_string_all, split_string_n
- split_string_all
- Type
function (str: string, re: pattern) : string_vec
Splits a string into an array of strings according to a pattern. This function is the same as split_string, except that the separators are returned as well. For example, split_string_all("a-b--cd", /(\-)+/) returns {"a", "-", "b", "--", "cd"}: odd-indexed elements do match the pattern and even-indexed ones do not.
- Parameters
str – The string to split.
re – The pattern describing the element separator in str.
- Returns
An array of strings where each two successive elements correspond to a substring in str of the part not matching re (even-indexed) and the part that matches re (odd-indexed).
See also: split_string, split_string1, split_string_n
- split_string_n
Splits a string a given number of times into an array of strings according to a pattern. This function is similar to split_string1 and split_string_all, but with customizable behavior with respect to including separators in the result and the number of times to split.
- Parameters
str – The string to split.
re – The pattern describing the element separator in str.
incl_sep – A flag indicating whether to include the separator matches in the result (as in split_string_all).
max_num_sep – The number of times to split str.
- Returns
An array of strings where, if incl_sep is true, each two successive elements correspond to a substring in str of the part not matching re (even-indexed) and the part that matches re (odd-indexed).
See also: split_string, split_string1, split_string_all
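The four splitting functions differ only in how many separators they consume and whether they keep the separators, so one Python helper can model all of them. This is a sketch under three assumptions:
- Python's re stands in for Zeek patterns. The results coincide for simple patterns like the one in the split_string_all example.
- Passing None as max_num_sep to mean "no limit" is a convention of this sketch only.
- Patterns that can match the empty string are not handled.

```python
import re
from typing import List, Optional


def split_string_n(s: str, pattern: str, incl_sep: bool,
                   max_num_sep: Optional[int]) -> List[str]:
    """Split s at up to max_num_sep matches of pattern (None = no limit)."""
    parts: List[str] = []
    pos = 0
    for count, m in enumerate(re.finditer(pattern, s)):
        if max_num_sep is not None and count >= max_num_sep:
            break
        parts.append(s[pos:m.start()])   # text before the separator (even index)
        if incl_sep:
            parts.append(m.group(0))     # the separator itself (odd index)
        pos = m.end()
    parts.append(s[pos:])                # whatever follows the last split
    return parts


def split_string(s: str, pattern: str) -> List[str]:
    return split_string_n(s, pattern, False, None)


def split_string1(s: str, pattern: str) -> List[str]:
    return split_string_n(s, pattern, False, 1)


def split_string_all(s: str, pattern: str) -> List[str]:
    return split_string_n(s, pattern, True, None)


sep = r"(\-)+"
print(split_string("a-b--cd", sep))              # ['a', 'b', 'cd']
print(split_string1("a-b--cd", sep))             # ['a', 'b--cd']
print(split_string_all("a-b--cd", sep))          # ['a', '-', 'b', '--', 'cd']
print(split_string_n("a-b--cd", sep, True, 1))   # ['a', '-', 'b--cd']
print(split_string1("abc", sep))                 # ['abc']
```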
- starts_with
Returns whether a string starts with a substring.
- str_smith_waterman
- Type
function (s1: string, s2: string, params: sw_params) : sw_substring_vec
Uses the Smith-Waterman algorithm to find similar/overlapping substrings. See Wikipedia.
- Parameters
s1 – The first string.
s2 – The second string.
params – Parameters for the Smith-Waterman algorithm.
- Returns
The result of the Smith-Waterman algorithm calculation.
- str_split_indices
- Type
function (s: string, idx: index_vec) : string_vec
Splits a string into substrings with the help of an index vector of cutting points.
- Parameters
s – The string to split.
idx – The index vector (vector of count) with the cutting points.
- Returns
A zero-indexed vector of strings.
See also: split_string, split_string1, split_string_all, split_string_n
- strcmp
Lexicographically compares two strings.
- Parameters
s1 – The first string.
s2 – The second string.
- Returns
An integer greater than, equal to, or less than 0, depending on whether s1 is greater than, equal to, or less than s2.
- string_cat
Concatenates all arguments into a single string. The function takes a variable number of arguments of type string and stitches them together.
- Returns
The concatenation of all (string) arguments.
See also: cat, cat_sep, fmt, join_string_vec
- string_fill
Generates a string of a given size and fills it with repetitions of a source string.
- Parameters
len – The length of the output string.
source – The string to concatenate repeatedly until len has been reached.
- Returns
A string of length len filled with source.
- string_to_ascii_hex
Returns an ASCII hexadecimal representation of a string.
- Parameters
s – The string to convert to hex.
- Returns
A copy of s where each byte is replaced with the two hex digits (nibbles) that represent it.
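Each input byte becomes two hexadecimal digits, so the output is twice as long as the input. Python's bytes.hex produces the same shape. This reference does not state which letter case Zeek uses for the digits a to f, so lowercase is an assumption of this sketch.

```python
def string_to_ascii_hex(s: bytes) -> str:
    """Two hex digits per input byte (lowercase assumed)."""
    return s.hex()


print(string_to_ascii_hex(b"abc"))        # 616263
print(string_to_ascii_hex(b"\x00\x0f"))   # 000f
```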
- strip
Strips whitespace at both ends of a string.
- Parameters
str – The string to strip the whitespace from.
- Returns
A copy of str with leading and trailing whitespace removed.
- strstr
Locates the first occurrence of one string in another.
- Parameters
big – The string to look in.
little – The (smaller) string to find inside big.
- Returns
The location of little in big, or 0 if little is not found in big.
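A not-found result is reported as 0, so positions returned by strstr are best read as 1-based. find_str, by contrast, signals failure with -1. The following one-line Python model assumes that 1-based convention.

```python
def strstr(big: str, little: str) -> int:
    """1-based position of little in big, or 0 when absent (str.find's -1 becomes 0)."""
    return big.find(little) + 1


print(strstr("hello", "l"))   # 3
print(strstr("hello", "z"))   # 0
```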
- sub
Substitutes a given replacement string for the first occurrence of a pattern in a given string.
- Parameters
str – The string to perform the substitution in.
re – The pattern being replaced with repl.
repl – The string that replaces re.
- Returns
A copy of str with the first occurrence of re replaced with repl.
See also: gsub, subst_string
- sub_bytes
Gets a substring from a string, given a starting position and length.
- Parameters
s – The string to obtain a substring from.
start – The starting position of the substring in s, where 1 is the first character. As a special case, 0 also represents the first character.
n – The number of characters to extract, beginning at start.
- Returns
A substring of s of length n from position start.
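The 1-based start position, with 0 treated like 1, is the main trap when porting code from 0-indexed languages. Below is a small Python model of the documented indexing. It does not cover what Zeek returns when start or n runs past the end of s; the sketch simply truncates.

```python
def sub_bytes(s: bytes, start: int, n: int) -> bytes:
    """Model of sub_bytes: start is 1-based, and 0 also means the first byte."""
    first = max(start, 1) - 1      # convert to a 0-based slice index
    return s[first:first + n]


print(sub_bytes(b"hello", 1, 3))   # b'hel'
print(sub_bytes(b"hello", 0, 3))   # b'hel'
print(sub_bytes(b"hello", 2, 3))   # b'ell'
```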
- subst_string
Substitutes each (non-overlapping) appearance of a string in another.
- Parameters
s – The string in which to perform the substitution.
from – The string to look for which is replaced with to.
to – The string that replaces all occurrences of from in s.
- Returns
A copy of s where each occurrence of from is replaced with to.
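Non-overlapping replacement means the scan resumes after each replaced occurrence. Python's str.replace uses the same left-to-right, non-overlapping rule, so it serves as a model here. Its treatment of an empty from string is not claimed to match Zeek's. The parameter is named frm because from is a Python keyword.

```python
def subst_string(s: str, frm: str, to: str) -> str:
    """Replace each non-overlapping occurrence of frm, scanning left to right."""
    return s.replace(frm, to)


print(subst_string("aaaa", "aa", "b"))   # bb
print(subst_string("aaa", "aa", "b"))    # ba  (the overlap at index 1 is skipped)
```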
- swap_case
Swaps the case of every alphabetic character in a string. For example, the string “aBc” is returned as “AbC”.
- Parameters
str – The string to swap cases in.
- Returns
A copy of str with the case of each alphabetic character swapped.
- to_lower
Replaces all uppercase letters in a string with their lowercase counterparts.
- Parameters
str – The string to convert to lowercase letters.
- Returns
A copy of the given string with the uppercase letters (as indicated by isascii and isupper) folded to lowercase (via tolower).
- to_string_literal
Replaces non-printable characters in a string with escaped sequences. The mappings are: values not in [32, 126] to \xXX; \ to \\; and ' and " to \' and \", respectively.
- Parameters
str – The string to escape.
- Returns
The escaped string.
See also: clean, escape_string
- to_title
Converts a string to Title Case. This changes the first character of each sequence of non-space characters in the string to be capitalized. See https://docs.python.org/3/library/stdtypes.html#str.title for more info.
- Parameters
str – The string to convert.
- Returns
A title-cased version of the string.
- to_upper
Replaces all lowercase letters in a string with their uppercase counterparts.
- Parameters
str – The string to convert to uppercase letters.
- Returns
A copy of the given string with the lowercase letters (as indicated by isascii and islower) folded to uppercase (via toupper).
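to_lower and to_upper fold only ASCII letters, so bytes outside the ASCII range pass through unchanged. Python's bytes.lower and bytes.upper behave the same way and can stand in for these functions in a quick experiment. This is a model of the documented behavior, not Zeek's code.

```python
def to_lower(s: bytes) -> bytes:
    return s.lower()   # only A-Z are changed


def to_upper(s: bytes) -> bytes:
    return s.upper()   # only a-z are changed


word = "Ärger".encode("utf-8")          # Ä is two non-ASCII bytes in UTF-8
print(to_lower(word).decode("utf-8"))   # Ärger  (Ä untouched)
print(to_upper(word).decode("utf-8"))   # ÄRGER
```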