base/utils/urls.zeek
Functions for URL handling.
Summary
Redefinable Options
- url_regex: A regular expression for matching and extracting URLs.
Types
- URI: A URI, as parsed by decompose_uri.
Functions
- find_all_urls: Extracts URLs discovered in arbitrary text.
- find_all_urls_without_scheme: Extracts URLs discovered in arbitrary text without the URL scheme included.
Detailed Interface
Redefinable Options
- url_regex
  - Type: pattern
  - Attributes: &redef
  - Default: /^?(^([a-zA-Z\-]{3,5}):\/\/(-\.)?([^[:blank:]\/?\.#-]+\.?)+(\/[^[:blank:]]*)?)$?/
A regular expression for matching and extracting URLs. This is the @imme_emosol regex from https://mathiasbynens.be/demo/url-regex, adapted for Zeek. It does not pass every test case listed on that page, but it is one of the shorter expressions that covers most of them.
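Because url_regex is a redefinable option, a site can replace it from a local script. The following is a minimal sketch, assuming only http and https URLs are of interest. The replacement pattern is a hypothetical illustration and is not shipped with Zeek.

```zeek
@load base/utils/urls

# Hypothetical replacement pattern (an assumption for illustration):
# accept only http:// and https:// URLs. It keeps the default's leading
# anchor and its rule that neither host nor path may contain blanks.
redef url_regex = /^https?:\/\/[^[:blank:]\/?#]+(\/[^[:blank:]]*)?/;
```

A narrower pattern like this trades coverage for fewer false positives, so it is worth checking against representative traffic before relying on it.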
Types
- URI
  - Type: record
  - Fields:
    - scheme: string &optional
      The URL’s scheme.
    - netlocation: string
      The location, which could be a domain name or an IP address. Left empty if not specified.
    - portnum: count &optional
      Port number, if included in the URI.
    - path: string
      Full path, including the file name. Will be ‘/’ if no path is given.
    - file_name: string &optional
      Full file name, including extension, if there is a file name.
    - file_base: string &optional
      The base file name, without extension, if there is a file name.
    - file_ext: string &optional
      The file name’s extension, if there is a file name.
    - params: table[string] of string &optional
      A table of all query parameters, mapping their keys to values, if there is a query.
A URI, as parsed by decompose_uri.
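Since several URI fields are marked &optional, a script should test for their presence with the ?$ operator before reading them. The sketch below illustrates this. It assumes decompose_uri takes a single string argument and returns a URI record; that signature is not documented on this page, and the sample URL is made up for illustration.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Assumption: decompose_uri(uri: string) returns a URI record.
	local u = decompose_uri("https://example.com:8443/docs/index.html?lang=en");

	# netlocation and path are not &optional, so they can be read directly.
	print fmt("netlocation: %s", u$netlocation);
	print fmt("path: %s", u$path);

	# Every other field is &optional: check with ?$ before access.
	if ( u?$scheme )
		print fmt("scheme: %s", u$scheme);

	if ( u?$portnum )
		print fmt("portnum: %d", u$portnum);

	if ( u?$file_name )
		print fmt("file_name: %s", u$file_name);

	if ( u?$params )
		{
		for ( key in u$params )
			print fmt("param %s = %s", key, u$params[key]);
		}
	}
```

Reading an unset &optional field is a runtime error in Zeek, which is why each access above is guarded.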
Functions
- find_all_urls
  - Type: function (s: string) : string_set
Extracts URLs discovered in arbitrary text.
- find_all_urls_without_scheme
  - Type: function (s: string) : string_set
Extracts URLs discovered in arbitrary text without the URL scheme included.
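Both functions take a string and return a string_set, so they can be used side by side on the same input. The following is a minimal sketch; the sample text is invented for illustration, and no particular output is claimed because the result depends on the current value of url_regex.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Made-up sample input containing two URLs.
	local sample = "mirrors: http://example.com/a.tar.gz https://example.org/b";

	# Matches as found in the text.
	local with_scheme = find_all_urls(sample);
	for ( url in with_scheme )
		print fmt("url: %s", url);

	# The same extraction, with the URL scheme left out of each result.
	local without_scheme = find_all_urls_without_scheme(sample);
	for ( url in without_scheme )
		print fmt("url without scheme: %s", url);
	}
```

Because the return type is a set, duplicate URLs in the input collapse into one element and the iteration order is not guaranteed.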