base/utils/urls.zeek
Functions for URL handling.
Summary
Redefinable Options
- url_regex: A regular expression for matching and extracting URLs.
Types
- URI: A URI, as parsed by decompose_uri.
Functions
- find_all_urls: Extracts URLs discovered in arbitrary text.
- find_all_urls_without_scheme: Extracts URLs discovered in arbitrary text without the URL scheme included.
Detailed Interface
Redefinable Options
url_regex
- Type: pattern
- Attributes: &redef
- Default: /^?(^([a-zA-Z\-]{3,5}):\/\/(-\.)?([^[:blank:]\/?\.#-]+\.?)+(\/[^[:blank:]]*)?)$?/
A regular expression for matching and extracting URLs. This is the @imme_emosol regex from https://mathiasbynens.be/demo/url-regex, adapted for Zeek. It does not pass every test case listed there, but it is one of the shorter expressions that covers most of them.
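Because url_regex is a redefinable option, a site can swap in its own pattern. The following is a minimal sketch, assuming the goal is to match only http and https URLs; the replacement pattern is hypothetical, written for illustration, and not tested against the cases on the page linked above.

```zeek
@load base/utils/urls

# Hypothetical replacement pattern (an assumption for illustration):
# accept only http:// or https:// followed by non-blank characters.
redef url_regex = /https?:\/\/[^[:blank:]]+/;
```

A redef like this would normally live in a site-local script such as local.zeek, so it takes effect for every script that relies on url_regex.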
Types
URI
- Type: record
  - scheme: string &optional
    The URL’s scheme.
  - netlocation: string
    The location, which could be a domain name or an IP address. Left empty if not specified.
  - portnum: count &optional
    Port number, if included in the URI.
  - path: string
    Full path, including the file name. Will be ‘/’ if no path is given.
  - file_name: string &optional
    Full file name, including extension, if there is a file name.
  - file_base: string &optional
    The base file name, without extension, if there is a file name.
  - file_ext: string &optional
    The file name’s extension, if there is a file name.
  - params: table[string] of string &optional
    A table of all query parameters, mapping their keys to values, if there is a query.
A URI, as parsed by decompose_uri.
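Several URI fields are &optional, so a script should check that they are set before reading them. The sketch below assumes decompose_uri takes a single string and returns a URI record; that signature is not shown on this page. The input URI is a made-up example.

```zeek
@load base/utils/urls

event zeek_init()
    {
    # Hypothetical input URI chosen for illustration.
    local u = decompose_uri("https://example.com:8443/docs/index.html?lang=en");

    # netlocation and path are not &optional, so they can be read directly.
    print u$netlocation, u$path;

    # Optional fields are tested with ?$ before access.
    if ( u?$scheme )
        print u$scheme;

    if ( u?$portnum )
        print u$portnum;

    if ( u?$file_name )
        print u$file_name;

    if ( u?$params )
        {
        # Iterate over the query parameter keys and look up each value.
        for ( k in u$params )
            print k, u$params[k];
        }
    }
```

Reading an unset &optional field without the ?$ check raises a runtime error in Zeek, which is why each optional access is guarded.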
Functions
find_all_urls
- Type: function (s: string) : string_set
Extracts URLs discovered in arbitrary text.
find_all_urls_without_scheme
- Type: function (s: string) : string_set
Extracts URLs discovered in arbitrary text without the URL scheme included.
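The two extraction functions can be compared side by side on the same input. This is a minimal sketch; the sample text and variable names are assumptions for illustration, and the exact strings returned depend on the current value of url_regex.

```zeek
@load base/utils/urls

event zeek_init()
    {
    # Hypothetical sample text containing two URLs.
    local text = "See http://example.com/a.html and https://zeek.org/docs for details.";

    # Each function returns a string_set, which can be iterated directly.
    for ( full in find_all_urls(text) )
        print "with scheme", full;

    for ( stripped in find_all_urls_without_scheme(text) )
        print "without scheme", stripped;
    }
```

Both results are sets, so duplicate URLs in the input collapse to one entry and iteration order is not guaranteed.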