base/utils/urls.zeek
Functions for URL handling.
Summary
Redefinable Options
url_regex | A regular expression for matching and extracting URLs.
Types
URI | A URI, as parsed by decompose_uri.
Functions
find_all_urls | Extracts URLs discovered in arbitrary text.
decompose_uri |
find_all_urls_without_scheme | Extracts URLs discovered in arbitrary text without the URL scheme included.
Detailed Interface
Redefinable Options
- url_regex
- Type:
pattern
- Attributes:
&redef
- Default:
/^?(^([a-zA-Z\-]{3,5}):\/\/(-\.)?([^[:blank:]\/?\.#-]+\.?)+(\/[^[:blank:]]*)?)$?/
A regular expression for matching and extracting URLs. This is the @imme_emosol regex from https://mathiasbynens.be/demo/url-regex, adapted for Zeek. It does not pass every test case listed there, but it is one of the shorter expressions that covers most of them.
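Because url_regex is a redefinable option, a site can swap in its own pattern. The following is a minimal sketch, assuming the script is loaded in the global namespace; the replacement pattern is a hypothetical example that only accepts http and https URLs, not a recommended default.

```zeek
@load base/utils/urls

# Hypothetical override: match only http:// and https:// URLs,
# up to the next blank character.
redef url_regex = /https?:\/\/[^[:blank:]]+/;

event zeek_init()
	{
	# find_all_urls() picks up the redefined pattern.
	print find_all_urls("docs at https://example.com/guide and ftp://example.com/file");
	}
```

A narrower pattern reduces false positives but also changes what find_all_urls and find_all_urls_without_scheme return, so it is worth testing against representative input first.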
Types
- URI
- Type:
record
- scheme:
string
&optional
The URL’s scheme.
- netlocation:
string
The location, which could be a domain name or an IP address. Left empty if not specified.
- portnum:
count
&optional
Port number, if included in URI.
- path:
string
Full path, including the file name. Will be ‘/’ if no path is given.
- file_name:
string
&optional
Full file name, including extension, if there is a file name.
- file_base:
string
&optional
The base filename, without extension, if there is a file name.
- file_ext:
string
&optional
The filename’s extension, if there is a file name.
- params:
table[string] of string
&optional
A table of all query parameters, mapping their keys to values, if there’s a query.
A URI, as parsed by decompose_uri.
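Since most URI fields are marked &optional, callers need to test for them with ?$ before reading them. The sketch below illustrates that pattern. It assumes decompose_uri takes a single string and returns a URI record (its signature is not shown on this page), and the input URL is purely illustrative.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Assumption: decompose_uri(string) returns a URI record.
	local u = decompose_uri("https://example.com:8080/docs/index.html?lang=en");

	# netlocation and path are not optional, so they can be read directly.
	print fmt("netlocation: %s", u$netlocation);
	print fmt("path: %s", u$path);

	# Optional fields must be checked with ?$ before access.
	if ( u?$scheme )
		print fmt("scheme: %s", u$scheme);

	if ( u?$portnum )
		print fmt("portnum: %d", u$portnum);

	if ( u?$file_name )
		print fmt("file_name: %s", u$file_name);

	if ( u?$params )
		{
		for ( key in u$params )
			print fmt("param %s = %s", key, u$params[key]);
		}
	}
```

Reading an unset optional field is a runtime error in Zeek, which is why each access above is guarded.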
Functions
- find_all_urls
- Type:
function (s: string) : string_set
Extracts URLs discovered in arbitrary text.
- find_all_urls_without_scheme
- Type:
function (s: string) : string_set
Extracts URLs discovered in arbitrary text without the URL scheme included.
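As a rough usage sketch, both extraction functions can be called on the same text and their results iterated as ordinary sets. The sample string and variable names are illustrative assumptions, and which substrings actually match depends on the current value of url_regex.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Illustrative input text.
	local text = "http://example.com/index.html";

	# Each call returns a string_set (set[string]).
	local with_scheme = find_all_urls(text);
	local without_scheme = find_all_urls_without_scheme(text);

	for ( full_url in with_scheme )
		print fmt("with scheme: %s", full_url);

	for ( bare_url in without_scheme )
		print fmt("without scheme: %s", bare_url);
	}
```

Because the results are sets, duplicate URLs in the input collapse to a single entry and iteration order is not guaranteed.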