base/utils/urls.zeek

Functions for URL handling.

Summary

Redefinable Options

url_regex: pattern &redef

A regular expression for matching and extracting URLs.

Types

URI: record

A URI, as parsed by decompose_uri.

Functions

decompose_uri: function

Decomposes a URI string into a URI record.

find_all_urls: function

Extracts URLs discovered in arbitrary text.

find_all_urls_without_scheme: function

Extracts URLs discovered in arbitrary text without the URL scheme included.

Detailed Interface

Redefinable Options

url_regex
Type:

pattern

Attributes:

&redef

Default:
/^?(^([a-zA-Z\-]{3,5}):\/\/(-\.)?([^[:blank:]\/?\.#-]+\.?)+(\/[^[:blank:]]*)?)$?/

A regular expression for matching and extracting URLs. This is the @imme_emosol regex from https://mathiasbynens.be/demo/url-regex, adapted for Zeek. It does not pass every test case listed there, but it is one of the shorter expressions that covers most of them.
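
Because url_regex carries the &redef attribute, a site can replace it. The following is a minimal sketch, assuming the extraction functions in this script consult url_regex when they run. The replacement pattern is a hypothetical, deliberately narrow example (http and https only), not a recommended default.

```zeek
@load base/utils/urls

# Hypothetical replacement pattern: match only http:// and https:// URLs,
# up to the next blank character.
redef url_regex = /https?:\/\/[^[:blank:]]+/;

event zeek_init()
	{
	local text = "mirror at ftp://example.com/pub and docs at https://example.com/docs";
	local urls = find_all_urls(text);

	# Print whatever the redefined pattern matched.
	for ( url in urls )
		print url;
	}
```

A narrower pattern reduces false positives in free-form text, at the cost of missing URLs that use other schemes.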

Types

URI
Type:

record

scheme: string &optional

The URL’s scheme.

netlocation: string

The location, which could be a domain name or an IP address. Left empty if not specified.

portnum: count &optional

Port number, if included in the URI.

path: string

Full path, including the file name. Will be ‘/’ if no path is given.

file_name: string &optional

Full file name, including extension, if there is a file name.

file_base: string &optional

The base filename, without extension, if there is a file name.

file_ext: string &optional

The filename’s extension, if there is a file name.

params: table [string] of string &optional

A table of all query parameters, mapping their keys to values, if there’s a query.

A URI, as parsed by decompose_uri.

Functions

decompose_uri
Type:

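function (uri: string) : URI

Decomposes a URI string into its components (scheme, location, port, path, file name, and query parameters) and returns them as a URI record.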
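
As a rough illustration of how the URI record might be consumed, the sketch below parses a made-up URL and prints each field. The input string is an assumption chosen for the example. Optional fields are checked with ?$ before access, since reading an unset &optional field is a runtime error.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Example input; any URI string could be passed here.
	local u = decompose_uri("https://example.com:8080/docs/index.html?lang=en&page=2");

	# netlocation and path are always present in the record.
	print fmt("netlocation: %s", u$netlocation);
	print fmt("path: %s", u$path);

	# The remaining fields are &optional, so test them before use.
	if ( u?$scheme )
		print fmt("scheme: %s", u$scheme);
	if ( u?$portnum )
		print fmt("portnum: %d", u$portnum);
	if ( u?$file_name )
		print fmt("file_name: %s", u$file_name);
	if ( u?$file_base )
		print fmt("file_base: %s", u$file_base);
	if ( u?$file_ext )
		print fmt("file_ext: %s", u$file_ext);
	if ( u?$params )
		{
		for ( key in u$params )
			print fmt("param: %s = %s", key, u$params[key]);
		}
	}
```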

find_all_urls
Type:

function (s: string) : string_set

Extracts URLs discovered in arbitrary text.
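
A minimal usage sketch, with an invented input string: the function returns a string_set, so the result is iterated like any other set, and the iteration order should not be relied upon.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Example text containing two URLs surrounded by ordinary words.
	local text = "See http://example.com/a and https://example.org/b?x=1 for details.";
	local urls = find_all_urls(text);

	print fmt("found %d URL(s)", |urls|);
	for ( url in urls )
		print url;
	}
```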

find_all_urls_without_scheme
Type:

function (s: string) : string_set

Extracts URLs discovered in arbitrary text without the URL scheme included.
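
The sketch below, again using an invented input string, calls both extraction functions on the same text so the two result sets can be compared side by side.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Example text; the same string is passed to both functions.
	local text = "Download from https://example.com/files/report.pdf today.";

	# URLs as they appear in the text, scheme included.
	for ( url in find_all_urls(text) )
		print fmt("with scheme: %s", url);

	# The same URLs with the leading scheme removed.
	for ( url in find_all_urls_without_scheme(text) )
		print fmt("without scheme: %s", url);
	}
```

The scheme-less form can be convenient when comparing against values that omit the scheme, such as a host name concatenated with a request path.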