base/utils/urls.zeek

Functions for URL handling.

Summary

Redefinable Options

url_regex: pattern &redef

A regular expression for matching and extracting URLs.

Types

URI: record

A URI, as parsed by decompose_uri.

Functions

decompose_uri: function

find_all_urls: function

Extracts URLs discovered in arbitrary text.

find_all_urls_without_scheme: function

Extracts URLs discovered in arbitrary text without the URL scheme included.

Detailed Interface

Redefinable Options

url_regex
Type

pattern

Attributes

&redef

Default
/^?(^([a-zA-Z\-]{3,5}):\/\/(-\.)?([^[:blank:]\/?\.#-]+\.?)+(\/[^[:blank:]]*)?)$?/

A regular expression for matching and extracting URLs. This is the @imme_emosol regex from https://mathiasbynens.be/demo/url-regex, adapted for Zeek. It does not pass every test case listed there, but it is one of the shorter expressions that covers most of them.
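
Because url_regex carries the &redef attribute, a site can replace the default pattern. The following is a minimal sketch, assuming that only http and https URLs are of interest. The replacement pattern and the sample string are hypothetical and are not part of this script.

```zeek
@load base/utils/urls

# Hypothetical override: restrict matching to http:// and https:// URLs.
redef url_regex = /https?:\/\/[^[:blank:]]+/;

event zeek_init()
	{
	# find_all_urls uses url_regex, so it should pick up the redefined pattern.
	print find_all_urls("see http://example.com/a and ftp://example.org/b");
	}
```

A narrower pattern like this trades coverage for fewer false matches. Whether that trade is appropriate depends on the traffic being analyzed.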

Types

URI
Type

record

scheme: string &optional

The URL’s scheme.

netlocation: string

The location, which could be a domain name or an IP address. Left empty if not specified.

portnum: count &optional

Port number, if included in URI.

path: string

Full path, including the file name. Will be ‘/’ if there’s no path given.

file_name: string &optional

Full file name, including extension, if there is a file name.

file_base: string &optional

The base filename, without extension, if there is a file name.

file_ext: string &optional

The filename’s extension, if there is a file name.

params: table [string] of string &optional

A table of all query parameters, mapping their keys to values, if there’s a query.

A URI, as parsed by decompose_uri.

Functions

decompose_uri
Type

function (uri: string) : URI

Parses a URI string into its components and returns them as a URI record.
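
As an illustration of how the returned URI record might be consumed, the sketch below decomposes a hypothetical URL and prints each field. It assumes the &optional fields may be unset, so each is checked with ?$ before access. The input string is made up for this example.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Hypothetical input chosen for illustration.
	local u = decompose_uri("https://www.example.com:8080/dir/index.html?a=1&b=2");

	# netlocation and path are not &optional, so they can be read directly.
	print fmt("netlocation: %s", u$netlocation);
	print fmt("path: %s", u$path);

	# The remaining fields are &optional; test with ?$ before reading them.
	if ( u?$scheme )
		print fmt("scheme: %s", u$scheme);
	if ( u?$portnum )
		print fmt("port: %d", u$portnum);
	if ( u?$file_name )
		print fmt("file name: %s", u$file_name);
	if ( u?$file_base )
		print fmt("file base: %s", u$file_base);
	if ( u?$file_ext )
		print fmt("file extension: %s", u$file_ext);
	if ( u?$params )
		for ( k in u$params )
			print fmt("param %s = %s", k, u$params[k]);
	}
```

Reading an unset &optional field is a runtime error in Zeek, which is why the ?$ guards are worth keeping even when the input looks complete.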

find_all_urls
Type

function (s: string) : string_set

Extracts URLs discovered in arbitrary text.
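
A minimal usage sketch, assuming the text comes from something like a message body. The sample string is hypothetical, and the exact matches depend on the current value of url_regex.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Hypothetical free-form text for illustration.
	local text = "Visit https://www.example.com/docs or http://example.org/ for details.";

	# The result is a string_set, so duplicates collapse and order is not guaranteed.
	local urls = find_all_urls(text);
	print fmt("found %d candidate URL(s)", |urls|);

	for ( url in urls )
		print url;
	}
```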

find_all_urls_without_scheme
Type

function (s: string) : string_set

Extracts URLs discovered in arbitrary text without the URL scheme included.
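
The scheme-less variant can be handy when URLs that differ only in scheme should be treated alike, for example when comparing against a list of host and path strings. The sketch below uses a hypothetical sample string.

```zeek
@load base/utils/urls

event zeek_init()
	{
	# Hypothetical text containing URLs with different schemes.
	local text = "Mirror list: https://www.example.com/pub and ftp://ftp.example.org/pub";

	# Per the description above, results come back without the URL scheme.
	for ( url in find_all_urls_without_scheme(text) )
		print url;
	}
```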