files.log

One of Zeek’s powerful features is the ability to extract content from network traffic and write it to disk as a file, via its File Analysis framework. This is easiest to understand with a protocol like File Transfer Protocol (FTP), a classic means to exchange files over a channel separate from that used to exchange commands. Protocols like HTTP are slightly more complicated, as they include headers which must be interpreted and must not be included in any file content transferred by the protocol.

Zeek’s files.log is a record of files that Zeek observed while inspecting network traffic. The existence of an entry in files.log does not mean that Zeek necessarily extracted file content and wrote it to disk. Analysts must configure Zeek to extract files by type in order to have them written to disk.

In the following example, an analyst has configured Zeek to extract files of MIME type application/x-dosexec and write them to disk. To understand the chain of events that results in having a file on disk, we will start with the conn.log, progress to the http.log, and conclude with the files.log.

The Zeek scripting manual, derived from the Zeek source code, completely explains the meaning of each field in the files.log (and other logs). It would be duplicative to manually recreate that information in another format here. Therefore, this entry seeks to show how an analyst would make use of the information in the files.log. Those interested in getting details on every element of the files.log should refer to Files::Info.

Throughout the sections that follow, we will inspect Zeek logs in JSON format. As we have shown how to access logs like this previously using the command line, we will only show the log entries themselves.

Inspecting the conn.log

The log with which we begin our analysis for this case is the conn.log. It contains the following entry of interest.

{
  "ts": 1596820191.94147,
  "uid": "CzoFRWTQ6YIzfFXHk",
  "id.orig_h": "192.168.4.37",
  "id.orig_p": 58264,
  "id.resp_h": "23.195.64.241",
  "id.resp_p": 80,
  "proto": "tcp",
  "service": "http",
  "duration": 0.050640106201171875,
  "orig_bytes": 211,
  "resp_bytes": 179604,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "ShADadtFf",
  "orig_pkts": 93,
  "orig_ip_bytes": 5091,
  "resp_pkts": 129,
  "resp_ip_bytes": 186320
}

We see that 192.168.4.37 contacted 23.195.64.241 via HTTP and connected to port 80 TCP. The responder sent 179604 bytes of data during the conversation.

Because this conversation appears to have taken place using HTTP, a clear text protocol, there is a good chance that we can directly inspect the HTTP headers and the payloads that were exchanged.

We will use the UID, CzoFRWTQ6YIzfFXHk, to find corresponding entries in other log sources to better understand what happened during this conversation.
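The pivot on a UID can be sketched in a few lines of Python. This is only an illustration, assuming the logs are written as one JSON object per line; the function name entries_for_uid, the second sample UID, and the example.invalid host are hypothetical and not part of the original data.

```python
import json

def entries_for_uid(lines, uid):
    """Yield parsed log entries whose "uid" field equals the given UID."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("uid") == uid:
            yield entry

# Two sample log lines; only the first belongs to the connection of interest.
# The second UID and host are made up for illustration.
sample_lines = [
    '{"uid": "CzoFRWTQ6YIzfFXHk", "host": "download.microsoft.com"}',
    '{"uid": "CaaaaaaaaaaaaaaaaX", "host": "example.invalid"}',
]

matches = list(entries_for_uid(sample_lines, "CzoFRWTQ6YIzfFXHk"))
print(matches)
```

With real data, an open file handle for a log such as http.log could be passed in place of sample_lines, since the function only iterates over lines.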

Inspecting the http.log

We search our http.log files for entries containing the UID of interest and find the following entry:

{
  "ts": 1596820191.94812,
  "uid": "CzoFRWTQ6YIzfFXHk",
  "id.orig_h": "192.168.4.37",
  "id.orig_p": 58264,
  "id.resp_h": "23.195.64.241",
  "id.resp_p": 80,
  "trans_depth": 1,
  "method": "GET",
  "host": "download.microsoft.com",
  "uri": "/download/d/e/5/de5351d6-4463-4cc3-a27c-3e2274263c43/wfetch.exe",
  "version": "1.1",
  "user_agent": "Wget/1.19.4 (linux-gnu)",
  "request_body_len": 0,
  "response_body_len": 179272,
  "status_code": 200,
  "status_msg": "OK",
  "tags": [],
  "resp_fuids": [
    "FBbQxG1GXLXgmWhbk9"
  ],
  "resp_mime_types": [
    "application/x-dosexec"
  ]
}

The most interesting elements of this log entry include the following:

"method": "GET",
"host": "download.microsoft.com",
"uri": "/download/d/e/5/de5351d6-4463-4cc3-a27c-3e2274263c43/wfetch.exe",

This shows us what file the client was trying to retrieve, wfetch.exe, from what site, download.microsoft.com.

The following element shows us the client that made the request:

"user_agent": "Wget/1.19.4 (linux-gnu)",

According to this log entry, the user agent was not a Microsoft product, but was a Linux version of the wget utility. User agent fields can be manipulated, so we cannot trust that this was exactly what happened. It is probable, however, that wget was used in this case.

The following entry shows us that the Web server responded positively to the request:

"status_code": 200,
"status_msg": "OK",

Based on this entry and the number of bytes transferred, it is likely that the client received the file it requested.

The final two entries of interest tell us something more about the content that was transferred and how to locate it:

"resp_fuids": [
  "FBbQxG1GXLXgmWhbk9"
],
"resp_mime_types": [
  "application/x-dosexec"
]

The first entry provides a file identifier. This is similar to the connection identifier in the conn.log, except that we use the file identifier to locate specific file contents when written to disk.

The second entry shows that Zeek recognized the file content as application/x-dosexec, which likely means that the client retrieved a Windows executable file.
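The file identifier can serve as a lookup key into files.log. The following is a minimal sketch of that lookup, assuming the entries have already been parsed into Python dictionaries. The function name files_for_http_entry and the second file entry (FhypotheticalOther1) are illustrative assumptions.

```python
def files_for_http_entry(http_entry, files_entries):
    """Return files.log entries whose fuid is listed in the http entry's resp_fuids."""
    wanted = set(http_entry.get("resp_fuids", []))
    return [f for f in files_entries if f.get("fuid") in wanted]

# Trimmed http.log entry using the values shown above.
http_entry = {
    "uid": "CzoFRWTQ6YIzfFXHk",
    "resp_fuids": ["FBbQxG1GXLXgmWhbk9"],
    "resp_mime_types": ["application/x-dosexec"],
}

# Trimmed files.log entries; the second one is made up for illustration.
files_entries = [
    {"fuid": "FBbQxG1GXLXgmWhbk9", "mime_type": "application/x-dosexec"},
    {"fuid": "FhypotheticalOther1", "mime_type": "text/html"},
]

found = files_for_http_entry(http_entry, files_entries)
print(found)
```

Using a set for the wanted identifiers keeps the lookup simple when a single HTTP response carries more than one file.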

Inspecting the files.log

Armed with the file identifier value, we can search any of our files.log repositories for matching values. By searching for the FUID of FBbQxG1GXLXgmWhbk9 we find the following entry.

{
  "ts": 1596820191.969902,
  "fuid": "FBbQxG1GXLXgmWhbk9",
  "uid": "CzoFRWTQ6YIzfFXHk",
  "id.orig_h": "192.168.4.37",
  "id.orig_p": 58264,
  "id.resp_h": "23.195.64.241",
  "id.resp_p": 80,
  "source": "HTTP",
  "depth": 0,
  "analyzers": [
    "EXTRACT",
    "PE"
  ],
  "mime_type": "application/x-dosexec",
  "duration": 0.015498876571655273,
  "is_orig": false,
  "seen_bytes": 179272,
  "total_bytes": 179272,
  "missing_bytes": 0,
  "overflow_bytes": 0,
  "timedout": false,
  "extracted": "HTTP-FBbQxG1GXLXgmWhbk9.exe",
  "extracted_cutoff": false
}

Note that this files.log entry also contains the UID we found in the conn.log, namely CzoFRWTQ6YIzfFXHk. Theoretically we could have just searched for that UID value and not bothered to locate the FUID in the http.log. However, I find that it makes sense to follow this sort of progression, as we cannot rely on this same analytical workflow for all cases.

In this files.log data, we see that the EXTRACT and PE analyzers were activated. Zeek saw 179272 bytes transferred and does not appear to have missed any bytes. Zeek extracted the file it saw as HTTP-FBbQxG1GXLXgmWhbk9.exe, which means we should be able to locate that file on disk.
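The checks described here can be expressed as a small predicate. This is a sketch under the assumption that a files.log entry has been parsed into a dictionary with the field names shown in the entry above; the function name extraction_looks_complete is hypothetical, and the criteria are one reasonable reading of those fields, not an official Zeek definition.

```python
def extraction_looks_complete(entry):
    """Return True when a files.log entry suggests the whole file was seen and extracted."""
    total = entry.get("total_bytes")
    # total_bytes may be absent; only compare when it is present.
    size_ok = total is None or entry.get("seen_bytes") == total
    return (
        size_ok
        and entry.get("missing_bytes", 0) == 0
        and entry.get("overflow_bytes", 0) == 0
        and not entry.get("timedout", False)
        and not entry.get("extracted_cutoff", False)
        and bool(entry.get("extracted"))
    )

# Relevant fields copied from the files.log entry in this section.
entry = {
    "fuid": "FBbQxG1GXLXgmWhbk9",
    "seen_bytes": 179272,
    "total_bytes": 179272,
    "missing_bytes": 0,
    "overflow_bytes": 0,
    "timedout": False,
    "extracted": "HTTP-FBbQxG1GXLXgmWhbk9.exe",
    "extracted_cutoff": False,
}

complete = extraction_looks_complete(entry)
print(complete)
```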

The is_orig field in a files.log entry can be used to determine which endpoint sent the file. When is_orig is false, the responder of the connection is sending the file. In the example above we can tell that the HTTP server at 23.195.64.241 is sending the file and 192.168.4.37 is receiving it.
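The rule for is_orig can be written as a short helper. This sketch assumes the entry is a dictionary with the id.orig_h, id.resp_h, and is_orig fields shown in the example above; the function name file_sender_and_receiver is hypothetical.

```python
def file_sender_and_receiver(entry):
    """Return (sender, receiver) addresses for a files.log entry based on is_orig."""
    originator = entry["id.orig_h"]
    responder = entry["id.resp_h"]
    if entry["is_orig"]:
        # The connection originator sent the file (for example, an upload).
        return originator, responder
    # The connection responder sent the file (for example, a download).
    return responder, originator

# Relevant fields copied from the files.log entry in this section.
entry = {
    "id.orig_h": "192.168.4.37",
    "id.resp_h": "23.195.64.241",
    "is_orig": False,
}

sender, receiver = file_sender_and_receiver(entry)
print(sender, receiver)
```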

Inspecting the Extracted File

The location for extracted files will vary depending on your Zeek configuration. In my example, Zeek wrote extracted files to a directory called extract_files/. Here is the file in question:

$ ls -al HTTP-FBbQxG1GXLXgmWhbk9.exe
-rw-rw-r-- 1 zeek zeek 179272 Aug  7 17:23 HTTP-FBbQxG1GXLXgmWhbk9.exe

Note the byte count, 179272, matches the value in the files.log.

Here is what the Linux file command thinks of this file.

$ file HTTP-FBbQxG1GXLXgmWhbk9.exe
HTTP-FBbQxG1GXLXgmWhbk9.exe: PE32 executable (GUI) Intel 80386, for MS Windows, MS CAB-Installer self-extracting archive

This looks like a Windows executable. You can use the md5sum utility to generate an MD5 hash of the file.

$ md5sum HTTP-FBbQxG1GXLXgmWhbk9.exe
6711727adf76599bf50c9426057a35fe  HTTP-FBbQxG1GXLXgmWhbk9.exe
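The size comparison and the hash computation can also be scripted. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_files_log are hypothetical, and the demonstration uses a small temporary stand-in file rather than the extracted executable so that it runs anywhere.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_files_log(path, files_entry):
    """Check that the on-disk size equals the seen_bytes value from files.log."""
    return os.path.getsize(path) == files_entry["seen_bytes"]

# Stand-in for an extracted file: five bytes of known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_digest = md5_of_file(demo_path)
demo_size_ok = matches_files_log(demo_path, {"seen_bytes": 5})
os.remove(demo_path)
print(demo_digest, demo_size_ok)
```

Pointing md5_of_file at the extracted file instead of the stand-in should give the same digest that md5sum reports for it.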

We can search by the hash value on VirusTotal using the vt command line tool, provided we have registered and initialized vt with our free API key.

$ ./vt file 6711727adf76599bf50c9426057a35fe
- _id: "82f39086658ce80df4da6a49fef9d3062a00fd5795a4dd5042de32907bcb5b89"
  _type: "file"
  authentihash: "2a07d356273d32bf0c5aff83ea847351128fc3971b44052f92b6fb4f45c2272f"
  creation_date: 1030609542  # 2002-08-29 08:25:42 +0000 UTC
  first_submission_date: 1354191312  # 2012-11-29 12:15:12 +0000 UTC
  last_analysis_date: 1592215708  # 2020-06-15 10:08:28 +0000 UTC
  last_analysis_results:
    ALYac:
      category: "undetected"
      engine_name: "ALYac"
      engine_update: "20200615"
      engine_version: "1.1.1.5"
      method: "blacklist"
...edited...
  last_analysis_stats:
    confirmed-timeout: 0
    failure: 0
    harmless: 0
    malicious: 0
    suspicious: 0
    timeout: 0
    type-unsupported: 2
    undetected: 74
  last_modification_date: 1592220693  # 2020-06-15 11:31:33 +0000 UTC
  last_submission_date: 1539056691  # 2018-10-09 03:44:51 +0000 UTC
  magic: "PE32 executable for MS Windows (GUI) Intel 80386 32-bit"
  md5: "6711727adf76599bf50c9426057a35fe"
  meaningful_name: "WEXTRACT.EXE"
  names:
  - "Wextract"
  - "WEXTRACT.EXE"
  - "wfetch.exe"
  - "583526"
  packers:
    F-PROT: "CAB, ZIP"
    PEiD: "Microsoft Visual C++ v6.0 SPx"
  pe_info:
    entry_point: 23268
    imphash: "1494de9b53e05fc1f40cb92afbdd6ce4"
    import_list:
    - imported_functions:
      - "GetLastError"
      - "IsDBCSLeadByte"
      - "DosDateTimeToFileTime"
      - "ReadFile"
      - "GetStartupInfoA"
      - "GetSystemInfo"
      - "lstrlenA"
...edited...
  size: 179272
  ssdeep: "3072:BydJq5oyVzs+h0Jk5irDStDD5QOsP0CLRQq8ZZ3xlf/AQnFlFuKIUaKJH:UW2+AiDWOsPxQq8HHf/A07namH"
  tags:
  - "invalid-signature"
  - "peexe"
  - "signed"
  - "overlay"
  times_submitted: 33
  total_votes:
    harmless: 1
    malicious: 0
  trid:
  - file_type: "Microsoft Update - Self Extracting Cabinet"
    probability: 46.3
  - file_type: "Win32 MS Cabinet Self-Extractor (WExtract stub)"
    probability: 41.4
  - file_type: "Win32 Executable MS Visual C++ (generic)"
    probability: 4.2
  - file_type: "Win64 Executable (generic)"
    probability: 3.7
  - file_type: "Win16 NE executable (generic)"
    probability: 1.9
  type_description: "Win32 EXE"
  type_tag: "peexe"
  unique_sources: 24
  vhash: "0150366d1570e013z1004cmz1f03dz"

You can access the entire report via the Web here.

It appears this is a harmless Windows executable. However, because Zeek can extract files from network traffic, analysts have many options for investigation when a file is not considered benign.

Conclusion

Zeek’s file extraction capabilities offer many advantages to analysts. Administrators can configure Zeek to compute MD5 hashes of files that Zeek sees in network traffic. Rather than computing a hash on a file written to disk, Zeek could simply compute the hash as part of its inspection process. The purpose of this document was to show some of the data in the files.log, how it relates to other Zeek logs, and how analysts might make use of it.