Monitoring With Zeek

Detection and Response Workflow

As noted in the previous sections, Zeek is optimized, more or less “out of the box,” to provide two of the four types of network security monitoring data. Without any major configuration, Zeek offers transaction data and extracted content data, in the form of logs summarizing protocols and files seen traversing the wire. Zeek can also provide some degree of alert data in the form of notices, and analysts can modify Zeek to create custom alerts if desired. For alert data, however, a dedicated intrusion detection engine like Suricata or Snort might be more appropriate. Finally, Zeek does not collect full content data in pcap format, although other open source projects do provide that functionality.

Broadly speaking, incident detection and response begins with the collection of security data, followed by its analysis. In the analysis phase, in the absence of an explicit alert of malicious activity, investigators can work in two broad categories: “matching” and “hunting.” Matching means querying and reviewing security data for signs of known indicators of compromise. Hunting means working without indicators of compromise, relying instead on creating a hypothesis of how adversary activity might manifest in security data. Matching is the sort of activity that can be easily automated. Hunting is difficult to automate because it relies upon designing a cyber security “experiment” to yield results, and often upon a little bit of human intuition.

In the common vernacular, some security teams believe hunting involves querying data for indicators of compromise. That is really just a search function, i.e., looking for matches of “expected bad” in collected data. True hunting follows more of a scientific method: formulating a hypothesis, testing the hypothesis in sample and production data, and then refining the process until it yields results or the hypothesis is disproved. Investigative methods which yield results can then be incorporated into matching workflows.

Zeek data plays a role in both matching and hunting operations. Analysts may query a store of Zeek transaction logs for indicators of compromise, and begin a security investigation when they see a match on an IP address, a username, an HTTP user-agent string, or any one or combination of the hundreds of elements Zeek derives from network traffic. Analysts can also pose a hypothesis of how certain adversary behavior may appear in Zeek data, and then query that data for signs that prove or disprove their hypothesis.
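As a rough illustration of matching, the following Python sketch checks connection records against a set of known-bad IP addresses. It assumes Zeek has been configured to write conn.log as JSON, one record per line, with the usual `id.orig_h` and `id.resp_h` fields. The indicator set, the sample records, and the function name `match_iocs` are invented for this example.

```python
import json

# Hypothetical indicators of compromise (documentation-range placeholder IPs).
IOC_IPS = {"203.0.113.50", "198.51.100.23"}


def match_iocs(json_lines, ioc_ips):
    """Return conn.log records whose originator or responder is in ioc_ips."""
    hits = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("id.orig_h") in ioc_ips or record.get("id.resp_h") in ioc_ips:
            hits.append(record)
    return hits


# Made-up sample records shaped like JSON-formatted conn.log entries.
SAMPLE_CONN_LOG = [
    '{"ts": 1700000000.0, "uid": "CaAAA1", "id.orig_h": "192.168.1.10", '
    '"id.orig_p": 51234, "id.resp_h": "203.0.113.50", "id.resp_p": 443, "proto": "tcp"}',
    '{"ts": 1700000001.0, "uid": "CbBBB2", "id.orig_h": "192.168.1.11", '
    '"id.orig_p": 51235, "id.resp_h": "192.0.2.8", "id.resp_p": 80, "proto": "tcp"}',
]

if __name__ == "__main__":
    for hit in match_iocs(SAMPLE_CONN_LOG, IOC_IPS):
        print(hit["uid"], hit["id.orig_h"], "->", hit["id.resp_h"])
```

In practice the lines would come from a log file or a log store rather than an in-memory list, and the same pattern extends to other fields such as user-agent strings in http.log.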

Beyond the matching and hunting paradigms, analysts can use Zeek within an “incident detection alert” workflow. In this scenario, an IDS creates an alert that catches the attention of a security team member. Because IDS alerts are often light on details, analysts require corroborating data to decide if the alert represents normal, suspicious, or malicious activity. Analysts can “pivot” from the IDS alert to a variety of logs generated by Zeek. If the IDS alert provides the community identification (community ID) supported by Zeek, the analyst can easily tie the IDS alert to specific Zeek logs. Based on the data provided by Zeek, analysts may be able to resolve the incident. At the very least, the analyst can accelerate the alert validation and verification process by having access to data beyond the initial IDS notification.
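The pivot described above might look something like the following Python sketch. It assumes that Community ID logging is enabled, so that conn.log records carry a `community_id` field, and that the IDS alert exposes the same value. The alert, the log records, the placeholder Community ID string, and the function name `pivot_from_alert` are all invented for illustration. The sketch matches the alert to connection records by Community ID, then uses Zeek's connection `uid` to gather related entries from other logs.

```python
def pivot_from_alert(alert, conn_records, other_records):
    """Find conn.log entries sharing the alert's Community ID, then related logs by uid."""
    cid = alert.get("community_id")
    if cid is None:
        return [], []
    conns = [r for r in conn_records if r.get("community_id") == cid]
    uids = {r["uid"] for r in conns}
    related = [r for r in other_records if r.get("uid") in uids]
    return conns, related


# Placeholder value only; a real Community ID is a hash of the flow tuple.
EXAMPLE_CID = "1:EXAMPLEflowhash0000000000A="

# Hypothetical IDS alert and Zeek records.
ALERT = {"signature": "Example suspicious download", "community_id": EXAMPLE_CID}
CONN_RECORDS = [
    {"uid": "CaAAA1", "id.orig_h": "192.168.1.10", "id.resp_h": "203.0.113.50",
     "community_id": EXAMPLE_CID},
    {"uid": "CbBBB2", "id.orig_h": "192.168.1.11", "id.resp_h": "192.0.2.8",
     "community_id": "1:ANOTHERflowhash000000000B="},
]
HTTP_RECORDS = [
    {"uid": "CaAAA1", "host": "example.test", "uri": "/payload.bin"},
    {"uid": "CbBBB2", "host": "example.org", "uri": "/index.html"},
]

if __name__ == "__main__":
    conns, related = pivot_from_alert(ALERT, CONN_RECORDS, HTTP_RECORDS)
    for record in conns + related:
        print(record)
```

The two-step lookup reflects the assumption that only conn.log carries the Community ID, while other Zeek logs reference the same connection through `uid`.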

Finally, analysts can use Zeek data to improve the validation process when prompted by any other external stimulus. For example, an analyst might notice an odd process running on a system, as reported by their endpoint detection and response (EDR) or anti-virus agent. Alternatively, an analyst might receive a report from a user or a peer involving suspicious activity on an Internet-facing Web server. In either case, the analyst with access to Zeek data can seek to learn all they can about the systems in question, simply by querying the repository storing their Zeek logs. This security design pattern is valuable because it does not affect the state of the suspicious asset. Not touching a system that may be compromised has two benefits. First, an intruder who has compromised the asset remains unaware that the security team is investigating it. Second, the forensic integrity of the asset remains intact, as the analyst is working with logs stored off-device.

Instrumentation and Collection

Zeek is designed to watch live network traffic. Although Zeek can process packet captures saved in PCAP format, most users deploy Zeek to gain near-real-time insights into network usage patterns. Administrators run Zeek by telling it to “sniff” one or more network interfaces, generating transaction logs, insights, and extracted file contents, based on the network traffic seen on those network interfaces.
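The two modes of operation can be sketched with a small Python helper that builds a Zeek command line for either a live interface (`zeek -i`) or a saved capture (`zeek -r`). The helper name `zeek_command`, the interface name `eth0`, and the capture file name are assumptions for this example. Production sensors are usually managed by a supervisor rather than started by hand like this.

```python
import shlex


def zeek_command(interface=None, pcap=None, scripts=("local",)):
    """Build a zeek argv list for a live interface (-i) or a capture file (-r)."""
    if (interface is None) == (pcap is None):
        raise ValueError("specify exactly one of interface or pcap")
    source = ["-i", interface] if interface is not None else ["-r", pcap]
    # Trailing arguments name the scripts to load, e.g. the site's local policy.
    return ["zeek", *source, *scripts]


if __name__ == "__main__":
    # Hypothetical interface and file names; print the commands, do not run them.
    print(shlex.join(zeek_command(interface="eth0")))
    print(shlex.join(zeek_command(pcap="sample.pcap")))
```

The resulting list could be passed to `subprocess.run` on a host where Zeek is installed and the user has permission to sniff the interface.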

Some users may choose to run Zeek on a single computer used for general computing purposes, watching network traffic to and from that single computer. That system might be an office laptop used for business purposes, chosen for experimentation with Zeek. This is a simple way to become familiar with the logs that Zeek creates. This approach is similar to running Tcpdump or Wireshark on one’s computer for the same educational purposes.

Most users, however, run Zeek on a computer selected solely for the purpose of network security monitoring. Security personnel call that computer a “sensor” and they select, configure, and deploy it specifically to watch network traffic. They select a location in an environment that offers visibility to multiple computers, and deploy the sensor with Zeek to instrument that network segment.

When choosing a place to deploy a sensor, users will likely prioritize a requirement like the following:

Identify a single location in the network to instrument with a network tap or switch span port that provides the maximum visibility. This means seeing traffic from all devices on the network, with a strong preference for identifying devices by observing them with their original source IP address.

Users new to Zeek may choose to try Zeek in their home or in a small office environment. Figure 1 depicts the standard SOHO network architecture. Letters A-D are possible monitoring locations, to be discussed below.

_images/collection-figure1.png — Figure 1: Standard SOHO Architecture

Most home users and many small office environments are connected to the Internet via customer premises equipment (CPE) provided by their Internet service provider (ISP). This box may or may not be available or visible to the customer. In the context of a system like Verizon Fios, for example, the ISP CPE is the box attached to the outside of a residence, with a warning that only Verizon technicians should open it. For fiber connectivity, the ISP might call this device an Optical Network Terminal or ONT.

The ISP also provides a gateway device that provides routing and wireless access point (WAP) functionality. This is the piece of equipment familiar to most home and small office users. It typically has a gigabit copper Ethernet connection that connects to the ISP CPE, on its wide area network (WAN) side, and four gigabit copper Ethernet ports for devices on its local area network (LAN) side. Customer devices gain network access via WiFi to the ISP WAP or via copper Ethernet cables to the embedded switch on the same device.

On the WAN side of the router, the device usually has a public IP address provided by the ISP. This may not necessarily be the case, however. On the LAN side of the router, the device provides RFC 1918 private addresses, often in the 192.168.0.0/16 range. The router acts as a gateway, using network address translation (NAT), or for the more strictly minded, network address port translation (NAPT), so that client devices share a single IP address provided by the ISP. (Note that in some situations, multiple residences even share the same public IP address, and are distinguished from one another by port range. We’ll not consider this further for now, as it is extraneous to the discussion.)
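When reviewing logs, it can help to tell quickly whether an address falls in one of the RFC 1918 private ranges, since that indicates whether the sensor saw traffic before or after translation. The following Python sketch uses the standard `ipaddress` module. The function name `is_rfc1918` is an assumption for this example.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address lies in an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.168.1.10", "172.31.0.5", "203.0.113.7"):
        print(candidate, is_rfc1918(candidate))
```

A sensor that sees only public source addresses for outbound client traffic is most likely positioned on the WAN side of the translating device.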

Where does one monitor, given this architecture?

Location A is off limits to the customer. It is likely a cable exiting the ISP CPE and entering the ground.

Location B is a possibility, assuming the cable between the ISP CPE and router is a copper Ethernet cable. One could insert a reliable network tap (typically outside the home user’s budget) or a decent small managed switch with a span port (like a Netgear GS30Xe model).

However, and this is crucial: because of the NAT done by the router, all traffic will appear to originate from a single IP address. Whether the customer has 100 devices or 1 device, they will all share the single IP address. This reality makes it much more difficult for a security analyst to track down the originator of suspicious or malicious network traffic.

Location C is essentially not possible. Yes, there are various penetration testing tools and wireless network troubleshooting tools that can try to access WiFi traffic. However, they do not expose the traffic in a form usable to security analysts, assuming that the WiFi protocols in use are at all modern.

Location D is a possibility, assuming that the user installs a network tap or switch span port as in location B. However, monitoring only at location D would ignore WiFi traffic.

In other words, the standard SOHO network architecture is not well-suited for network security monitoring, because there isn’t a good place, by default, to see the originating IP addresses, which are generally needed to investigate suspicious and malicious activity.
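A toy model makes the loss of originating addresses concrete. The Python sketch below imitates port-based address translation for a few made-up flows. It is a simplification for illustration, not a description of how any particular router behaves, and the function name `napt_translate`, the addresses, and the starting port are all assumptions.

```python
def napt_translate(flows, public_ip, first_port=40000):
    """Rewrite (src_ip, src_port, dst_ip, dst_port) flows as a NAT gateway might."""
    table = {}  # (private ip, private port) -> translated public port
    translated = []
    next_port = first_port
    for src_ip, src_port, dst_ip, dst_port in flows:
        key = (src_ip, src_port)
        if key not in table:
            table[key] = next_port
            next_port += 1
        translated.append((public_ip, table[key], dst_ip, dst_port))
    return translated, table


# Hypothetical flows from three LAN devices.
LAN_FLOWS = [
    ("192.168.1.10", 51000, "192.0.2.8", 443),
    ("192.168.1.11", 51000, "192.0.2.8", 443),
    ("192.168.1.12", 52000, "198.51.100.23", 80),
]

if __name__ == "__main__":
    wan_flows, _ = napt_translate(LAN_FLOWS, "203.0.113.7")
    print("distinct sources on LAN side:", len({f[0] for f in LAN_FLOWS}))
    print("distinct sources on WAN side:", len({f[0] for f in wan_flows}))
```

On the LAN side the three devices are distinguishable by source address. On the WAN side every flow carries the same source address, and only the translation table held by the router could map a flow back to its originator.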

In contrast, the Visible Network Architecture shown in Figure 2 depicts the sort of setup one needs if visibility is designed into the architecture, rather than added as an afterthought.

_images/collection-figure2.png — Figure 2: Visible Network Architecture

The major changes include the following:

The ISP router is no longer also acting as a WAP. The WiFi capability is disabled. No other changes are required on the router. Strictly speaking, WiFi need not be disabled, so long as no one uses it.

The customer has purchased her own router. That device may or may not also provide NAT.

The customer explicitly owns a switch, to which wired devices may connect. That switch has a span port.

The customer explicitly owns her own wireless access point, acting as a bridge, and not offering NAT.

Don’t be fooled into thinking that one need only buy a new combination router/WAP. It’s essential to split these functions. Consumer-grade routers do not offer span ports, whereas some inexpensive consumer-grade network switches do. This architecture takes advantage of that fact in order to provide suitable monitoring locations.

Let’s review the options.

Location A is still off-limits.

Location B is still a bad idea.

Location C is a good option, if one places a network tap here, or another small switch with a span port, and neither the customer router nor customer WAP is doing NAT.

Location D is a better option. Now one need only ensure that the customer WAP is not doing NAT. In fact, one need not introduce another switch or tap here, assuming one can span the uplink port on the customer switch.

Location E would only see wired devices, and is not a good option because it ignores WiFi devices.

Location F would only see WiFi devices, and is not a good option because it ignores wired devices.

Location G is essentially impossible, as with Figure 1.

The bottom line is that location D is the best monitoring location, assuming that the customer WAP is not doing NAT. If the customer WAP is acting as a router with NAT, then all of the wireless devices will have the same source IP address as seen in location D.

In an architecture designed for visibility, introducing a network tap, or simply spanning the uplink from the network switch, at point D, satisfies the visibility requirement.

It is possible to simplify the architecture shown in Figure 2 to the following:

_images/collection-figure3.png — Figure 3: Simplified Visible Network Architecture

The customer router between monitoring points C and D is gone, as one can rely upon the ISP router if so desired.

In summary, one could deploy a Zeek sensor at location D, or at location C if the simplified architecture is in place, as C and D are then logically similar. Going forward, we’ll discuss monitoring at location D.

Gaining access to traffic at point D requires either a span port to be enabled on the customer switch, or a network tap to be deployed at location D. Professional Zeek users prefer high-quality, powered network taps wherever possible, for a variety of reasons. When they are not available, as in the case of a SOHO or test environment, then a span port on a managed switch is an acceptable alternative.

Once the network tap or span port is providing network traffic to the Zeek sensor, one can turn to matters beyond instrumentation and collection.

Storage and Review

As Zeek ingests network traffic, either by monitoring one or more live network interfaces or by processing stored traffic in a capture file, it creates a variety of logs and other artifacts. By default Zeek writes that data to a storage location designated via its configuration files. Zeek possesses the capability to write the logs in several formats and perform certain log management processes like compression and archiving.

Analysts make use of Zeek data by reviewing the logs it generates. Review methods can be as simple as using text processing tools packaged with the underlying operating system. Depending on the format of the logs, users may apply more specialized processing tools, some of which are available with Zeek. In many cases, Zeek administrators ship logs to specialized storage and review applications. These are usually referred to collectively as Security Information and Event Management (SIEM) platforms. Some of these log management and SIEM platforms are available as open source offerings, while others are commercially available.
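For a sense of what simple review can look like, the following Python sketch reads Zeek's tab-separated log format, in which a `#fields` header line names the columns and other `#` lines carry metadata. It assumes the default tab separator and leaves every value as a string, including the `-` placeholder for unset fields. The function name `parse_zeek_tsv` and the sample lines are invented for this example. Tools shipped with Zeek, such as zeek-cut, cover similar ground from the command line.

```python
def parse_zeek_tsv(lines):
    """Parse Zeek TSV log lines into a list of dicts keyed by the #fields header."""
    fields = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # drop the "#fields" token itself
        elif line.startswith("#"):
            continue  # other metadata lines (#path, #types, #close, ...)
        else:
            records.append(dict(zip(fields, line.split("\t"))))
    return records


# Made-up, abbreviated sample resembling a conn.log in TSV format.
SAMPLE_LOG = [
    "#separator \\x09",
    "#path\tconn",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tproto",
    "#types\ttime\tstring\taddr\taddr\tenum",
    "1700000000.000000\tCaAAA1\t192.168.1.10\t203.0.113.50\ttcp",
    "1700000001.000000\tCbBBB2\t192.168.1.11\t192.0.2.8\tudp",
    "#close\t2023-11-14-22-13-21",
]

if __name__ == "__main__":
    for record in parse_zeek_tsv(SAMPLE_LOG):
        print(record["uid"], record["id.orig_h"], record["id.resp_h"], record["proto"])
```

Keeping values as strings avoids guessing at types. A fuller parser could use the `#types` line to convert timestamps, ports, and counts.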