Chapter 2.3 - Network Forensics & Log Analysis

Module 2: Traffic Analysis & Intrusion Detection
Prerequisites: Chapter 2.1 (Packet Analysis), Chapter 2.2 (IDS/IPS)


Table of Contents

  1. What Network Forensics Actually Answers
  2. Evidence Sources & Collection Hierarchy
  3. Full Packet Capture: Architecture & Storage
  4. NetFlow / IPFIX: Session-Level Telemetry
  5. Zeek: The Network Observation Framework
  6. Log Sources: Firewall, DNS, Proxy, Auth
  7. Log Correlation & Timeline Reconstruction
  8. Identifying Attack Patterns in Logs
  9. Chain of Custody & Forensic Integrity
  10. MITRE ATT&CK Mapping

1. What Network Forensics Actually Answers

Network forensics is the capture, preservation, and analysis of network traffic and associated logs to reconstruct what happened, when, how, and from where. Unlike live intrusion detection - which fires on in-progress events - network forensics operates retroactively on stored evidence to answer investigative questions:

  • What was exfiltrated? Reconstruct transferred files from packet captures.
  • How did the attacker enter? Identify the initial access vector from HTTP/SMTP/VPN logs.
  • Which systems were touched? Map lateral movement from authentication logs and NetFlow.
  • When did the intrusion begin? Establish a precise timeline from correlated log timestamps.
  • Is the attacker still present? Detect ongoing C2 beaconing in flow data.

These questions have different evidence requirements. File reconstruction requires full packet capture (FPC). Lateral movement mapping needs authentication event logs and NetFlow. Timeline precision requires all sources to be time-synchronized - an investigator who ignores NTP drift will build timelines with phantom gaps and overlaps.

The evidence quality hierarchy matters: FPC is the gold standard (every byte is preserved), but it's expensive at scale. NetFlow is cheap and scalable but loses payload. Logs are high-signal but depend entirely on what the generating application chose to record.


2. Evidence Sources & Collection Hierarchy

Storage and Retention Estimates

Source                  | Volume at 1 Gbps sustained        | Typical retention
------------------------|-----------------------------------|----------------------
Full packet capture     | ~80-100 GB/hr (after compression) | 24-72 hours (rolling)
NetFlow (sampled 1:100) | ~50-200 MB/hr                     | 30-90 days
Zeek logs               | ~1-3 GB/hr                        | 30-90 days
Firewall deny logs      | ~100-500 MB/hr                    | 90-365 days
DNS query logs          | ~50-200 MB/hr                     | 90-365 days
Proxy logs              | ~500 MB-2 GB/hr                   | 90-365 days

The practical implication: FPC is a short window that catches active incidents; NetFlow and application logs are the primary evidence sources for investigating incidents discovered days or weeks after the fact.
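The table above can be turned into a quick sizing calculation when planning retention. A minimal sketch - the function name and the rate/retention values are illustrative midpoints of the table, not prescriptions:

```python
# Back-of-envelope capacity planning for evidence retention.
# Rates are illustrative midpoints from the table above.

def retention_budget(gb_per_hour: float, retention_days: float) -> float:
    """Total storage in TB needed to hold `retention_days` of data
    generated at `gb_per_hour`."""
    return gb_per_hour * 24 * retention_days / 1024  # GB -> TB

SOURCES = {
    # source: (GB/hr at 1 Gbps sustained, target retention in days)
    "fpc":     (90.0,  3),    # full packet capture, 72-hour rolling window
    "netflow": (0.125, 90),   # ~125 MB/hr sampled flow records
    "zeek":    (2.0,   90),   # structured protocol logs
    "dns":     (0.125, 365),  # query logs
}

if __name__ == "__main__":
    for name, (rate, days) in SOURCES.items():
        print(f"{name:<8} {retention_budget(rate, days):8.2f} TB")
```

Even at these rough rates, three days of FPC costs more storage than a full year of DNS logs - which is exactly why the long-tail investigation relies on flows and logs.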


3. Full Packet Capture: Architecture & Storage

Capture at Scale

Single-interface tcpdump fails above a few hundred Mbps due to kernel copy overhead and lock contention. Production FPC uses:

  • AF_PACKET with PACKET_MMAP - kernel ring buffer, zero-copy to userspace
  • PF_RING / DPDK - kernel bypass; handles 10-100 Gbps line rates
  • Dedicated appliances - Endace, cPacket, ExtraHop hardware with lossless capture guarantees

For lab and incident response work, tcpdump and tshark are adequate:

# Capture to rotating pcap files: ~1 GB per file, keep last 50 files.
# -i eth0: capture interface; -s 0: full snaplen (entire packet, no truncation)
# -G 3600: rotate every 3600 seconds; -C 1000: also rotate at 1000 MB
# -W 50: keep at most 50 files (ring buffer); -z gzip: compress rotated files
# The filter excludes SSH (your own management traffic).
tcpdump \
  -i eth0 \
  -s 0 \
  -w /data/capture/capture-%Y%m%d-%H%M%S.pcap \
  -G 3600 \
  -C 1000 \
  -W 50 \
  -z gzip \
  'not port 22'

# High-performance capture with PF_RING zero-copy
# (the zc: prefix selects a zero-copy PF_RING interface; requires a
# PF_RING-enabled libpcap)
tcpdump \
  --immediate-mode \
  -i zc:eth0 \
  -s 0 \
  -w /nvme/capture/live.pcap

Targeted Capture During Incident Response

When you know the suspect host or connection, limit capture scope to reduce storage and analysis noise:

# Capture all traffic to/from a specific host
tcpdump -i eth0 -s 0 -w suspect_host.pcap \
host 192.168.1.55

# Capture only C2-indicative ports from a suspect host
tcpdump -i eth0 -s 0 -w suspect_c2.pcap \
"host 192.168.1.55 and (port 443 or port 4444 or port 8080)"

# Capture all DNS traffic (BPF cannot match query names -- filter for the
# suspect domain afterwards with tshark or Zeek)
tcpdump -i eth0 -s 0 -w dns_evil.pcap \
"port 53 and (udp or tcp)"

# Capture SMB traffic for lateral movement analysis
tcpdump -i eth0 -s 0 -w smb_lateral.pcap \
"port 445 or port 139"

File Carving from PCAP

Network forensics frequently requires reconstructing files transferred over the network - malware payloads, exfiltrated documents, C2 implants.

# Carve files from pcap with NetworkMiner. The command-line parser ships
# with the Professional edition as NetworkMinerCLI.exe; check --help for
# the exact flags in your version.
mono /opt/NetworkMiner/NetworkMinerCLI.exe \
  -r suspicious.pcap \
  -w /tmp/carved_files

# Carve with tcpflow: reassembles TCP streams, writes each stream to a file
# -r: read from pcap; -o: output directory; -a: enable all post-processing
tcpflow -r capture.pcap -o /tmp/tcpflow_output -a

# Carve HTTP objects with tshark (Wireshark CLI)
tshark \
-r capture.pcap \
--export-objects http,/tmp/http_objects # writes each HTTP object as a separate file

# Carve SMB transferred files
tshark \
-r capture.pcap \
--export-objects smb,/tmp/smb_objects

# Hash all carved files for IOC matching
find /tmp/http_objects -type f -exec sha256sum {} \; -exec md5sum {} \;

Reconstructing Sessions for Analyst Review

# Follow a specific TCP stream in tshark (stream index from initial analysis)
# ascii mode renders the payload as readable text; -q suppresses the
# default per-packet output
tshark -r capture.pcap -q -z follow,tcp,ascii,0

# Follow HTTP conversation
tshark -r capture.pcap \
-Y "http" \
-T fields \
-e frame.time \
-e ip.src \
-e http.request.method \
-e http.request.uri \
-e http.response.code \
-e http.content_length \
-E header=y \
-E separator=,

# Reconstruct the full TCP payload of stream 5 (e.g., downloaded malware).
# tcp.payload is emitted as hex (colon-separated in older tshark versions);
# note the result still contains HTTP headers if the stream carried HTTP.
tshark -r capture.pcap \
  -Y "tcp.stream eq 5 && tcp.payload" \
  -T fields -e tcp.payload \
  | tr -d ':\n' \
  | xxd -r -p > /tmp/stream5_payload.bin
file /tmp/stream5_payload.bin   # identify file type

4. NetFlow / IPFIX: Session-Level Telemetry

NetFlow (Cisco), IPFIX (IETF standard, RFC 7011), and sFlow (sampled) all describe network sessions without preserving payload. A flow record captures the 5-tuple (src IP, dst IP, src port, dst port, protocol) plus counters (bytes, packets, duration, TCP flags) and metadata (AS numbers, DSCP, input/output interface).

What You Can and Cannot Determine from Flows

Investigative question         | NetFlow answer
-------------------------------|------------------------------------------------
Did host A talk to host B?     | Yes - precise src/dst/port/time
How much data was transferred? | Yes - byte/packet counters per direction
What protocol was used?        | Yes - port + protocol field (L4 only)
Was it encrypted?              | Partial - inferred (port 443 usually means TLS)
What file was downloaded?      | No - no payload, no filename
What SQL query was run?        | No - no application-layer content
Was the session successful?    | Partial - TCP flags show the handshake, but no app-layer response code
Was data exfiltrated?          | Partial - inferred from large upload byte counts
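The last two "Partial" rows are where analytics fill the gap. Upload/download asymmetry, for instance, can be scored directly from flow byte counters. A minimal sketch - the function name and thresholds are illustrative, not a standard:

```python
from collections import defaultdict

def asymmetric_pairs(flows, min_upload=50 * 2**20, ratio=5.0):
    """flows: iterable of (src, dst, bytes_out, bytes_in) records.
    Returns (src, dst) pairs whose cumulative upload exceeds `min_upload`
    bytes AND is at least `ratio` times the download -- the flow-level
    signature of exfiltration (normal clients download far more than
    they upload)."""
    up, down = defaultdict(int), defaultdict(int)
    for src, dst, b_out, b_in in flows:
        up[(src, dst)] += b_out
        down[(src, dst)] += b_in
    return [p for p in up
            if up[p] >= min_upload and up[p] >= ratio * max(down[p], 1)]

# Example: 60 MiB uploaded against 1 MiB downloaded is flagged;
# a small ordinary session is not.
sample = [
    ("10.0.0.5", "203.0.113.44", 60 * 2**20, 1 * 2**20),
    ("10.0.0.6", "8.8.8.8", 1000, 50000),
]
print(asymmetric_pairs(sample))
```

The same logic underlies the nfdump queries shown later in this section; flow records alone cannot prove exfiltration, but sustained asymmetry narrows the candidate list dramatically.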

Collecting and Querying Flows

# nfdump: analyze NetFlow/IPFIX data stored by nfcapd
# Show top 20 source IPs by byte count between 08:00 and 14:00
nfdump \
  -R /var/cache/nfdump/2024/11/15/ \
  -t 2024-11-15.08:00:00-2024-11-15.14:00:00 \
  -s srcip/bytes \
  -n 20 \
  -o long

# Find all connections from a specific host
nfdump \
-R /var/cache/nfdump/ \
-t 2024-11-15.00:00:00-2024-11-16.00:00:00 \
'src ip 192.168.1.55' \
-o long

# Detect large data transfers (potential exfiltration) -- uploads > 50 MB
nfdump \
  -R /var/cache/nfdump/ \
  'src net 10.0.0.0/8 and not dst net 10.0.0.0/8 and bytes > 52428800' \
  -o long \
  -s dstip/bytes

# Identify beaconing: connections from one host to same external IP at regular intervals
nfdump \
-R /var/cache/nfdump/ \
'src ip 192.168.1.55 and dst ip 203.0.113.44' \
-o "fmt:%ts %td %byt %pkt" \
-q

Beaconing Detection with Python

Automated beaconing detection operates on flow duration/timing data:

#!/usr/bin/env python3
"""
beacon_detect.py - detect periodic C2 beaconing from NetFlow CSV export
Usage: nfdump -R /data/flows/ -o csv | python3 beacon_detect.py
"""
import sys
import statistics
from collections import defaultdict
from datetime import datetime

flows = defaultdict(list)

for line in sys.stdin:
    line = line.strip()
    if not line or line.startswith('#') or line.startswith('Date'):
        continue
    parts = line.split(',')
    if len(parts) < 7:
        continue
    try:
        # nfdump CSV column layout: ts,te,td,sa,da,sp,dp,...
        ts_str = parts[0]     # flow start timestamp
        src_ip = parts[3]     # source IP (sa)
        dst_ip = parts[4]     # destination IP (da)
        dst_port = parts[6]   # destination port (dp)
        # drop fractional seconds before parsing
        ts = datetime.strptime(ts_str.split('.')[0], '%Y-%m-%d %H:%M:%S')
        key = (src_ip, dst_ip, dst_port)
        flows[key].append(ts.timestamp())  # store unix timestamp
    except (ValueError, IndexError):
        continue

# Analyze each src->dst:port pair for periodic behavior
print(f"{'Source':<20} {'Destination':<20} {'Port':<8} {'Count':>6} {'AvgInterval':>14} {'Jitter(CV)':>12}")
print("-" * 85)

for (src, dst, port), timestamps in flows.items():
    if len(timestamps) < 5:  # need at least 5 samples for meaningful analysis
        continue
    timestamps.sort()
    intervals = [timestamps[i+1] - timestamps[i] for i in range(len(timestamps)-1)]
    avg_interval = statistics.mean(intervals)
    if avg_interval < 10:    # ignore sub-10-second intervals (too noisy)
        continue
    stdev = statistics.stdev(intervals) if len(intervals) > 1 else 0
    cv = stdev / avg_interval  # coefficient of variation; low CV = regular beaconing

    if cv < 0.3 and len(timestamps) >= 10:  # beacon threshold: CV < 0.3, 10+ samples
        print(f"{src:<20} {dst:<20} {port:<8} {len(timestamps):>6} "
              f"{avg_interval:>12.1f}s {cv:>12.4f} <- BEACON CANDIDATE")
    elif cv < 0.5:
        print(f"{src:<20} {dst:<20} {port:<8} {len(timestamps):>6} "
              f"{avg_interval:>12.1f}s {cv:>12.4f}")

5. Zeek: The Network Observation Framework

Zeek (formerly Bro) occupies the space between full packet capture and raw NetFlow - it performs deep protocol analysis and emits structured, query-friendly logs without storing raw packets. For network forensics, Zeek logs are often the most valuable artifact: they contain application-layer semantics (HTTP URIs, DNS query names, TLS certificates, file hashes) at a fraction of the storage cost of PCAP.

Core Log Files

Log file   | Content                       | Key forensic fields
-----------|-------------------------------|--------------------------------------------
conn.log   | All TCP/UDP/ICMP sessions     | id.orig_h, id.resp_h, service, orig_bytes, resp_bytes, conn_state, history
dns.log    | DNS queries and responses     | query, qtype_name, answers, TTLs
http.log   | HTTP requests                 | method, host, uri, referrer, user_agent, status_code, resp_body_len
ssl.log    | TLS sessions                  | server_name (SNI), subject, issuer, validation_status, ja3, ja3s
files.log  | File transfers (any protocol) | source, filename, mime_type, md5, sha1, sha256
x509.log   | TLS certificates              | certificate.subject, certificate.issuer, certificate.not_valid_after, san.dns
smtp.log   | Email sessions                | from, to, subject, path, user_agent
notice.log | Zeek policy alerts            | note, msg, src, dst, sub
weird.log  | Protocol anomalies            | name, msg (malformed packets, protocol violations)

Zeek CLI: Offline Analysis

# Process a pcap offline, write all logs to the current directory.
# "local" loads the default policy scripts; LogAscii::use_json=T emits
# JSON logs instead of TSV.
zeek -r suspicious_traffic.pcap local LogAscii::use_json=T

# Enable specific detection scripts
zeek -r capture.pcap \
/opt/zeek/share/zeek/policy/protocols/ssl/validate-certs.zeek \
/opt/zeek/share/zeek/policy/protocols/dns/detect-external-names.zeek \
/opt/zeek/share/zeek/policy/frameworks/intel/do_notice.zeek \
Intel::read_files=/etc/zeek/intel/indicators.dat # load IOC feed

# Live capture mode (production deployment)
zeekctl deploy # deploy using ZeekControl (manages workers, logger, manager)
zeekctl status # check node health
zeekctl stop

Querying Zeek Logs with zeek-cut and jq

Zeek's TSV logs are fast to query with zeek-cut (column extractor); JSON logs use jq:

# Extract external destinations and byte counts from conn.log, keeping
# normally closed connections (conn_state SF); top 20 by orig_bytes
cat conn.log \
| zeek-cut id.orig_h id.resp_h orig_bytes resp_bytes conn_state \
| awk '$5 == "SF"' \
| sort -k3 -rn \
| head -20

# Find all DNS queries to a suspicious TLD (the query name is column 3)
cat dns.log \
| zeek-cut ts id.orig_h query answers \
| awk -F'\t' '$3 ~ /\.(ru|cn|tk|pw|xyz)$/' \
| sort -u

# Find HTTP requests with suspicious user agents (or missing user agent)
cat http.log \
| zeek-cut ts id.orig_h host uri user_agent status_code \
| awk -F'\t' '$5 == "-" || $5 ~ /python|curl|wget|go-http/' \
| sort

# Find all files transferred with their hashes (requires file analysis
# enabled); drop rows with a missing md5, sort by mime type
cat files.log \
| zeek-cut ts source fuid mime_type filename md5 sha256 \
| awk -F'\t' '$6 != "-"' \
| sort -t$'\t' -k4

# Cross-reference file hashes against VirusTotal (manual -- requires vt-cli)
cat files.log \
| zeek-cut sha256 \
| grep -v '^-$' \
| sort -u \
| xargs -I{} vt file {}

# Identify TLS connections with self-signed certificates
# (tab field separator, since validation_status contains spaces)
cat ssl.log \
| zeek-cut ts id.orig_h id.resp_h server_name validation_status \
| awk -F'\t' '$5 == "self signed certificate in certificate chain"' \
| sort -t$'\t' -k3     # group by destination IP

Writing a Custom Zeek Script

Zeek's scripting language enables custom detection logic that operates on protocol events:

# /etc/zeek/site/detect_dns_tunneling.zeek
# Fires a notice when a DNS query name exceeds 60 chars or contains base64-like patterns

@load base/frameworks/notice

module DNSTunnel;

export {
    redef enum Notice::Type += {
        DNS_Long_Query,
        DNS_HighEntropy_Query
    };
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
    {
    # Long query check
    if ( |query| > 60 )
        {
        NOTICE([$note=DNS_Long_Query,
                $conn=c,
                $msg=fmt("Long DNS query: %s (%d chars)", query, |query|),
                $identifier=cat(c$id$orig_h, query)]);
        }

    # Base64-like pattern check: long runs of alphanumeric before first dot
    local label = split_string(query, /\./)[0];
    if ( |label| > 30 && /^[A-Za-z0-9+\/]{30,}/ in label )
        {
        NOTICE([$note=DNS_HighEntropy_Query,
                $conn=c,
                $msg=fmt("Possible base64-encoded DNS label: %s", label),
                $identifier=cat(c$id$orig_h, label)]);
        }
    }
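The script above approximates "high entropy" with a base64-style regex. A direct Shannon-entropy check is easy to run offline over exported query names; a minimal sketch, with illustrative function names and thresholds:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def suspicious_label(query: str, min_len: int = 20, min_entropy: float = 3.5) -> bool:
    """Flag the leftmost DNS label if it is both long and high-entropy --
    typical of base32/base64-encoded tunnel payloads. Thresholds are
    starting points; tune against your own baseline."""
    label = query.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy

# Dictionary-like labels score low; encoded payloads score high:
print(suspicious_label("mail.example.com"))                       # False
print(suspicious_label("dGhpc2lzZXhmaWxkYXRhMTIzNDU2Nzg.evil.com"))  # True
```

Feed it the `query` column from dns.log (via zeek-cut) to triage candidate tunneling domains before deeper inspection.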

6. Log Sources: Firewall, DNS, Proxy, Auth

Firewall Logs

Firewall logs record allow/deny decisions with the 5-tuple plus timestamp, interface, rule name, and NAT translation state. What they reveal and miss:

  • Reveal: blocked connection attempts (reconnaissance), permitted connections (communication paths), NAT mappings (internal host behind translated address)
  • Miss: application-layer content, encrypted payload, traffic permitted by overly broad rules
# Parse iptables/nftables log from syslog
# (iptables must be configured with a -j LOG rule before DROP/ACCEPT;
# the DPT= values below are suspicious destination ports)
grep "IN=eth0" /var/log/kern.log \
| grep -E "DPT=(4444|8080|1337)" \
| awk '{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^SRC=/) printf "src=%s ",    substr($i, 5);
        if ($i ~ /^DST=/) printf "dst=%s ",    substr($i, 5);
        if ($i ~ /^DPT=/) printf "dport=%s ",  substr($i, 5);
        if ($i ~ /^SPT=/) printf "sport=%s\n", substr($i, 5);
    }
}'

# pfSense/OPNsense: filterlog CSV format
# Fields: rule,sub,anchor,tracker,interface,reason,action,direction,ip_version,...
grep ",block," /var/log/filter.log \
| awk -F',' '{print $11, $13, $14, $16}' \
| sort | uniq -c | sort -rn

# Cisco ASA: parse deny logs and extract source IPs
# (the source appears as "src <interface>:<ip>/<port>")
grep "Deny" /var/log/asa.log \
| grep -oP "src [^:]+:\K[\d.]+" \
| sort | uniq -c | sort -rn \
| head -20

DNS Logs

DNS is a forensic goldmine. Every hostname resolution is recorded - including C2 domain lookups, DGA beacons, and data exfiltration channels. DNS logs correlate internal host activity to external destinations even when the actual connection is encrypted.

# BIND (named) query log -- enable in named.conf:
# logging { channel query_log { file "/var/log/named/query.log"; severity info; };
# category queries { query_log; }; };

# Extract all unique domains queried by a specific host
grep "192.168.1.55" /var/log/named/query.log \
| grep -oP "query: \K\S+" \
| sort | uniq -c | sort -rn

# Find NXDOMAIN responses -- hosts querying non-existent domains.
# A high NX rate from one host = DGA or DNS tunneling.
# Field $5 is the client IP (adjust to your query-log format).
grep "NXDOMAIN" /var/log/named/query.log \
| awk '{print $5}' \
| sort | uniq -c | sort -rn

# Windows DNS debug log (enable via dnsmgmt.msc -> server properties -> debug logging)
Get-Content C:\Windows\System32\dns\dns.log |
Where-Object { $_ -match "QUERY" } |
Select-String -Pattern "[A-Za-z0-9]{20,}\.(com|net|org)" |
Sort-Object | Get-Unique

# Passive DNS -- using dnstap (binary DNS capture format, higher fidelity than text logs)
fstrm_capture \
-t protobuf:dnstap.Dnstap \
-u /var/run/named/dnstap.sock \
-w /var/log/dnstap/dns.fstrm

# Decode dnstap frames
dnstap-read /var/log/dnstap/dns.fstrm \
| grep -i "evil\.com"

Proxy Logs

HTTP/HTTPS proxies (Squid, Bluecoat, Zscaler) log every web request with full URL, response code, bytes transferred, and client identity. In organizations that route all web traffic through a proxy, proxy logs are often the single richest source of evidence for web-based attacks.

# Squid access.log format:
# timestamp duration client_ip result/status bytes method url hierarchy mime_type

# Find all URLs accessed by a suspect host with large (>1 MB) responses
# Squid columns: $3 = client IP, $5 = bytes, $7 = URL
awk '$3 == "192.168.1.55" && $5 > 1048576 {print $3, $5, $7}' /var/log/squid/access.log \
| sort -k2 -rn

# Find requests to newly registered domains (heuristic: short second-level domain)
awk '{print $7}' /var/log/squid/access.log \
| grep -oP 'https?://\K[^/:]+' \
| sort | uniq -c | sort -rn \
| awk '$2 ~ /^[a-z0-9]{4,8}\.(com|net|cc|pw|xyz)$/'

# Find POST requests (data being sent out) with large bodies
awk '$6 == "POST" && $5 > 10000' /var/log/squid/access.log \
| awk '{print $3, $7, $5}'

# Zscaler / cloud proxy: NSS log forwarded via syslog -- parse JSON
cat /var/log/zscaler/nss.log \
| jq 'select(.action == "Allowed") | select(.bytes_sent > 100000) | {user, url, bytes_sent}'

Authentication Logs

Authentication events tell you who authenticated from where and when - essential for tracking lateral movement (T1021) and credential abuse.

# Linux: failed SSH logins (brute force detection).
# The source IP is the 4th field from the end: "... from <ip> port <n> ssh2"
grep "Failed password" /var/log/auth.log \
| awk '{print $(NF-3)}' \
| sort | uniq -c | sort -rn \
| head -20

# Successful login after repeated failures from the same IP (successful brute force).
# $(NF-3) is the source IP in both "Failed password" and "Accepted password" lines.
awk '/Failed password/ {fail[$(NF-3)]++}
     /Accepted password/ {if (fail[$(NF-3)] > 5) print "BRUTE SUCCESS:", $(NF-3), fail[$(NF-3)], "failures"}' \
/var/log/auth.log

# sudo usage log -- privilege escalation after compromise
grep "sudo:" /var/log/auth.log \
| grep "COMMAND" \
| awk '{print $1, $2, $3, $5, $NF}'

# Windows Security Event Log -- use Get-WinEvent (PowerShell) or evtx_dump
# Event IDs:
# 4624 = Successful logon
# 4625 = Failed logon
# 4648 = Logon with explicit credentials (pass-the-hash indicator)
# 4768 = Kerberos TGT requested
# 4769 = Kerberos service ticket requested
# 4776 = NTLM authentication
# 4771 = Kerberos pre-auth failed (Kerberoasting indicator)

# PowerShell: find all 4624 events with logon type 3 (network) in the last 24 hours
Get-WinEvent -FilterHashtable @{
    LogName   = 'Security'
    Id        = 4624
    StartTime = (Get-Date).AddHours(-24)
} | Where-Object {
    $_.Properties[8].Value -eq 3    # LogonType 3 = network logon
} | Select-Object TimeCreated,
    @{N='User';        E={$_.Properties[5].Value}},
    @{N='SrcIP';       E={$_.Properties[18].Value}},
    @{N='WorkStation'; E={$_.Properties[11].Value}} |
    Format-Table -AutoSize

# Rapid logon across multiple systems = lateral movement
# Python: find source IPs authenticating to >3 unique hosts in 1 hour
python3 << 'EOF'
from collections import defaultdict
import csv

hosts_by_src = defaultdict(set)

with open('auth_events.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['EventId'] == '4624' and row['LogonType'] == '3':
            hosts_by_src[row['SrcIP']].add(row['TargetHost'])

for src_ip, hosts in hosts_by_src.items():
    if len(hosts) > 3:
        print(f"LATERAL MOVEMENT CANDIDATE: {src_ip} authenticated to {len(hosts)} hosts: {hosts}")
EOF

7. Log Correlation & Timeline Reconstruction

Individual log sources answer isolated questions. The full attack story emerges from correlating events across sources aligned to a common timeline. Time synchronization is not optional - Windows systems default to syncing to domain controllers, Linux systems should use chronyd or ntpd, and network devices require explicit NTP configuration. A 2-minute clock skew makes correlation unreliable.

Timeline Reconstruction Methodology

1. Define the incident window (e.g., initial IDS alert timestamp +/- 2 hours)
2. Pull all relevant log sources for that window
3. Normalize timestamps to UTC ISO-8601
4. Merge and sort by timestamp
5. Identify the earliest observable event
6. Walk forward: each event should be causally explainable by preceding events
7. Identify gaps -- missing evidence that should exist if your hypothesis is correct

Automated Timeline Merging

# Convert Windows EVTX to JSON Lines for correlation
# (evtx_dump from the Rust "evtx" project; -o jsonl emits one record per line;
# EventID serializes as a plain number or as {"#text": n} depending on the record)
evtx_dump -o jsonl /path/to/Security.evtx \
| jq -c '{ts: .Event.System.TimeCreated."#attributes"."SystemTime",
          id: (.Event.System.EventID."#text"? // .Event.System.EventID),
          data: .Event.EventData}' \
> windows_events.jsonl

# Merge multiple log sources into a single timeline CSV
python3 << 'EOF'
import csv, sys
from datetime import datetime, timezone

events = []

# Zeek conn.log
with open('conn.log') as f:
    for line in f:
        if line.startswith('#'): continue
        parts = line.split('\t')
        if len(parts) < 10: continue
        ts = datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)
        events.append({
            'timestamp': ts.isoformat(),
            'source': 'zeek_conn',
            'src': parts[2],
            'dst': parts[4],
            'detail': f"{parts[6]}/{parts[7]} {parts[9]} bytes"
        })

# DNS log
with open('dns.log') as f:
    for line in f:
        if line.startswith('#'): continue
        parts = line.split('\t')
        if len(parts) < 10: continue
        ts = datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)
        events.append({
            'timestamp': ts.isoformat(),
            'source': 'zeek_dns',
            'src': parts[2],
            'dst': parts[4],
            'detail': f"query={parts[9]} answers={parts[21] if len(parts) > 21 else '-'}"
        })

# HTTP log
with open('http.log') as f:
    for line in f:
        if line.startswith('#'): continue
        parts = line.split('\t')
        if len(parts) < 16: continue
        ts = datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)
        events.append({
            'timestamp': ts.isoformat(),
            'source': 'zeek_http',
            'src': parts[2],
            'dst': parts[4],
            'detail': f"{parts[7]} {parts[8]}{parts[9]} -> {parts[15]}"
        })

events.sort(key=lambda e: e['timestamp'])

writer = csv.DictWriter(sys.stdout, fieldnames=['timestamp', 'source', 'src', 'dst', 'detail'])
writer.writeheader()
writer.writerows(events)
EOF

Attack Timeline: What a Real Reconstruction Looks Like

The following is a synthetic but representative timeline of a spear-phishing to lateral movement to exfiltration incident, reconstructed from log correlation:

[09:14:33 UTC] SMTP log: mail server received email to user@corp.com
From: attacker@lookalike.com, Subject: "Invoice Q4", Attachment: Invoice.doc

[09:15:01 UTC] Proxy log: 192.168.5.22 GET http://evil.com/stage1.exe (200, 84KB)
User-Agent: Microsoft Office/15.0 (macro execution via document)

[09:15:03 UTC] DNS log: 192.168.5.22 query: evil.com (resolved to 203.0.113.44)

[09:15:04 UTC] Zeek conn.log: 192.168.5.22:54291 -> 203.0.113.44:443 ESTABLISHED
2847 bytes sent, 84129 bytes received (implant download)

[09:15:07 UTC] Zeek ssl.log: SNI=evil.com, self-signed cert, JA3=51c64c77...
validation_status="self signed certificate in certificate chain"

[09:15:10 UTC] IDS alert: SID 2014702 "ET TROJAN Possible Cobalt Strike Beacon"

[09:18:44 UTC] Zeek conn.log: 192.168.5.22:54312 -> 203.0.113.44:443 (60-sec beacon)
[09:19:44 UTC] Zeek conn.log: 192.168.5.22:54318 -> 203.0.113.44:443 (60-sec beacon)
[pattern continues every 60 seconds -- C2 check-in]

[09:32:15 UTC] Auth log (Windows): 4624 LogonType=3 src=192.168.5.22 dst=192.168.5.10
Account: corp\svcaccount (credential from memory via Mimikatz)

[09:32:21 UTC] Zeek conn.log: 192.168.5.22:62544 -> 192.168.5.10:445 (SMB)
orig_bytes=8812, resp_bytes=2941, duration=0.44s

[09:32:25 UTC] Auth log (Windows): 4624 LogonType=3 src=192.168.5.22 dst=192.168.5.31

[09:33:00 UTC] Zeek files.log: file extracted from SMB to 192.168.5.10
filename=financial_data_2024.xlsx, sha256=a3f9c2..., size=4.2MB

[09:47:12 UTC] NetFlow: 192.168.5.22 -> 203.0.113.44:443 upload=4.3MB in 12 seconds
(exfiltration of financial data via C2 channel)

This timeline is only possible because every timestamp is UTC-synchronized and events from SMTP, proxy, DNS, Zeek, IDS, authentication, and NetFlow are merged. Any single source alone gives an incomplete picture.


8. Identifying Attack Patterns in Logs

Detecting C2 Beaconing in Proxy/Zeek Logs

# Find connections to the same external IP at suspiciously regular intervals
cat conn.log \
| zeek-cut ts id.orig_h id.resp_h id.resp_p proto orig_bytes \
| awk '$4 == "443" && $3 !~ /^10\.|^192\.168\.|^172\./' \
| awk '{
    key = $2 "-" $3;
    if (last_ts[key]) {
        interval = $1 - last_ts[key];
        sum[key] += interval;
        count[key]++;
        if (interval < min[key] || min[key] == 0) min[key] = interval;
        if (interval > max[key]) max[key] = interval;
    }
    last_ts[key] = $1;
}
END {
    for (k in count) {
        if (count[k] >= 8) {
            avg = sum[k] / count[k];
            jitter = (max[k] - min[k]) / avg;
            if (jitter < 0.3) printf "BEACON: %s count=%d avg=%.1fs jitter=%.2f\n", k, count[k], avg, jitter;
        }
    }
}'

Detecting DNS Exfiltration

# High query volume to a single apex domain + large TXT responses
cat dns.log \
| zeek-cut query qtype_name answers \
| awk '{
    n = split($1, parts, ".");
    if (n >= 2) apex = parts[n-1] "." parts[n];
    else apex = $1;
    count[apex]++;
}
END {
    for (d in count) if (count[d] > 100) print count[d], d
}' \
| sort -rn \
| head -20

# Large TXT record responses = data being encoded in DNS
cat dns.log \
| zeek-cut ts id.orig_h query qtype_name answers \
| awk '$4 == "TXT" && length($5) > 100' \
| sort -k3 # group by query domain

Detecting Lateral Movement in Auth Logs

# Windows: find accounts authenticating to multiple hosts with LogonType 3
# (arrays of arrays require gawk 4+)
awk -F',' '
NR == 1 { next }
$3 == "4624" && $5 == "3" {
    key = $6;             # username
    hosts[key][$7] = 1;   # destination host
    src[key][$8] = 1;     # source IP
}
END {
    for (user in hosts) {
        n = 0; for (h in hosts[user]) n++;
        m = 0; for (s in src[user]) m++;
        if (n > 3)
            printf "LATERAL: user=%s unique_dst_hosts=%d source_ips=%d\n", user, n, m;
    }
}' auth_events.csv

# Detect pass-the-hash: Event 4624 LogonType=3 with NTLM auth (4776) but no 4768 (Kerberos TGT)
# The absence of a Kerberos TGT for a successful network logon suggests credential replay
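That absence correlation can be automated over normalized event records. A minimal sketch - the function name, field names, and window are assumptions about your export format, not a standard schema:

```python
from collections import defaultdict

def pth_candidates(events, window=300):
    """events: dicts with keys 'ts' (epoch seconds), 'id' (4624/4768/4776),
    'user', 'src'. Flags network logons (4624) backed by NTLM auth (4776)
    for which the account requested NO Kerberos TGT (4768) in the
    preceding `window` seconds -- the pass-the-hash pattern."""
    tgt_times = defaultdict(list)   # user -> TGT request timestamps
    for e in events:
        if e['id'] == 4768:
            tgt_times[e['user']].append(e['ts'])
    ntlm = {(e['user'], e['src']) for e in events if e['id'] == 4776}
    hits = []
    for e in events:
        if e['id'] != 4624 or (e['user'], e['src']) not in ntlm:
            continue
        # no TGT for this user shortly before the logon -> suspicious
        if not any(0 <= e['ts'] - t <= window for t in tgt_times[e['user']]):
            hits.append((e['user'], e['src'], e['ts']))
    return hits
```

A normal Kerberos-joined session shows a 4768 before the network logon and is skipped; an NTLM logon with no TGT history for that account surfaces as a candidate for manual review.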

Detecting Data Exfiltration

# NetFlow: internal hosts with significantly higher upload than download (unusual)
# Aggregate per src/dst pair, print plain numbers (-N), keep pairs > 10 MB
nfdump \
  -R /var/cache/nfdump/today/ \
  'src net 10.0.0.0/8 and not dst net 10.0.0.0/8' \
  -A srcip,dstip \
  -N \
  -o "fmt:%sa %da %byt %pkt" \
  -q \
| awk '$3 > 10485760' \
| sort -k3 -rn

# Look for staged exfiltration: data moving to internal staging host first
nfdump \
-R /var/cache/nfdump/today/ \
'dst ip 192.168.5.99 and bytes > 1048576' \
-o long
# Then check staging host's external connections
nfdump \
  -R /var/cache/nfdump/today/ \
  'src ip 192.168.5.99 and not dst net 10.0.0.0/8' \
  -o long

9. Chain of Custody & Forensic Integrity

Evidence collected during a network forensic investigation may be used in legal proceedings or internal disciplinary actions. If integrity cannot be demonstrated, evidence can be challenged or dismissed. The core requirements:

Integrity: prove the evidence has not been modified since collection. Use cryptographic hashing:

# Hash a pcap file immediately upon acquisition
sha256sum evidence.pcap > evidence.pcap.sha256
md5sum evidence.pcap >> evidence.pcap.sha256

# Verify integrity later
sha256sum -c evidence.pcap.sha256

# For large capture archives: hash each file and a manifest
find /evidence/ -name "*.pcap.gz" -exec sha256sum {} \; \
| tee /evidence/MANIFEST.sha256 \
| gpg --clearsign > /evidence/MANIFEST.sha256.asc # GPG-sign the manifest

Write protection: never analyze original evidence directly. Work on copies. Physical write blockers for storage media; for network captures, the pcap file should be made read-only immediately:

# Make capture file immutable (cannot be modified even by root without removing the attribute)
chattr +i /evidence/capture_2024-11-15.pcap

# Verify
lsattr /evidence/capture_2024-11-15.pcap
# ----i--------e-- /evidence/capture_2024-11-15.pcap

Documentation: maintain a chain of custody log recording who handled the evidence, when, what actions were taken, and system information:

# Record system state at time of capture
cat << 'CUSTODY_EOF' > /evidence/collection_notes.txt
Collection Date: [recorded at collection time]
Collected By: [investigator name]
System: [hostname and kernel version]
Interface: eth0
Collection Command: tcpdump -i eth0 -s 0 -w /evidence/capture.pcap
Purpose: Incident response - suspected Cobalt Strike C2
Ticket: INC-2024-1115-001
CUSTODY_EOF
# Then append computed values:
echo "Hash (SHA256): $(sha256sum /evidence/capture.pcap | awk '{print $1}')" >> /evidence/collection_notes.txt
echo "NTP Offset: $(chronyc tracking | grep 'System time')" >> /evidence/collection_notes.txt

Time accuracy: document NTP synchronization status at the time of capture. If the capturing system's clock was drifted, all timestamps require adjustment before analysis:

# Check NTP synchronization status
chronyc tracking # shows reference time, offset, and RMS offset
timedatectl status # systemd: shows NTP sync status and current offset
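If tracking reveals a stable offset, timestamps from that source can be normalized before merging into the timeline. A minimal sketch - it assumes `offset_seconds` records how far *fast* the source clock ran (matching chrony's "fast of NTP time" phrasing); flip the sign for a slow clock:

```python
from datetime import datetime, timedelta, timezone

def correct_drift(ts: datetime, offset_seconds: float) -> datetime:
    """Subtract a measured clock offset (positive = source clock ran fast)
    from an event timestamp so all sources align on true UTC."""
    return ts - timedelta(seconds=offset_seconds)

# A sensor whose clock ran 2.5 s fast stamped an event at 09:15:04 UTC;
# the true time was 09:15:01.5.
stamped = datetime(2024, 11, 15, 9, 15, 4, tzinfo=timezone.utc)
print(correct_drift(stamped, 2.5).isoformat())
```

Record both the raw and corrected timestamps in your working notes; the original evidence files themselves are never modified.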

10. MITRE ATT&CK Mapping

Technique | ATT&CK ID | Detection log source | Key observable
----------|-----------|----------------------|---------------
Phishing: Spearphishing Attachment | T1566.001 | SMTP logs, mail gateway | Executable/macro attachment from external sender
Command and Scripting Interpreter | T1059 | Proxy logs | Office process making HTTP requests (macro execution)
C2 over HTTPS | T1071.001 | Zeek ssl.log, NetFlow | Periodic beaconing, self-signed cert, unusual JA3
DNS Tunneling | T1071.004 | DNS logs, Zeek dns.log | High NXDOMAIN rate, large TXT responses, high-entropy labels
Lateral Movement: SMB/Windows Admin Shares | T1021.002 | Auth logs (4624 Type 3), Zeek conn.log | Network logons to port 445 across multiple hosts
Lateral Movement: Pass the Hash | T1550.002 | Auth logs (4624 + 4776, no 4768) | NTLM auth without preceding Kerberos TGT
Credential Dumping: LSASS | T1003.001 | EDR/Sysmon + Auth (subsequent use) | Followed by rapid lateral movement
Exfiltration over C2 Channel | T1041 | NetFlow, Zeek conn.log | Large upload bytes to C2 IP, no corresponding download
Exfiltration to Cloud Storage | T1567 | Proxy logs | Large POST to cloud storage API (dropbox.com, drive.google.com)
Data Staged Locally | T1074.001 | Zeek files.log, NetFlow | Internal host accumulating large file transfers from multiple sources
Indicator Removal: Clear Windows Event Logs | T1070.001 | Windows event logs | Event 1102 (Security) or 104 (System) generated when a log is cleared