Chapter 2.3 - Network Forensics & Log Analysis

Module 2: Traffic Analysis & Intrusion Detection
Prerequisites: Chapter 2.1 (Packet Analysis), Chapter 2.2 (IDS/IPS)


Table of Contents

  1. What Network Forensics Actually Answers
  2. Evidence Sources & Collection Hierarchy
  3. Full Packet Capture: Architecture & Storage
  4. NetFlow / IPFIX: Session-Level Telemetry
  5. Zeek: The Network Observation Framework
  6. Log Sources: Firewall, DNS, Proxy, Auth
  7. Log Correlation & Timeline Reconstruction
  8. Identifying Attack Patterns in Logs
  9. Chain of Custody & Forensic Integrity
  10. MITRE ATT&CK Mapping

1. What Network Forensics Actually Answers

Network forensics is the capture, preservation, and analysis of network traffic and associated logs to reconstruct what happened, when, how, and from where. Unlike live intrusion detection - which fires on in-progress events - network forensics operates retroactively on stored evidence to answer investigative questions:

  • What was exfiltrated? Reconstruct transferred files from packet captures.
  • How did the attacker enter? Identify the initial access vector from HTTP/SMTP/VPN logs.
  • Which systems were touched? Map lateral movement from authentication logs and NetFlow.
  • When did the intrusion begin? Establish a precise timeline from correlated log timestamps.
  • Is the attacker still present? Detect ongoing C2 beaconing in flow data.

These questions have different evidence requirements. File reconstruction requires full packet capture (FPC). Lateral movement mapping needs authentication event logs and NetFlow. Timeline precision requires all sources to be time-synchronized - an investigator who ignores NTP drift will build timelines with phantom gaps and overlaps.

The evidence quality hierarchy matters: FPC is the gold standard (every byte is preserved), but it's expensive at scale. NetFlow is cheap and scalable but loses payload. Logs are high-signal but depend entirely on what the generating application chose to record.


2. Evidence Sources & Collection Hierarchy

Storage and Retention Estimates

Source                  | Volume at 1 Gbps sustained        | Typical retention
------------------------|-----------------------------------|----------------------
Full packet capture     | ~80-100 GB/hr (after compression) | 24-72 hours (rolling)
NetFlow (sampled 1:100) | ~50-200 MB/hr                     | 30-90 days
Zeek logs               | ~1-3 GB/hr                        | 30-90 days
Firewall deny logs      | ~100-500 MB/hr                    | 90-365 days
DNS query logs          | ~50-200 MB/hr                     | 90-365 days
Proxy logs              | ~500 MB-2 GB/hr                   | 90-365 days

The practical implication: FPC is a short window that catches active incidents; NetFlow and application logs are the primary evidence sources for investigating incidents discovered days or weeks after the fact.
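The table above can be turned into a quick sizing calculation when planning retention. A minimal sketch - the function name and the rate/retention values are illustrative midpoints of the table, not prescriptions:

```python
# Back-of-envelope capacity planning for evidence retention.
# Rates are illustrative midpoints from the table above.

def retention_budget(gb_per_hour: float, retention_days: float) -> float:
    """Total storage in TB needed to hold `retention_days` of data
    generated at `gb_per_hour`."""
    return gb_per_hour * 24 * retention_days / 1024  # GB -> TB

SOURCES = {
    # source: (GB/hr at 1 Gbps sustained, target retention in days)
    "fpc":     (90.0,  3),    # full packet capture, 72-hour rolling window
    "netflow": (0.125, 90),   # ~125 MB/hr sampled flow records
    "zeek":    (2.0,   90),   # structured protocol logs
    "dns":     (0.125, 365),  # query logs
}

if __name__ == "__main__":
    for name, (rate, days) in SOURCES.items():
        print(f"{name:<8} {retention_budget(rate, days):8.2f} TB")
```

Even at these rough rates, three days of FPC costs more storage than a full year of DNS logs - which is exactly why the long-tail investigation relies on flows and logs.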


3. Full Packet Capture: Architecture & Storage

Capture at Scale

Single-interface tcpdump fails above a few hundred Mbps due to kernel copy overhead and lock contention. Production FPC uses:

  • AF_PACKET with PACKET_MMAP - kernel ring buffer, zero-copy to userspace
  • PF_RING / DPDK - kernel bypass; handles 10-100 Gbps line rates
  • Dedicated appliances - Endace, cPacket, ExtraHop hardware with lossless capture guarantees

For lab and incident response work, tcpdump and tshark are adequate:

# Capture to rotating pcap files: ~1 GB per file, keep last 50 files.
# -i eth0: capture interface; -s 0: full snaplen (entire packet, no truncation)
# -G 3600: rotate every 3600 seconds; -C 1000: also rotate at 1000 MB
# -W 50: keep at most 50 files (ring buffer); -z gzip: compress rotated files
# The filter excludes SSH (your own management traffic).
tcpdump \
  -i eth0 \
  -s 0 \
  -w /data/capture/capture-%Y%m%d-%H%M%S.pcap \
  -G 3600 \
  -C 1000 \
  -W 50 \
  -z gzip \
  'not port 22'

# High-performance capture with PF_RING zero-copy
# (the zc: prefix selects a zero-copy PF_RING interface; requires a
# PF_RING-enabled libpcap)
tcpdump \
  --immediate-mode \
  -i zc:eth0 \
  -s 0 \
  -w /nvme/capture/live.pcap

Targeted Capture During Incident Response

When you know the suspect host or connection, limit capture scope to reduce storage and analysis noise:

# Capture all traffic to/from a specific host
tcpdump -i eth0 -s 0 -w suspect_host.pcap \
host 192.168.1.55

# Capture only C2-indicative ports from a suspect host
tcpdump -i eth0 -s 0 -w suspect_c2.pcap \
"host 192.168.1.55 and (port 443 or port 4444 or port 8080)"

# Capture all DNS traffic (BPF cannot match query names -- filter for the
# suspect domain afterwards with tshark or Zeek)
tcpdump -i eth0 -s 0 -w dns_evil.pcap \
"port 53 and (udp or tcp)"

# Capture SMB traffic for lateral movement analysis
tcpdump -i eth0 -s 0 -w smb_lateral.pcap \
"port 445 or port 139"

File Carving from PCAP

Network forensics frequently requires reconstructing files transferred over the network - malware payloads, exfiltrated documents, C2 implants.

# Carve files from pcap with NetworkMiner. The command-line parser ships
# with the Professional edition as NetworkMinerCLI.exe; check --help for
# the exact flags in your version.
mono /opt/NetworkMiner/NetworkMinerCLI.exe \
  -r suspicious.pcap \
  -w /tmp/carved_files

# Carve with tcpflow: reassembles TCP streams, writes each stream to a file
# -r: read from pcap; -o: output directory; -a: enable all post-processing
tcpflow -r capture.pcap -o /tmp/tcpflow_output -a

# Carve HTTP objects with tshark (Wireshark CLI)
tshark \
-r capture.pcap \
--export-objects http,/tmp/http_objects # writes each HTTP object as a separate file

# Carve SMB transferred files
tshark \
-r capture.pcap \
--export-objects smb,/tmp/smb_objects

# Hash all carved files for IOC matching
find /tmp/http_objects -type f -exec sha256sum {} \; -exec md5sum {} \;

Reconstructing Sessions for Analyst Review

# Follow a specific TCP stream in tshark (stream index from initial analysis)
# ascii mode renders the payload as readable text; -q suppresses the
# default per-packet output
tshark -r capture.pcap -q -z follow,tcp,ascii,0

# Follow HTTP conversation
tshark -r capture.pcap \
-Y "http" \
-T fields \
-e frame.time \
-e ip.src \
-e http.request.method \
-e http.request.uri \
-e http.response.code \
-e http.content_length \
-E header=y \
-E separator=,

# Reconstruct the full TCP payload of stream 5 (e.g., downloaded malware).
# tcp.payload is emitted as hex (colon-separated in older tshark versions);
# note the result still contains HTTP headers if the stream carried HTTP.
tshark -r capture.pcap \
  -Y "tcp.stream eq 5 && tcp.payload" \
  -T fields -e tcp.payload \
  | tr -d ':\n' \
  | xxd -r -p > /tmp/stream5_payload.bin
file /tmp/stream5_payload.bin   # identify file type

4. NetFlow / IPFIX: Session-Level Telemetry

NetFlow (Cisco), IPFIX (IETF standard, RFC 7011), and sFlow (sampled) all describe network sessions without preserving payload. A flow record captures the 5-tuple (src IP, dst IP, src port, dst port, protocol) plus counters (bytes, packets, duration, TCP flags) and metadata (AS numbers, DSCP, input/output interface).

What You Can and Cannot Determine from Flows

Investigative question         | NetFlow answer
-------------------------------|------------------------------------------------
Did host A talk to host B?     | Yes - precise src/dst/port/time
How much data was transferred? | Yes - byte/packet counters per direction
What protocol was used?        | Yes - port + protocol field (L4 only)
Was it encrypted?              | Partial - inferred (port 443 usually means TLS)
What file was downloaded?      | No - no payload, no filename
What SQL query was run?        | No - no application-layer content
Was the session successful?    | Partial - TCP flags show the handshake, but no app-layer response code
Was data exfiltrated?          | Partial - inferred from large upload byte counts
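The last two "Partial" rows are where analytics fill the gap. Upload/download asymmetry, for instance, can be scored directly from flow byte counters. A minimal sketch - the function name and thresholds are illustrative, not a standard:

```python
from collections import defaultdict

def asymmetric_pairs(flows, min_upload=50 * 2**20, ratio=5.0):
    """flows: iterable of (src, dst, bytes_out, bytes_in) records.
    Returns (src, dst) pairs whose cumulative upload exceeds `min_upload`
    bytes AND is at least `ratio` times the download -- the flow-level
    signature of exfiltration (normal clients download far more than
    they upload)."""
    up, down = defaultdict(int), defaultdict(int)
    for src, dst, b_out, b_in in flows:
        up[(src, dst)] += b_out
        down[(src, dst)] += b_in
    return [p for p in up
            if up[p] >= min_upload and up[p] >= ratio * max(down[p], 1)]

# Example: 60 MiB uploaded against 1 MiB downloaded is flagged;
# a small ordinary session is not.
sample = [
    ("10.0.0.5", "203.0.113.44", 60 * 2**20, 1 * 2**20),
    ("10.0.0.6", "8.8.8.8", 1000, 50000),
]
print(asymmetric_pairs(sample))
```

The same logic underlies the nfdump queries shown later in this section; flow records alone cannot prove exfiltration, but sustained asymmetry narrows the candidate list dramatically.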

Collecting and Querying Flows

# nfdump: analyze NetFlow/IPFIX data stored by nfcapd
# Show top 20 source IPs by byte count between 08:00 and 14:00
nfdump \
  -R /var/cache/nfdump/2024/11/15/ \
  -t 2024-11-15.08:00:00-2024-11-15.14:00:00 \
  -s srcip/bytes \
  -n 20 \
  -o long

# Find all connections from a specific host
nfdump \
-R /var/cache/nfdump/ \
-t 2024-11-15.00:00:00-2024-11-16.00:00:00 \
'src ip 192.168.1.55' \
-o long

# Detect large data transfers (potential exfiltration) -- uploads > 50 MB
nfdump \
  -R /var/cache/nfdump/ \
  'src net 10.0.0.0/8 and not dst net 10.0.0.0/8 and bytes > 52428800' \
  -o long \
  -s dstip/bytes

# Identify beaconing: connections from one host to same external IP at regular intervals
nfdump \
-R /var/cache/nfdump/ \
'src ip 192.168.1.55 and dst ip 203.0.113.44' \
-o "fmt:%ts %td %byt %pkt" \
-q

Beaconing Detection with Python

Automated beaconing detection operates on flow duration/timing data:

#!/usr/bin/env python3
"""
beacon_detect.py - detect periodic C2 beaconing from NetFlow CSV export
Usage: nfdump -R /data/flows/ -o csv | python3 beacon_detect.py
"""
import sys
import statistics
from collections import defaultdict
from datetime import datetime

flows = defaultdict(list)

for line in sys.stdin:
    line = line.strip()
    if not line or line.startswith('#') or line.startswith('Date'):
        continue
    parts = line.split(',')
    if len(parts) < 7:
        continue
    try:
        # nfdump CSV column layout: ts,te,td,sa,da,sp,dp,...
        ts_str = parts[0]     # flow start timestamp
        src_ip = parts[3]     # source IP (sa)
        dst_ip = parts[4]     # destination IP (da)
        dst_port = parts[6]   # destination port (dp)
        # drop fractional seconds before parsing
        ts = datetime.strptime(ts_str.split('.')[0], '%Y-%m-%d %H:%M:%S')
        key = (src_ip, dst_ip, dst_port)
        flows[key].append(ts.timestamp())  # store unix timestamp
    except (ValueError, IndexError):
        continue

# Analyze each src->dst:port pair for periodic behavior
print(f"{'Source':<20} {'Destination':<20} {'Port':<8} {'Count':>6} {'AvgInterval':>14} {'Jitter(CV)':>12}")
print("-" * 85)

for (src, dst, port), timestamps in flows.items():
    if len(timestamps) < 5:  # need at least 5 samples for meaningful analysis
        continue
    timestamps.sort()
    intervals = [timestamps[i+1] - timestamps[i] for i in range(len(timestamps)-1)]
    avg_interval = statistics.mean(intervals)
    if avg_interval < 10:    # ignore sub-10-second intervals (too noisy)
        continue
    stdev = statistics.stdev(intervals) if len(intervals) > 1 else 0
    cv = stdev / avg_interval  # coefficient of variation; low CV = regular beaconing

    if cv < 0.3 and len(timestamps) >= 10:  # beacon threshold: CV < 0.3, 10+ samples
        print(f"{src:<20} {dst:<20} {port:<8} {len(timestamps):>6} "
              f"{avg_interval:>12.1f}s {cv:>12.4f} <- BEACON CANDIDATE")
    elif cv < 0.5:
        print(f"{src:<20} {dst:<20} {port:<8} {len(timestamps):>6} "
              f"{avg_interval:>12.1f}s {cv:>12.4f}")

5. Zeek: The Network Observation Framework

Zeek (formerly Bro) occupies the space between full packet capture and raw NetFlow - it performs deep protocol analysis and emits structured, query-friendly logs without storing raw packets. For network forensics, Zeek logs are often the most valuable artifact: they contain application-layer semantics (HTTP URIs, DNS query names, TLS certificates, file hashes) at a fraction of the storage cost of PCAP.

Core Log Files

Log file   | Content                       | Key forensic fields
-----------|-------------------------------|--------------------------------------------
conn.log   | All TCP/UDP/ICMP sessions     | id.orig_h, id.resp_h, service, orig_bytes, resp_bytes, conn_state, history
dns.log    | DNS queries and responses     | query, qtype_name, answers, TTLs
http.log   | HTTP requests                 | method, host, uri, referrer, user_agent, status_code, resp_body_len
ssl.log    | TLS sessions                  | server_name (SNI), subject, issuer, validation_status, ja3, ja3s
files.log  | File transfers (any protocol) | source, filename, mime_type, md5, sha1, sha256
x509.log   | TLS certificates              | certificate.subject, certificate.issuer, certificate.not_valid_after, san.dns
smtp.log   | Email sessions                | from, to, subject, path, user_agent
notice.log | Zeek policy alerts            | note, msg, src, dst, sub
weird.log  | Protocol anomalies            | name, msg (malformed packets, protocol violations)

Zeek CLI: Offline Analysis

# Process a pcap offline, write all logs to the current directory.
# "local" loads the default policy scripts; LogAscii::use_json=T emits
# JSON logs instead of TSV.
zeek -r suspicious_traffic.pcap local LogAscii::use_json=T

# Enable specific detection scripts
zeek -r capture.pcap \
/opt/zeek/share/zeek/policy/protocols/ssl/validate-certs.zeek \
/opt/zeek/share/zeek/policy/protocols/dns/detect-external-names.zeek \
/opt/zeek/share/zeek/policy/frameworks/intel/do_notice.zeek \
Intel::read_files=/etc/zeek/intel/indicators.dat # load IOC feed

# Live capture mode (production deployment)
zeekctl deploy # deploy using ZeekControl (manages workers, logger, manager)
zeekctl status # check node health
zeekctl stop

Querying Zeek Logs with zeek-cut and jq

Zeek's TSV logs are fast to query with zeek-cut (column extractor); JSON logs use jq:

# Extract external destinations and byte counts from conn.log, keeping
# normally closed connections (conn_state SF); top 20 by orig_bytes
cat conn.log \
| zeek-cut id.orig_h id.resp_h orig_bytes resp_bytes conn_state \
| awk '$5 == "SF"' \
| sort -k3 -rn \
| head -20

# Find all DNS queries to a suspicious TLD (the query name is column 3)
cat dns.log \
| zeek-cut ts id.orig_h query answers \
| awk -F'\t' '$3 ~ /\.(ru|cn|tk|pw|xyz)$/' \
| sort -u

# Find HTTP requests with suspicious user agents (or missing user agent)
cat http.log \
| zeek-cut ts id.orig_h host uri user_agent status_code \
| awk -F'\t' '$5 == "-" || $5 ~ /python|curl|wget|go-http/' \
| sort

# Find all files transferred with their hashes (requires file analysis
# enabled); drop rows with a missing md5, sort by mime type
cat files.log \
| zeek-cut ts source fuid mime_type filename md5 sha256 \
| awk -F'\t' '$6 != "-"' \
| sort -t$'\t' -k4

# Cross-reference file hashes against VirusTotal (manual -- requires vt-cli)
cat files.log \
| zeek-cut sha256 \
| grep -v '^-$' \
| sort -u \
| xargs -I{} vt file {}

# Identify TLS connections with self-signed certificates
# (tab field separator, since validation_status contains spaces)
cat ssl.log \
| zeek-cut ts id.orig_h id.resp_h server_name validation_status \
| awk -F'\t' '$5 == "self signed certificate in certificate chain"' \
| sort -t$'\t' -k3     # group by destination IP

Writing a Custom Zeek Script

Zeek's scripting language enables custom detection logic that operates on protocol events:

# /etc/zeek/site/detect_dns_tunneling.zeek
# Fires a notice when a DNS query name exceeds 60 chars or contains base64-like patterns

@load base/frameworks/notice

module DNSTunnel;

export {
    redef enum Notice::Type += {
        DNS_Long_Query,
        DNS_HighEntropy_Query
    };
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
    {
    # Long query check
    if ( |query| > 60 )
        {
        NOTICE([$note=DNS_Long_Query,
                $conn=c,
                $msg=fmt("Long DNS query: %s (%d chars)", query, |query|),
                $identifier=cat(c$id$orig_h, query)]);
        }

    # Base64-like pattern check: long runs of alphanumeric before first dot
    local label = split_string(query, /\./)[0];
    if ( |label| > 30 && /^[A-Za-z0-9+\/]{30,}/ in label )
        {
        NOTICE([$note=DNS_HighEntropy_Query,
                $conn=c,
                $msg=fmt("Possible base64-encoded DNS label: %s", label),
                $identifier=cat(c$id$orig_h, label)]);
        }
    }
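The script above approximates "high entropy" with a base64-style regex. A direct Shannon-entropy check is easy to run offline over exported query names; a minimal sketch, with illustrative function names and thresholds:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def suspicious_label(query: str, min_len: int = 20, min_entropy: float = 3.5) -> bool:
    """Flag the leftmost DNS label if it is both long and high-entropy --
    typical of base32/base64-encoded tunnel payloads. Thresholds are
    starting points; tune against your own baseline."""
    label = query.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy

# Dictionary-like labels score low; encoded payloads score high:
print(suspicious_label("mail.example.com"))                       # False
print(suspicious_label("dGhpc2lzZXhmaWxkYXRhMTIzNDU2Nzg.evil.com"))  # True
```

Feed it the `query` column from dns.log (via zeek-cut) to triage candidate tunneling domains before deeper inspection.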

6. Log Sources: Firewall, DNS, Proxy, Auth

Firewall Logs

Firewall logs record allow/deny decisions with the 5-tuple plus timestamp, interface, rule name, and NAT translation state. What they reveal and miss:

  • Reveal: blocked connection attempts (reconnaissance), permitted connections (communication paths), NAT mappings (internal host behind translated address)
  • Miss: application-layer content, encrypted payload, traffic permitted by overly broad rules
# Parse iptables/nftables log from syslog
# (iptables must be configured with a -j LOG rule before DROP/ACCEPT;
# the DPT= values below are suspicious destination ports)
grep "IN=eth0" /var/log/kern.log \
| grep -E "DPT=(4444|8080|1337)" \
| awk '{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^SRC=/) printf "src=%s ",    substr($i, 5);
        if ($i ~ /^DST=/) printf "dst=%s ",    substr($i, 5);
        if ($i ~ /^DPT=/) printf "dport=%s ",  substr($i, 5);
        if ($i ~ /^SPT=/) printf "sport=%s\n", substr($i, 5);
    }
}'

# pfSense/OPNsense: filterlog CSV format
# Fields: rule,sub,anchor,tracker,interface,reason,action,direction,ip_version,...
grep ",block," /var/log/filter.log \
| awk -F',' '{print $11, $13, $14, $16}' \
| sort | uniq -c | sort -rn

# Cisco ASA: parse deny logs and extract source IPs
# (the source appears as "src <interface>:<ip>/<port>")
grep "Deny" /var/log/asa.log \
| grep -oP "src [^:]+:\K[\d.]+" \
| sort | uniq -c | sort -rn \
| head -20

DNS Logs

DNS is a forensic goldmine. Every hostname resolution is recorded - including C2 domain lookups, DGA beacons, and data exfiltration channels. DNS logs correlate internal host activity to external destinations even when the actual connection is encrypted.

# BIND (named) query log -- enable in named.conf:
# logging { channel query_log { file "/var/log/named/query.log"; severity info; };
# category queries { query_log; }; };

# Extract all unique domains queried by a specific host
grep "192.168.1.55" /var/log/named/query.log \
| grep -oP "query: \K\S+" \
| sort | uniq -c | sort -rn

# Find NXDOMAIN responses -- hosts querying non-existent domains.
# A high NX rate from one host = DGA or DNS tunneling.
# Field $5 is the client IP (adjust to your query-log format).
grep "NXDOMAIN" /var/log/named/query.log \
| awk '{print $5}' \
| sort | uniq -c | sort -rn

# Windows DNS debug log (enable via dnsmgmt.msc -> server properties -> debug logging)
Get-Content C:\Windows\System32\dns\dns.log |
Where-Object { $_ -match "QUERY" } |
Select-String -Pattern "[A-Za-z0-9]{20,}\.(com|net|org)" |
Sort-Object | Get-Unique

# Passive DNS -- using dnstap (binary DNS capture format, higher fidelity than text logs)
fstrm_capture \
-t protobuf:dnstap.Dnstap \
-u /var/run/named/dnstap.sock \
-w /var/log/dnstap/dns.fstrm

# Decode dnstap frames
dnstap-read /var/log/dnstap/dns.fstrm \
| grep -i "evil\.com"

Proxy Logs

HTTP/HTTPS proxies (Squid, Bluecoat, Zscaler) log every web request with full URL, response code, bytes transferred, and client identity. In organizations that route all web traffic through a proxy, proxy logs are often the single richest source of evidence for web-based attacks.

# Squid access.log format:
# timestamp duration client_ip result/status bytes method url hierarchy mime_type

# Find all URLs accessed by a suspect host with large (>1 MB) responses
# Squid columns: $3 = client IP, $5 = bytes, $7 = URL
awk '$3 == "192.168.1.55" && $5 > 1048576 {print $3, $5, $7}' /var/log/squid/access.log \
| sort -k2 -rn

# Find requests to newly registered domains (heuristic: short second-level domain)
awk '{print $7}' /var/log/squid/access.log \
| grep -oP 'https?://\K[^/:]+' \
| sort | uniq -c | sort -rn \
| awk '$2 ~ /^[a-z0-9]{4,8}\.(com|net|cc|pw|xyz)$/'

# Find POST requests (data being sent out) with large bodies
awk '$6 == "POST" && $5 > 10000' /var/log/squid/access.log \
| awk '{print $3, $7, $5}'

# Zscaler / cloud proxy: NSS log forwarded via syslog -- parse JSON
cat /var/log/zscaler/nss.log \
| jq 'select(.action == "Allowed") | select(.bytes_sent > 100000) | {user, url, bytes_sent}'

Authentication Logs

Authentication events tell you who authenticated from where and when - essential for tracking lateral movement (T1021) and credential abuse.

# Linux: failed SSH logins (brute force detection).
# The source IP is the 4th field from the end: "... from <ip> port <n> ssh2"
grep "Failed password" /var/log/auth.log \
| awk '{print $(NF-3)}' \
| sort | uniq -c | sort -rn \
| head -20

# Successful login after repeated failures from the same IP (successful brute force).
# $(NF-3) is the source IP in both "Failed password" and "Accepted password" lines.
awk '/Failed password/ {fail[$(NF-3)]++}
     /Accepted password/ {if (fail[$(NF-3)] > 5) print "BRUTE SUCCESS:", $(NF-3), fail[$(NF-3)], "failures"}' \
/var/log/auth.log

# sudo usage log -- privilege escalation after compromise
grep "sudo:" /var/log/auth.log \
| grep "COMMAND" \
| awk '{print $1, $2, $3, $5, $NF}'

# Windows Security Event Log -- use Get-WinEvent (PowerShell) or evtx_dump
# Event IDs:
# 4624 = Successful logon
# 4625 = Failed logon
# 4648 = Logon with explicit credentials (pass-the-hash indicator)
# 4768 = Kerberos TGT requested
# 4769 = Kerberos service ticket requested
# 4776 = NTLM authentication
# 4771 = Kerberos pre-auth failed (Kerberoasting indicator)

# PowerShell: find all 4624 events with logon type 3 (network) in the last 24 hours
Get-WinEvent -FilterHashtable @{
    LogName   = 'Security'
    Id        = 4624
    StartTime = (Get-Date).AddHours(-24)
} | Where-Object {
    $_.Properties[8].Value -eq 3    # LogonType 3 = network logon
} | Select-Object TimeCreated,
    @{N='User';        E={$_.Properties[5].Value}},
    @{N='SrcIP';       E={$_.Properties[18].Value}},
    @{N='WorkStation'; E={$_.Properties[11].Value}} |
    Format-Table -AutoSize

# Rapid logon across multiple systems = lateral movement
# Python: find source IPs authenticating to >3 unique hosts in 1 hour
python3 << 'EOF'
from collections import defaultdict
import csv

hosts_by_src = defaultdict(set)

with open('auth_events.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['EventId'] == '4624' and row['LogonType'] == '3':
            hosts_by_src[row['SrcIP']].add(row['TargetHost'])

for src_ip, hosts in hosts_by_src.items():
    if len(hosts) > 3:
        print(f"LATERAL MOVEMENT CANDIDATE: {src_ip} authenticated to {len(hosts)} hosts: {hosts}")
EOF

7. Log Correlation & Timeline Reconstruction

Individual log sources answer isolated questions. The full attack story emerges from correlating events across sources aligned to a common timeline. Time synchronization is not optional - Windows systems default to syncing to domain controllers, Linux systems should use chronyd or ntpd, and network devices require explicit NTP configuration. A 2-minute clock skew makes correlation unreliable.

Timeline Reconstruction Methodology

1. Define the incident window (e.g., initial IDS alert timestamp +/- 2 hours)
2. Pull all relevant log sources for that window
3. Normalize timestamps to UTC ISO-8601
4. Merge and sort by timestamp
5. Identify the earliest observable event
6. Walk forward: each event should be causally explainable by preceding events
7. Identify gaps -- missing evidence that should exist if your hypothesis is correct

Automated Timeline Merging

# Convert Windows EVTX to JSON Lines for correlation
# (evtx_dump from the Rust "evtx" project; -o jsonl emits one record per line;
# EventID serializes as a plain number or as {"#text": n} depending on the record)
evtx_dump -o jsonl /path/to/Security.evtx \
| jq -c '{ts: .Event.System.TimeCreated."#attributes"."SystemTime",
          id: (.Event.System.EventID."#text"? // .Event.System.EventID),
          data: .Event.EventData}' \
> windows_events.jsonl

# Merge multiple log sources into a single timeline CSV
python3 << 'EOF'
import csv, sys
from datetime import datetime, timezone

events = []

# Zeek conn.log
with open('conn.log') as f:
    for line in f:
        if line.startswith('#'): continue
        parts = line.split('\t')
        if len(parts) < 10: continue
        ts = datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)
        events.append({
            'timestamp': ts.isoformat(),
            'source': 'zeek_conn',
            'src': parts[2],
            'dst': parts[4],
            'detail': f"{parts[6]}/{parts[7]} {parts[9]} bytes"
        })

# DNS log
with open('dns.log') as f:
    for line in f:
        if line.startswith('#'): continue
        parts = line.split('\t')
        if len(parts) < 10: continue
        ts = datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)
        events.append({
            'timestamp': ts.isoformat(),
            'source': 'zeek_dns',
            'src': parts[2],
            'dst': parts[4],
            'detail': f"query={parts[9]} answers={parts[21] if len(parts) > 21 else '-'}"
        })

# HTTP log
with open('http.log') as f:
    for line in f:
        if line.startswith('#'): continue
        parts = line.split('\t')
        if len(parts) < 16: continue
        ts = datetime.fromtimestamp(float(parts[0]), tz=timezone.utc)
        events.append({
            'timestamp': ts.isoformat(),
            'source': 'zeek_http',
            'src': parts[2],
            'dst': parts[4],
            'detail': f"{parts[7]} {parts[8]}{parts[9]} -> {parts[15]}"
        })

events.sort(key=lambda e: e['timestamp'])

writer = csv.DictWriter(sys.stdout, fieldnames=['timestamp', 'source', 'src', 'dst', 'detail'])
writer.writeheader()
writer.writerows(events)
EOF

Attack Timeline: What a Real Reconstruction Looks Like

The following is a synthetic but representative timeline of a spear-phishing to lateral movement to exfiltration incident, reconstructed from log correlation:

[09:14:33 UTC] SMTP log: mail server received email to user@corp.com
From: attacker@lookalike.com, Subject: "Invoice Q4", Attachment: Invoice.doc

[09:15:01 UTC] Proxy log: 192.168.5.22 GET http://evil.com/stage1.exe (200, 84KB)
User-Agent: Microsoft Office/15.0 (macro execution via document)

[09:15:03 UTC] DNS log: 192.168.5.22 query: evil.com (resolved to 203.0.113.44)

[09:15:04 UTC] Zeek conn.log: 192.168.5.22:54291 -> 203.0.113.44:443 ESTABLISHED
2847 bytes sent, 84129 bytes received (implant download)

[09:15:07 UTC] Zeek ssl.log: SNI=evil.com, self-signed cert, JA3=51c64c77...
validation_status="self signed certificate in certificate chain"

[09:15:10 UTC] IDS alert: SID 2014702 "ET TROJAN Possible Cobalt Strike Beacon"

[09:18:44 UTC] Zeek conn.log: 192.168.5.22:54312 -> 203.0.113.44:443 (60-sec beacon)
[09:19:44 UTC] Zeek conn.log: 192.168.5.22:54318 -> 203.0.113.44:443 (60-sec beacon)
[pattern continues every 60 seconds -- C2 check-in]

[09:32:15 UTC] Auth log (Windows): 4624 LogonType=3 src=192.168.5.22 dst=192.168.5.10
Account: corp\svcaccount (credential from memory via Mimikatz)

[09:32:21 UTC] Zeek conn.log: 192.168.5.22:62544 -> 192.168.5.10:445 (SMB)
orig_bytes=8812, resp_bytes=2941, duration=0.44s

[09:32:25 UTC] Auth log (Windows): 4624 LogonType=3 src=192.168.5.22 dst=192.168.5.31

[09:33:00 UTC] Zeek files.log: file extracted from SMB to 192.168.5.10
filename=financial_data_2024.xlsx, sha256=a3f9c2..., size=4.2MB

[09:47:12 UTC] NetFlow: 192.168.5.22 -> 203.0.113.44:443 upload=4.3MB in 12 seconds
(exfiltration of financial data via C2 channel)

This timeline is only possible because every timestamp is UTC-synchronized and events from SMTP, proxy, DNS, Zeek, IDS, authentication, and NetFlow are merged. Any single source alone gives an incomplete picture.


8. Identifying Attack Patterns in Logs

Detecting C2 Beaconing in Proxy/Zeek Logs

# Find connections to the same external IP at suspiciously regular intervals
cat conn.log \
| zeek-cut ts id.orig_h id.resp_h id.resp_p proto orig_bytes \
| awk '$4 == "443" && $3 !~ /^10\.|^192\.168\.|^172\./' \
| awk '{
    key = $2 "-" $3;
    if (last_ts[key]) {
        interval = $1 - last_ts[key];
        sum[key] += interval;
        count[key]++;
        if (interval < min[key] || min[key] == 0) min[key] = interval;
        if (interval > max[key]) max[key] = interval;
    }
    last_ts[key] = $1;
}
END {
    for (k in count) {
        if (count[k] >= 8) {
            avg = sum[k] / count[k];
            jitter = (max[k] - min[k]) / avg;
            if (jitter < 0.3) printf "BEACON: %s count=%d avg=%.1fs jitter=%.2f\n", k, count[k], avg, jitter;
        }
    }
}'

Detecting DNS Exfiltration

# High query volume to a single apex domain + large TXT responses
cat dns.log \
| zeek-cut query qtype_name answers \
| awk '{
    n = split($1, parts, ".");
    if (n >= 2) apex = parts[n-1] "." parts[n];
    else apex = $1;
    count[apex]++;
}
END {
    for (d in count) if (count[d] > 100) print count[d], d
}' \
| sort -rn \
| head -20

# Large TXT record responses = data being encoded in DNS
cat dns.log \
| zeek-cut ts id.orig_h query qtype_name answers \
| awk '$4 == "TXT" && length($5) > 100' \
| sort -k3 # group by query domain

Detecting Lateral Movement in Auth Logs

# Windows: find accounts authenticating to multiple hosts with LogonType 3
# (arrays of arrays require gawk 4+)
awk -F',' '
NR == 1 { next }
$3 == "4624" && $5 == "3" {
    key = $6;             # username
    hosts[key][$7] = 1;   # destination host
    src[key][$8] = 1;     # source IP
}
END {
    for (user in hosts) {
        n = 0; for (h in hosts[user]) n++;
        m = 0; for (s in src[user]) m++;
        if (n > 3)
            printf "LATERAL: user=%s unique_dst_hosts=%d source_ips=%d\n", user, n, m;
    }
}' auth_events.csv

# Detect pass-the-hash: Event 4624 LogonType=3 with NTLM auth (4776) but no 4768 (Kerberos TGT)
# The absence of a Kerberos TGT for a successful network logon suggests credential replay
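That absence correlation can be automated over normalized event records. A minimal sketch - the function name, field names, and window are assumptions about your export format, not a standard schema:

```python
from collections import defaultdict

def pth_candidates(events, window=300):
    """events: dicts with keys 'ts' (epoch seconds), 'id' (4624/4768/4776),
    'user', 'src'. Flags network logons (4624) backed by NTLM auth (4776)
    for which the account requested NO Kerberos TGT (4768) in the
    preceding `window` seconds -- the pass-the-hash pattern."""
    tgt_times = defaultdict(list)   # user -> TGT request timestamps
    for e in events:
        if e['id'] == 4768:
            tgt_times[e['user']].append(e['ts'])
    ntlm = {(e['user'], e['src']) for e in events if e['id'] == 4776}
    hits = []
    for e in events:
        if e['id'] != 4624 or (e['user'], e['src']) not in ntlm:
            continue
        # no TGT for this user shortly before the logon -> suspicious
        if not any(0 <= e['ts'] - t <= window for t in tgt_times[e['user']]):
            hits.append((e['user'], e['src'], e['ts']))
    return hits
```

A normal Kerberos-joined session shows a 4768 before the network logon and is skipped; an NTLM logon with no TGT history for that account surfaces as a candidate for manual review.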

Detecting Data Exfiltration

# NetFlow: internal hosts with significantly higher upload than download (unusual)
# Aggregate per src/dst pair, print plain numbers (-N), keep pairs > 10 MB
nfdump \
  -R /var/cache/nfdump/today/ \
  'src net 10.0.0.0/8 and not dst net 10.0.0.0/8' \
  -A srcip,dstip \
  -N \
  -o "fmt:%sa %da %byt %pkt" \
  -q \
| awk '$3 > 10485760' \
| sort -k3 -rn

# Look for staged exfiltration: data moving to internal staging host first
nfdump \
-R /var/cache/nfdump/today/ \
'dst ip 192.168.5.99 and bytes > 1048576' \
-o long
# Then check staging host's external connections
nfdump \
  -R /var/cache/nfdump/today/ \
  'src ip 192.168.5.99 and not dst net 10.0.0.0/8' \
  -o long

9. Chain of Custody & Forensic Integrity

Evidence collected during a network forensic investigation may be used in legal proceedings or internal disciplinary actions. If integrity cannot be demonstrated, evidence can be challenged or dismissed. The core requirements:

Integrity: prove the evidence has not been modified since collection. Use cryptographic hashing:

# Hash a pcap file immediately upon acquisition
sha256sum evidence.pcap > evidence.pcap.sha256
md5sum evidence.pcap >> evidence.pcap.sha256

# Verify integrity later
sha256sum -c evidence.pcap.sha256

# For large capture archives: hash each file and a manifest
find /evidence/ -name "*.pcap.gz" -exec sha256sum {} \; \
| tee /evidence/MANIFEST.sha256 \
| gpg --clearsign > /evidence/MANIFEST.sha256.asc # GPG-sign the manifest

Write protection: never analyze original evidence directly. Work on copies. Physical write blockers for storage media; for network captures, the pcap file should be made read-only immediately:

# Make capture file immutable (cannot be modified even by root without removing the attribute)
chattr +i /evidence/capture_2024-11-15.pcap

# Verify
lsattr /evidence/capture_2024-11-15.pcap
# ----i--------e-- /evidence/capture_2024-11-15.pcap

Documentation: maintain a chain of custody log recording who handled the evidence, when, what actions were taken, and system information:

# Record system state at time of capture
cat << 'CUSTODY_EOF' > /evidence/collection_notes.txt
Collection Date: [recorded at collection time]
Collected By: [investigator name]
System: [hostname and kernel version]
Interface: eth0
Collection Command: tcpdump -i eth0 -s 0 -w /evidence/capture.pcap
Purpose: Incident response - suspected Cobalt Strike C2
Ticket: INC-2024-1115-001
CUSTODY_EOF
# Then append computed values:
echo "Hash (SHA256): $(sha256sum /evidence/capture.pcap | awk '{print $1}')" >> /evidence/collection_notes.txt
echo "NTP Offset: $(chronyc tracking | grep 'System time')" >> /evidence/collection_notes.txt

Time accuracy: document NTP synchronization status at the time of capture. If the capturing system's clock was drifted, all timestamps require adjustment before analysis:

# Check NTP synchronization status
chronyc tracking # shows reference time, offset, and RMS offset
timedatectl status # systemd: shows NTP sync status and current offset
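If tracking reveals a stable offset, timestamps from that source can be normalized before merging into the timeline. A minimal sketch - it assumes `offset_seconds` records how far *fast* the source clock ran (matching chrony's "fast of NTP time" phrasing); flip the sign for a slow clock:

```python
from datetime import datetime, timedelta, timezone

def correct_drift(ts: datetime, offset_seconds: float) -> datetime:
    """Subtract a measured clock offset (positive = source clock ran fast)
    from an event timestamp so all sources align on true UTC."""
    return ts - timedelta(seconds=offset_seconds)

# A sensor whose clock ran 2.5 s fast stamped an event at 09:15:04 UTC;
# the true time was 09:15:01.5.
stamped = datetime(2024, 11, 15, 9, 15, 4, tzinfo=timezone.utc)
print(correct_drift(stamped, 2.5).isoformat())
```

Record both the raw and corrected timestamps in your working notes; the original evidence files themselves are never modified.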

10. MITRE ATT&CK Mapping

Technique | ATT&CK ID | Detection log source | Key observable
----------|-----------|----------------------|---------------
Phishing: Spearphishing Attachment | T1566.001 | SMTP logs, mail gateway | Executable/macro attachment from external sender
Command and Scripting Interpreter | T1059 | Proxy logs | Office process making HTTP requests (macro execution)
C2 over HTTPS | T1071.001 | Zeek ssl.log, NetFlow | Periodic beaconing, self-signed cert, unusual JA3
DNS Tunneling | T1071.004 | DNS logs, Zeek dns.log | High NXDOMAIN rate, large TXT responses, high-entropy labels
Lateral Movement: SMB/Windows Admin Shares | T1021.002 | Auth logs (4624 Type 3), Zeek conn.log | Network logons to port 445 across multiple hosts
Lateral Movement: Pass the Hash | T1550.002 | Auth logs (4624 + 4776, no 4768) | NTLM auth without preceding Kerberos TGT
Credential Dumping: LSASS | T1003.001 | EDR/Sysmon + Auth (subsequent use) | Followed by rapid lateral movement
Exfiltration over C2 Channel | T1041 | NetFlow, Zeek conn.log | Large upload bytes to C2 IP, no corresponding download
Exfiltration to Cloud Storage | T1567 | Proxy logs | Large POST to cloud storage API (dropbox.com, drive.google.com)
Data Staged Locally | T1074.001 | Zeek files.log, NetFlow | Internal host accumulating large file transfers from multiple sources
Indicator Removal: Clear Windows Event Logs | T1070.001 | Windows event logs | Event 1102 (Security) or 104 (System) generated when a log is cleared