The Breach That Was Already Documented
In early 2026, a mid-sized logistics company discovered ransomware encrypting its file servers on a Tuesday morning. The incident response team pulled logs and found something that made the discovery worse: every stage of the attack was recorded. Authentication anomalies, lateral movement between workstations, a staging server making repeated outbound calls to known proxy infrastructure, and file access patterns consistent with data exfiltration. All of it was there. The logs had been flowing into a SIEM for months. Nobody had looked at them with enough context to understand what they were seeing.
This scenario is not unusual. The DFIR Report's recent analysis of SystemBC-based intrusions shows a consistent pattern: attackers use proxy tools to establish persistent, low-noise footholds that generate log entries that look unremarkable in isolation. The problem is rarely a lack of log data. It is a lack of structured analysis applied to that data before the damage is done.
This article is about building a log analysis practice that catches threats in motion, using realistic detection logic, well-configured collection pipelines, and human workflows that turn raw entries into actionable intelligence.
Understanding What Your Logs Actually Contain
Before writing a single detection rule, security teams need to conduct an honest audit of what log sources they actually have, what fields those sources populate reliably, and what gaps exist in their coverage. Most environments have more log data than they process effectively, and many environments have critical blind spots in high-value areas.
The core log sources for threat detection fall into a few categories. Authentication logs from Active Directory, LDAP, and cloud identity providers capture login events, privilege escalations, and service account activity. Network flow logs capture source and destination IPs, ports, protocols, and byte counts. Endpoint logs from EDR platforms capture process creation, file writes, registry modifications, and network connections at the process level. Web proxy and DNS logs record every domain resolution and HTTP transaction on the network. Application logs from web servers, databases, and custom software capture user actions and errors.
Each of these sources tells a partial story. The investigation into the logistics company breach revealed that their SIEM was ingesting firewall logs and Windows Event Logs, but DNS query logs from their internal resolvers were feeding into a separate system that nobody monitored. The SystemBC proxy component used DNS to communicate with its command and control infrastructure. That traffic was recorded in full. It just lived in a system that was treated as infrastructure telemetry rather than security data.
Log Quality Before Log Volume
A common mistake is treating log volume as a proxy for security maturity. Organizations that ingest billions of events per day but parse them poorly are no better positioned than organizations with focused, high-quality collection from fewer sources. Before expanding collection, validate existing sources for completeness.
For Windows Event Logs specifically, the default audit policy in most Active Directory environments captures a fraction of what is needed for meaningful threat detection. Event ID 4688 for process creation is often disabled by default. Sysmon, Microsoft's free system monitor tool, dramatically improves endpoint visibility by capturing command-line arguments, parent-child process relationships, network connections, and file hashes. Deploying Sysmon with a well-maintained configuration such as the SwiftOnSecurity or Olaf Hartong modular configs is one of the highest-return investments a Windows-centric team can make.
For Linux systems, auditd provides kernel-level syscall logging, but it requires careful tuning. Enabling it without filtering generates enormous volume with poor signal quality. Start with rules that capture privilege escalation attempts, crontab modifications, SSH key changes, and executions from unusual directories like /tmp and /dev/shm.
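Once those auditd rules are in place, even a small amount of post-processing pays off. The sketch below scans the default auditd log for SYSCALL records whose executable path falls in a staging directory. It is a minimal illustration, assuming auditd's standard key=value record format and default log path; adjust the path, field parsing, and directory list to match your configuration.

```python
import re

# Assumed defaults; adjust to your auditd deployment.
AUDIT_LOG = "/var/log/audit/audit.log"
SUSPICIOUS_DIRS = ("/tmp/", "/dev/shm/", "/var/tmp/")

exe_field = re.compile(r'exe="([^"]+)"')

def executions_from_staging_dirs(path=AUDIT_LOG):
    """Yield SYSCALL audit records whose executable path sits in a staging directory."""
    with open(path, errors="replace") as fh:
        for line in fh:
            if "type=SYSCALL" not in line:
                continue
            match = exe_field.search(line)
            if match and match.group(1).startswith(SUSPICIOUS_DIRS):
                yield line.strip()

if __name__ == "__main__":
    for record in executions_from_staging_dirs():
        print(record)
```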
Building Detection Logic Around Attack Behaviors
Detection rules built around specific file hashes or IP addresses age poorly. Attackers rotate infrastructure constantly. The case of the anti-DDoS firm recently caught launching attacks against Brazilian ISPs is a reminder that threat actors repurpose legitimate-looking infrastructure to blend with normal traffic. IP-based indicators have a useful but limited shelf life.
Behavioral detection logic built around how attacks unfold is significantly more durable. The MITRE ATT&CK framework provides a structured vocabulary for describing attacker techniques, and mapping detection rules to ATT&CK techniques creates a coverage model that shows where your visibility is strong and where gaps exist.
Lateral Movement Signatures in Authentication Logs
Lateral movement almost always generates characteristic authentication patterns. Pass-the-hash and pass-the-ticket attacks produce logon events with specific logon types. In Windows environments, Event ID 4624 with Logon Type 3 (network logon) from a workstation to another workstation is suspicious outside of specific administrative patterns. Event ID 4648 indicates a logon using explicit credentials, which attackers use when pivoting between systems with stolen credential material.
A practical detection query in any SIEM platform looks for a single source account performing Type 3 logons to more than a threshold number of distinct hosts within a short window. The threshold depends on your environment. In a flat network with no micro-segmentation, normal users rarely authenticate to more than two or three distinct hosts per hour. An account touching fifteen hosts in thirty minutes warrants investigation regardless of whether the account is flagged for other reasons.
Combine this with failed authentication events preceding the successful ones. Event ID 4625 (failed logon) followed by Event ID 4624 (successful logon) from the same source to multiple destinations is a pattern consistent with credential spraying or enumeration before successful compromise.
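As a rough sketch of how that threshold logic can be prototyped outside the SIEM, the following Python assumes authentication events have already been exported and parsed into dicts. The field names (`time`, `account`, `dest_host`, `event_id`, `logon_type`) and the five-hosts-per-hour threshold are assumptions to adapt to your environment.

```python
from collections import defaultdict
from datetime import timedelta

# Illustrative values; tune against your own authentication baseline.
DISTINCT_HOST_THRESHOLD = 5
WINDOW = timedelta(hours=1)

def flag_lateral_movement(events):
    """events: iterable of dicts with 'time' (datetime), 'account', 'dest_host',
    'event_id', and 'logon_type' keys (field names assumed; map from your SIEM export).
    Returns accounts whose Type 3 logons touch too many distinct hosts in one window."""
    per_account = defaultdict(list)  # account -> [(time, dest_host)]
    for e in events:
        if e["event_id"] == 4624 and e["logon_type"] == 3:
            per_account[e["account"]].append((e["time"], e["dest_host"]))

    flagged = {}
    for account, logons in per_account.items():
        logons.sort()
        for i, (window_start, _) in enumerate(logons):
            hosts = {h for t, h in logons[i:] if t - window_start <= WINDOW}
            if len(hosts) > DISTINCT_HOST_THRESHOLD:
                flagged[account] = hosts
                break
    return flagged
```

The failed-then-successful correlation described in the next paragraph layers on naturally: join flagged accounts against recent Event ID 4625 records from the same source before escalating.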
Detecting Proxy and Tunneling Infrastructure
The SystemBC proxy tool, highlighted in recent DFIR reporting, operates by establishing encrypted SOCKS proxy connections that allow attackers to tunnel additional malicious traffic through a compromised host. From a log perspective, SystemBC generates outbound connections on non-standard ports or on ports used legitimately by other protocols, making it harder to distinguish from background noise.
Detection requires combining multiple data sources. Firewall or netflow logs showing persistent, low-bandwidth connections from internal hosts to external IPs on ports like 4001, 4444, or high-numbered ephemeral ports are worth investigating, particularly when those connections maintain a consistent beaconing interval. Beacon detection works by measuring the standard deviation of time intervals between connections from the same source to the same destination. Legitimate applications that call home periodically, such as software update services, have predictable intervals. Malware beacons often have similarly low variance, but to destinations that have no relationship to any known application in your environment.
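A minimal sketch of that interval-variance approach is below, assuming flow records have been reduced to (timestamp, source, destination) tuples. The minimum connection count and jitter ratio are illustrative starting points, not calibrated values.

```python
from collections import defaultdict
from statistics import mean, pstdev

def beacon_candidates(flows, min_connections=10, max_jitter_ratio=0.1):
    """flows: iterable of (timestamp_seconds, src_ip, dst_ip) tuples.
    Flags src->dst pairs whose inter-connection intervals are suspiciously regular."""
    by_pair = defaultdict(list)
    for ts, src, dst in flows:
        by_pair[(src, dst)].append(ts)

    candidates = []
    for pair, times in by_pair.items():
        if len(times) < min_connections:
            continue
        times.sort()
        intervals = [b - a for a, b in zip(times, times[1:])]
        avg = mean(intervals)
        jitter = pstdev(intervals)
        if avg > 0 and jitter / avg <= max_jitter_ratio:
            candidates.append((pair, avg, jitter))
    return candidates
```

Surviving candidates still need the second filter described above: exclude destinations that belong to known update services or other applications with a legitimate reason to beacon.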
DNS is the other detection surface for proxy infrastructure. Domains used as command and control frequently have high entropy names generated by domain generation algorithms, very recent registration dates, or hosting on infrastructure with no business relationship to your organization. DNS query logs enriched with passive DNS reputation data allow analysts to surface resolutions to newly-registered domains or domains associated with hosting providers frequently used for malicious infrastructure.
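The entropy side of that logic is easy to prototype. The sketch below scores the leftmost DNS label with Shannon entropy; the threshold and minimum-length values are illustrative, and the registration-age and passive DNS reputation checks mentioned above require external lookups that are not shown.

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Bits of entropy per character in a domain label; DGA-generated labels tend to score high."""
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def suspicious_queries(queries, entropy_threshold=3.5, min_length=12):
    """queries: iterable of fully qualified domain names from DNS query logs.
    Yields (domain, entropy) pairs for labels that are long and high-entropy."""
    for fqdn in queries:
        label = fqdn.split(".")[0]
        if len(label) >= min_length and shannon_entropy(label) >= entropy_threshold:
            yield fqdn, round(shannon_entropy(label), 2)
```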
Ransomware Pre-Staging Indicators
Ransomware attacks rarely start with encryption. The current wave of ransomware incidents involves a reconnaissance and staging phase that can last days or weeks before the destructive payload executes. This staging phase leaves log evidence that is detectable if analysts know what to look for.
Staging activity typically includes large-volume file access events as attackers enumerate file shares to identify data worth exfiltrating before encryption. Windows file server audit logs capture object access events (Event IDs 4663 and 4656) that show which files were read and by which account. A service account or user account reading thousands of files across multiple network shares over a few hours is anomalous. Most users access a small, relatively consistent set of files each day. Dramatic deviations from baseline access patterns are a meaningful signal.
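A simplified illustration of that baseline comparison follows, assuming 4663/4656 events have been parsed into dicts with `account` and `file_path` fields and that per-account baselines have been computed from prior windows. Both the field names and the default baseline value are assumptions.

```python
from collections import defaultdict

def file_access_outliers(events, baselines, multiplier=10):
    """events: iterable of dicts with 'account' and 'file_path' keys for object-access
    records in the current window (field names assumed).
    baselines: dict of account -> typical distinct-file count per window.
    Flags accounts reading far more distinct files than their historical norm."""
    current = defaultdict(set)
    for e in events:
        current[e["account"]].add(e["file_path"])

    outliers = {}
    for account, files in current.items():
        baseline = baselines.get(account, 50)  # default baseline is an assumption
        if len(files) > baseline * multiplier:
            outliers[account] = len(files)
    return outliers
```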
Volume shadow copy deletion is another late-stage indicator. The command vssadmin delete shadows or wmic shadowcopy delete executed on a host appears in process creation logs and should generate an immediate alert. This action has almost no legitimate use case in a production environment and is a near-universal ransomware pre-deployment step.
Archiving and compression tools running from unusual contexts also matter. Seeing 7-Zip or WinRAR spawned as a child process of a remote management tool, or executed from a staging directory, particularly when followed by outbound data transfers, is consistent with the exfiltration phase documented in recent ransomware attack chains.
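Both of these process-creation indicators reduce to simple matching over command lines and parent-child pairs. The sketch below assumes Sysmon-style fields (`image`, `parent_image`, `command_line`); the alert labels and the list of remote-management parent executables are illustrative placeholders rather than a vetted watchlist.

```python
import re

# Command-line patterns for shadow copy deletion; near-zero legitimate use in production.
SHADOW_DELETE = re.compile(
    r"(vssadmin(\.exe)?\s+delete\s+shadows|wmic\s+shadowcopy\s+delete)", re.I
)

# Archivers that matter when spawned by remote-management or scripting parents.
ARCHIVERS = {"7z.exe", "7zg.exe", "winrar.exe", "rar.exe"}
SUSPECT_PARENTS = {"screenconnect.clientservice.exe", "powershell.exe"}  # illustrative examples

def triage_process_event(event):
    """event: dict with 'image', 'parent_image', and 'command_line' keys
    (Sysmon Event ID 1 style fields; names assumed). Returns an alert label or None."""
    cmd = event.get("command_line", "")
    image = event.get("image", "").lower().rsplit("\\", 1)[-1]
    parent = event.get("parent_image", "").lower().rsplit("\\", 1)[-1]

    if SHADOW_DELETE.search(cmd):
        return "tier1_shadow_copy_deletion"
    if image in ARCHIVERS and parent in SUSPECT_PARENTS:
        return "tier2_archiver_from_unusual_parent"
    return None
```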
Structuring Your SIEM for Faster Triage
Raw detection capability means little without an operational workflow that routes alerts to analysts who can act on them quickly. The gap between detection and response is where attackers exploit dwell time, and current reporting suggests median dwell time for ransomware groups is still measured in days rather than hours for many organizations.
Alert Fidelity and Tier Assignment
Every alert in a SIEM should have an associated severity tier, an expected analyst response time, and documented triage steps. Without these, analysts treat all alerts with the same urgency, which in practice means most alerts get triaged slowly or not at all.
Tier 1 alerts are high-confidence, high-severity detections that require immediate response regardless of the time of day. Examples include volume shadow copy deletion, successful authentication from a known threat actor IP, or a process executing from a path commonly used by malware staging. These alerts should page on-call staff and have a response SLA measured in minutes.
Tier 2 alerts are medium-confidence detections that require analyst investigation within a defined window. Examples include authentication anomalies, unusual outbound connections, or process behavior that matches known techniques but could have legitimate explanations in some environments. Analysts need documented runbooks that tell them what additional evidence to gather and what conditions escalate a Tier 2 to a Tier 1.
Tier 3 alerts are informational detections used for context and hunting rather than immediate response. They inform threat hunting campaigns and help analysts correlate Tier 2 alerts with broader patterns. Without this tier, potentially valuable intelligence gets discarded.
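One lightweight way to make these tiers enforceable rather than aspirational is to encode them as data the alerting pipeline can consult. A minimal sketch, with hypothetical alert names, SLAs, and runbook URLs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlertTier:
    name: str
    response_sla_minutes: int
    pages_oncall: bool
    runbook_url: str  # placeholder URLs below are hypothetical

TIERS = {
    "tier1_shadow_copy_deletion": AlertTier("Tier 1", 15, True, "https://wiki.example/runbooks/shadow-copy"),
    "tier2_archiver_from_unusual_parent": AlertTier("Tier 2", 240, False, "https://wiki.example/runbooks/archiver"),
    "tier3_new_external_destination": AlertTier("Tier 3", 1440, False, "https://wiki.example/runbooks/hunting"),
}
```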
Enrichment as a First-Class Operation
Raw log data lacks context. An IP address is meaningless without knowing whether it belongs to a known hosting provider, a Tor exit node, a VPN service, or an ISP associated with your organization's legitimate users. Enriching alerts automatically before they reach analysts dramatically improves triage speed and accuracy.
Integrate your SIEM with threat intelligence feeds that provide IP and domain reputation data. AbuseIPDB, GreyNoise, and commercial threat intelligence platforms all offer APIs that can be called at alert time to check whether observed indicators have been reported in other security incidents. A connection to an IP that has been reported by dozens of organizations in the past 30 days changes the urgency of an alert even if the destination is not on your internal blocklist.
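As an example of alert-time enrichment, the sketch below calls AbuseIPDB's v2 check endpoint. The endpoint, headers, and response fields follow AbuseIPDB's published API documentation, but verify them against the current docs before wiring this into production.

```python
import requests  # third-party; pip install requests

ABUSEIPDB_URL = "https://api.abuseipdb.com/api/v2/check"

def enrich_ip(ip: str, api_key: str) -> dict:
    """Fetch reputation context for an alerted IP from AbuseIPDB.
    Field names (abuseConfidenceScore, totalReports, isp) are per the public v2 API docs."""
    resp = requests.get(
        ABUSEIPDB_URL,
        headers={"Key": api_key, "Accept": "application/json"},
        params={"ipAddress": ip, "maxAgeInDays": 90},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json().get("data", {})
    return {
        "abuse_confidence": data.get("abuseConfidenceScore"),
        "total_reports": data.get("totalReports"),
        "isp": data.get("isp"),
    }
```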
Asset context is equally important. Knowing that a particular host is a developer workstation, a production database server, or a finance team endpoint changes how analysts interpret identical log events. Maintain an asset inventory that your SIEM can query, and surface that context in every alert. A process execution that is normal on a developer workstation running custom tooling is alarming on an accounting workstation that should run a predictable set of standard applications.
Phishing-Initiated Attack Chains in Log Data
Recent reporting on attackers weaponizing Amazon SES to send phishing emails that bypass traditional email filters illustrates how modern attack chains begin. When phishing emails arrive via a legitimate sending service, email gateway logs show clean delivery. The first log evidence of compromise often appears at the endpoint when a user executes a malicious attachment or follows a link to a credential harvesting page.
Process creation logs showing Office applications spawning unusual child processes are a well-established detection. Specifically, WINWORD.EXE or EXCEL.EXE spawning powershell.exe, wscript.exe, or mshta.exe is consistent with macro-based malware execution. Web proxy logs showing a workstation visiting a newly-registered domain immediately after a user opens an email attachment provide corroborating evidence that can accelerate triage.
Browser-based credential theft from fake login pages targeting services like Microsoft 365 appears in proxy logs as a POST request to a domain that spoofs a legitimate service. Detection rules that flag POST requests to domains with high character similarity to high-value authentication endpoints, or to domains registered within the past 30 days, catch a meaningful portion of this activity. The false positive rate is manageable when combined with user context: a POST request to a spoofed domain from an account that received an external email in the past two hours is significantly more suspicious than the same request from an isolated endpoint with no recent mail activity.
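Character-similarity checks like the one described can be prototyped with nothing more than the standard library. The sketch below compares observed POST destinations against a short list of legitimate authentication domains; the domain list, similarity threshold, and proxy-log field names are all assumptions to adjust for your environment.

```python
from difflib import SequenceMatcher

# Illustrative list; expand with the authentication endpoints your organization actually uses.
LEGITIMATE_AUTH_DOMAINS = ["login.microsoftonline.com", "accounts.google.com", "okta.com"]

def lookalike_score(domain: str):
    """Return the closest legitimate auth domain and its similarity ratio."""
    best = max(LEGITIMATE_AUTH_DOMAINS, key=lambda d: SequenceMatcher(None, domain, d).ratio())
    return best, SequenceMatcher(None, domain, best).ratio()

def flag_post_requests(proxy_events, similarity_threshold=0.75):
    """proxy_events: iterable of dicts with 'method' and 'host' keys (field names assumed).
    Yields POST destinations that closely resemble, but are not, legitimate auth domains."""
    for e in proxy_events:
        host = e.get("host", "")
        if e.get("method") == "POST" and host not in LEGITIMATE_AUTH_DOMAINS:
            closest, ratio = lookalike_score(host)
            if ratio >= similarity_threshold:
                yield host, closest, round(ratio, 2)
```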
Log Retention and Legal Considerations
Detection capability depends on having logs available to query. Many organizations configure retention periods that are too short to support meaningful investigation. Attackers who maintain dwell time of two to three weeks before executing a ransomware payload will have activity that falls outside a 14-day retention window, making forensic reconstruction difficult or impossible.
A practical minimum retention policy separates high-volume, low-fidelity logs like raw netflow from lower-volume, high-fidelity logs like authentication and endpoint telemetry. Retain authentication, endpoint, and DNS logs for at least 90 days in hot storage where they are immediately queryable. Retain compressed copies of all log sources for at least 12 months in cold storage for forensic and legal purposes. Regulatory requirements in sectors like healthcare and finance often mandate specific retention periods, and aligning security retention to those requirements simplifies compliance posture.
Log integrity is a separate consideration from retention. If an attacker gains administrative access to your log management infrastructure, they can delete or modify records. Store logs in a write-once or append-only architecture where possible. Send critical security logs to a separate logging environment that operational staff do not have access to modify. This architecture ensures that even a fully compromised production environment cannot erase the evidence of the compromise.
Building a Threat Hunting Practice on Top of Log Analysis
Reactive detection through alert rules catches known-bad behaviors. Threat hunting is the proactive search for attacker activity that has not yet triggered an alert, often because the behavior is subtle enough to fall below detection thresholds or novel enough that no rule exists for it yet.
Hunting starts with a hypothesis drawn from threat intelligence. The recent Gentlemen and SystemBC reporting provides a concrete starting point: assuming that a proxy tool is present on at least one endpoint in your environment, what log evidence would it leave? Enumerate the expected artifacts: outbound connections to hosting providers not seen before in your environment, processes writing files to staging directories, parent-child process relationships that do not match known legitimate software, and DNS queries to domains with no established history in your environment.
Query your log data looking for evidence of those artifacts. A hunt that comes back empty is still valuable because it demonstrates coverage and tests your log collection. A hunt that surfaces unexpected findings leads directly into an incident response workflow.
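The "destinations never seen before" portion of that hunt reduces to a set difference between a baseline window and a recent window of flow data, as in this minimal sketch with toy data:

```python
def new_external_destinations(baseline_flows, recent_flows):
    """Each argument is an iterable of (src_ip, dst_ip) tuples from flow logs.
    Returns destinations contacted recently that never appeared in the baseline window."""
    baseline_dsts = {dst for _, dst in baseline_flows}
    return sorted({dst for _, dst in recent_flows} - baseline_dsts)

# Example usage with illustrative addresses:
baseline = [("10.0.0.5", "52.1.2.3"), ("10.0.0.8", "34.9.8.7")]
recent = [("10.0.0.5", "52.1.2.3"), ("10.0.0.12", "185.220.101.4")]
print(new_external_destinations(baseline, recent))  # ['185.220.101.4']
```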
Document every hunt, including the hypothesis, the queries used, the data sources queried, and the results. Over time, this documentation becomes a library of hunting playbooks that can be repeated on a scheduled basis and assigned to analysts of varying experience levels. Hunting should not be an activity reserved for senior staff only. Well-documented playbooks allow junior analysts to execute hunts under supervision and develop detection intuition from the process.
Operational Tradeoffs in Log Analysis Architecture
Every architectural decision in a log analysis program involves tradeoffs between cost, performance, coverage, and operational complexity. Teams that treat these tradeoffs as fixed ignore the reality that the right architecture for a 50-person company is meaningfully different from what a 5,000-person enterprise needs.
Centralized SIEM platforms provide unified query capability and alert management but generate significant cost at scale. Log storage costs for high-volume sources like netflow can be substantial. Some organizations address this by routing high-volume, low-fidelity sources to cheaper data lake storage while keeping high-fidelity sources in the SIEM. Queries that require joining data across both systems are slower and more complex, but the cost savings for large environments are significant.
Cloud-native environments introduce additional complexity. Workloads that autoscale generate log volume that fluctuates dramatically. Log pipelines need to handle burst conditions without dropping events. Services like AWS CloudTrail, GCP Cloud Audit Logs, and Azure Monitor generate high-value security telemetry, but they need explicit configuration to capture the events relevant to security monitoring. Default configurations frequently omit data access events and management plane activity that would be central to investigating a cloud compromise.
The P2P botnet tracking work published in recent research demonstrates another complexity: distributed threats generate distributed log evidence. A P2P botnet node communicates with many peers rather than a central server, which means that any single network flow log shows only a slice of the infection's communication pattern. Effective detection requires aggregating and correlating flow data across many observation points rather than relying on a single sensor.
Concrete Steps to Improve Starting This Week
Improving log analysis capability does not require a platform migration or a budget cycle. Several concrete actions can be taken immediately with existing tooling.
- Enable process creation logging with command-line arguments on Windows endpoints. This requires a Group Policy change to enable Event ID 4688 with command-line auditing, or deploying Sysmon if it is not already present.
- Audit DNS log collection. Verify that internal DNS resolver query logs are ingested into your SIEM or log management platform, and confirm that the source IP in those logs represents the querying workstation rather than the resolver itself.
- Create a detection rule for volume shadow copy deletion using the specific command strings associated with the vssadmin and wmic methods. Test the rule in a lab environment to confirm it fires before deploying to production.
- Set a 90-day retention minimum for authentication logs and run a query against your oldest available data to confirm the retention policy is functioning as configured.
- Identify one threat intelligence feed with an API and integrate it into your alert enrichment pipeline so that observed external IPs are automatically checked for reputation at alert time.
- Run a simple lateral movement hunt by querying authentication logs for accounts generating Type 3 logons to more than five distinct hosts in a single hour over the past 30 days. Investigate anything that surfaces.
Each of these actions builds on the others. Better log collection feeds better detection rules. Better enrichment accelerates triage. Better retention enables retroactive hunting when new threat intelligence surfaces indicators that were active in your environment weeks ago. The compound effect of consistent improvement in each area is a detection program that catches attacks at earlier stages with less analyst effort per incident.
The logistics company from the opening scenario eventually rebuilt its detection program after the breach. The logs were already there. The gap was in the structured analysis, the enrichment, and the operational workflow that would have connected the dots before the ransomware executed. Closing that gap is the real work of log analysis for threat detection, and it is available to any team willing to invest the time in doing it systematically.