The Uncomfortable Truth About Log Analysis
Most security teams treat log analysis as a confirmation tool. An alert fires, an analyst pulls the logs, and the investigation begins. This approach gets the causality backwards. Logs are not a post-alert resource. They are the earliest, most reliable signal of adversary activity available to any defender, and the organizations that use them reactively are systematically late to every incident they eventually discover.
The shift required is not technical. It is philosophical. Logs must be treated as a continuous intelligence stream, not an archive to consult after something has already gone wrong. The teams that internalize this are the ones catching lateral movement before ransomware deploys, identifying command-and-control beaconing before data leaves the environment, and spotting credential abuse before an attacker reaches production systems.
The stakes for getting this right are concrete, not abstract. The recent identification of VECT ransomware, which functions as a wiper under specific conditions, illustrates how a single threat can shift behavior mid-incident. If your log pipeline only captures file system changes at the point of encryption, you have already lost the window where early log signals could have stopped the attack chain entirely.
Why Log Pipelines Fail Before Analysis Even Starts
The most common failure in log-based threat detection has nothing to do with detection logic. It is a collection and normalization problem. Organizations accumulate dozens of log sources across firewalls, endpoints, authentication systems, cloud providers, and network devices, but they rarely audit whether those sources are actually delivering complete, timely, and parseable data to their SIEM.
Silent failures are endemic. A Windows endpoint stops forwarding event logs after an update. A misconfigured syslog daemon on a Linux server drops UDP packets under load. A cloud provider rotates an API key and the log ingestion job silently stops. In each case, the SIEM continues to function without complaint while a category of visibility disappears entirely. The analyst dashboard shows no alerts, but only because there are no logs to analyze.
A practical starting point is a log source inventory with health monitoring. Every source in your environment should have an expected ingestion rate. A source that falls below its baseline by more than a defined threshold should generate a high-priority operational alert. This sounds obvious, but many organizations running otherwise mature SIEM deployments do not have it in place. They discover gaps during breach investigations, not before them.
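As a minimal sketch of what that health check could look like, the snippet below compares each source's recent event count against a rolling baseline and flags sources that fall significantly below it. The source names, baseline window, and thresholds are illustrative assumptions, not a specific SIEM's API.

```python
from dataclasses import dataclass

# Illustrative thresholds; tune these to your environment.
DROP_THRESHOLD = 0.5   # alert if a source drops below 50% of its baseline
MIN_BASELINE = 100     # skip sources too quiet to baseline reliably

@dataclass
class SourceHealth:
    name: str
    baseline_events_per_hour: float  # e.g. a 14-day rolling average
    last_hour_events: int

def check_sources(sources: list[SourceHealth]) -> list[str]:
    """Return operational alerts for sources falling below their expected rate."""
    alerts = []
    for s in sources:
        if s.baseline_events_per_hour < MIN_BASELINE:
            continue  # baseline too small to judge a drop meaningfully
        ratio = s.last_hour_events / s.baseline_events_per_hour
        if ratio < DROP_THRESHOLD:
            alerts.append(
                f"HIGH: log source '{s.name}' at {ratio:.0%} of baseline "
                f"({s.last_hour_events} vs ~{s.baseline_events_per_hour:.0f} events/hour)"
            )
    return alerts

if __name__ == "__main__":
    inventory = [
        SourceHealth("dc01-security-events", 12_000, 11_400),
        SourceHealth("edge-fw-syslog", 45_000, 3_100),   # silently degraded
        SourceHealth("aws-cloudtrail", 8_000, 0),        # ingestion job stopped
    ]
    for alert in check_sources(inventory):
        print(alert)
```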
Normalization is the second structural problem. When logs from different sources use different field names for the same concept, writing detection rules that span multiple sources becomes an exercise in frustration. A source IP address might appear as src_ip, source.ip, client_ip, or RemoteAddress depending on the vendor. Investing time in a consistent schema, whether that is the Elastic Common Schema, OCSF, or a custom standard, pays dividends every time an analyst writes a correlation rule or a threat hunter runs a multi-source query.
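A normalization layer does not need to be elaborate to be useful. The sketch below maps a handful of vendor-specific field names onto ECS-style canonical keys before indexing; the alias list is an illustrative assumption and would be driven by your actual parsers in practice.

```python
# Map vendor-specific field names onto one schema (ECS-style keys shown here).
# The alias list is illustrative; real deployments derive this from parser configs.
FIELD_ALIASES = {
    "source.ip": ["src_ip", "source.ip", "client_ip", "RemoteAddress"],
    "destination.ip": ["dst_ip", "destination.ip", "server_ip"],
    "user.name": ["user", "username", "TargetUserName", "user.name"],
    "event.action": ["action", "EventType", "event.action"],
}

def normalize(raw_event: dict) -> dict:
    """Return an event keyed by canonical field names, keeping unmapped fields."""
    normalized = dict(raw_event)  # preserve anything we do not recognize
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw_event:
                normalized[canonical] = raw_event[alias]
                break
    return normalized

# A firewall event and a Windows event now answer the same query.
print(normalize({"src_ip": "10.1.2.3", "action": "deny"})["source.ip"])
print(normalize({"RemoteAddress": "203.0.113.7", "TargetUserName": "svc_backup"})["source.ip"])
```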
The Log Sources That Actually Matter for Threat Detection
Not all logs carry equal intelligence value. Teams that try to collect everything often end up drowning in noise while the signals that matter get buried. Prioritization based on attacker behavior, not compliance requirements, produces better detection outcomes.
Authentication Logs
Authentication events are the highest-value source for detecting credential-based attacks. Failed logins, successful logins from unusual locations or times, account lockouts, password resets, and multi-factor authentication bypass events all map directly to initial access and persistence tactics. The recent exposure of REvil and GandCrab leadership by German law enforcement, including the identification of a figure known as UNKN, illustrates how ransomware operations depend on authenticated access to victim environments. The authentication logs are where that access first appears.
For Active Directory environments, Event ID 4624 (successful logon), 4625 (failed logon), 4768 and 4769 (Kerberos ticket requests), and 4776 (credential validation) form the core detection surface. Pay particular attention to logon type 3 (network logon) and type 10 (remote interactive logon), which are the types most commonly associated with lateral movement and remote access tool activity.
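A minimal sketch of that filter, assuming normalized events with hypothetical event_id and logon_type fields: keep the listed event IDs, and for 4624/4625 surface only the logon types most associated with lateral movement.

```python
# Security log event IDs and logon types discussed above.
LOGON_EVENT_IDS = {4624, 4625, 4768, 4769, 4776}
LATERAL_MOVEMENT_LOGON_TYPES = {3, 10}  # network and remote interactive logons

def interesting_logons(events: list[dict]) -> list[dict]:
    """Keep logon-related events; for 4624/4625, keep only type 3 and type 10."""
    hits = []
    for e in events:
        event_id = e.get("event_id")
        if event_id not in LOGON_EVENT_IDS:
            continue
        if event_id in {4624, 4625} and e.get("logon_type") not in LATERAL_MOVEMENT_LOGON_TYPES:
            continue  # console and other local logon types are filtered out here
        hits.append(e)
    return hits

sample = [
    {"event_id": 4624, "logon_type": 2, "user": "alice"},    # interactive console logon, skipped
    {"event_id": 4624, "logon_type": 3, "user": "svc_sql"},  # network logon, kept
    {"event_id": 4769, "user": "svc_sql"},                   # Kerberos service ticket request, kept
]
print(interesting_logons(sample))
```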
Process Execution and Command-Line Logs
Windows Event ID 4688 with command-line auditing enabled, and Sysmon Event ID 1, capture process creation with full command-line arguments. This data is critical for detecting living-off-the-land techniques where attackers use legitimate system binaries to execute malicious actions. PowerShell encoded command execution, WMI-based lateral movement, and scheduled task creation all leave distinct signatures in process logs that are invisible to signature-based endpoint tools if those tools are only looking at binary hashes.
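As an illustration of the kind of command-line detection this enables, the sketch below flags encoded PowerShell invocations and a couple of other suspicious command-line patterns in process creation events. The patterns are deliberately simple examples, not a complete living-off-the-land rule set.

```python
import base64
import re

# Simple command-line heuristics for process creation events (4688 / Sysmon Event ID 1).
ENCODED_PS = re.compile(
    r"powershell(\.exe)?\s+.*-e(nc(odedcommand)?)?\s+([A-Za-z0-9+/=]{40,})",
    re.IGNORECASE,
)
SUSPICIOUS_SNIPPETS = [
    "wmic /node:",        # WMI-based lateral movement
    "schtasks /create",   # scheduled task creation for persistence
]

def inspect_command_line(cmdline: str) -> list[str]:
    """Return human-readable findings for a single process command line."""
    findings = []
    match = ENCODED_PS.search(cmdline)
    if match:
        try:
            decoded = base64.b64decode(match.group(4)).decode("utf-16-le", errors="replace")
        except Exception:
            decoded = "<unable to decode>"
        findings.append(f"Encoded PowerShell command, decodes to: {decoded[:120]}")
    lowered = cmdline.lower()
    for snippet in SUSPICIOUS_SNIPPETS:
        if snippet in lowered:
            findings.append(f"Suspicious pattern: {snippet}")
    return findings

cmd = "powershell.exe -nop -w hidden -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA"
print(inspect_command_line(cmd))  # decodes to the start of an IEX (New-Object ...) download cradle
```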
The DFIR Report's analysis of SystemBC campaigns demonstrates this directly. SystemBC uses legitimate network protocols and system processes to establish persistent proxy tunnels. Detection relies on correlating process creation events with network connection logs to identify processes making unexpected outbound connections to unusual destinations. Neither data source alone is sufficient. The detection lives in the correlation.
Network Connection Logs
DNS query logs, proxy logs, firewall flow records, and NetFlow data each provide a different view of network behavior. DNS logs in particular are underutilized. Domain generation algorithm (DGA) traffic, DNS tunneling, and lookups to newly registered or algorithmically suspicious domains are all detectable through DNS analysis without needing to decrypt traffic. High-volume queries to a single domain, queries with unusually long subdomains, or consistent low-TTL beacon patterns to external infrastructure are all indicators worth building detection logic around.
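A minimal sketch of those DNS heuristics, assuming query logs that expose the full query name; the length and entropy cutoffs are illustrative starting points rather than tuned values.

```python
import math
from collections import Counter

# Illustrative cutoffs; real values come from baselining your own DNS traffic.
MAX_SUBDOMAIN_LEN = 50
ENTROPY_THRESHOLD = 3.5  # bits per character; high entropy suggests DGA or tunneling

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def score_query(qname: str) -> list[str]:
    """Return reasons a single DNS query name looks suspicious."""
    reasons = []
    labels = qname.rstrip(".").split(".")
    subdomain = ".".join(labels[:-2]) if len(labels) > 2 else ""
    if len(subdomain) > MAX_SUBDOMAIN_LEN:
        reasons.append(f"unusually long subdomain ({len(subdomain)} chars), possible tunneling")
    if subdomain and shannon_entropy(subdomain) > ENTROPY_THRESHOLD:
        reasons.append("high-entropy subdomain, possible DGA or encoded exfiltration")
    return reasons

print(score_query("aGVsbG8gd29ybGQgdGhpcyBpcyBleGZpbA.tunnel.example.com"))
print(score_query("www.example.com"))
```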
Proxy logs provide visibility into the content of web traffic that firewall flow records miss. Consider the documented sale of access to Chinese surveillance cameras on criminal markets: traffic to and from those cameras still traverses the network and appears in flow and proxy data. Knowing what normal outbound traffic looks like from those devices, and alerting when that baseline changes, is a detection approach that relies entirely on log analysis rather than signature matching.
Cloud and SaaS Audit Logs
AWS CloudTrail, Azure Activity Logs, Google Cloud Audit Logs, and the audit logs from major SaaS platforms like Microsoft 365 and Google Workspace are increasingly where the action is. Attackers follow users to where the data lives. An attacker who compromises an employee's credentials does not need to touch on-premises infrastructure if the sensitive data is in SharePoint Online or an S3 bucket.
Key events to monitor include unusual API calls in cloud provider logs, especially those involving IAM modifications, new user creation, or changes to security group rules. In SaaS platforms, impossible travel events, mass download activity, and forwarding rule creation on email accounts are high-fidelity indicators of account compromise. The warning about data breach alert phishing campaigns is relevant here. Attackers who know that an organization uses automated breach notification tools will craft lures that mimic those alerts to steal cloud credentials, making the cloud audit logs the place where the resulting unauthorized access will first appear.
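As one concrete example, the sketch below filters CloudTrail-style records against a watchlist of the IAM and security-group changes called out above. The event names shown are a small illustrative subset of what a production watchlist would contain.

```python
# A small, illustrative watchlist of CloudTrail event names tied to the
# high-risk changes discussed above; a production list would be much longer.
SENSITIVE_EVENTS = {
    "CreateUser", "CreateAccessKey", "AttachUserPolicy", "PutUserPolicy",
    "CreateLoginProfile", "AuthorizeSecurityGroupIngress", "PutBucketPolicy",
    "UpdateAssumeRolePolicy", "DeactivateMFADevice",
}

def flag_cloudtrail_records(records: list[dict]) -> list[dict]:
    """Return records whose eventName is on the sensitive watchlist."""
    return [r for r in records if r.get("eventName") in SENSITIVE_EVENTS]

sample = [
    {"eventName": "DescribeInstances", "userIdentity": {"userName": "alice"}},
    {"eventName": "CreateAccessKey", "userIdentity": {"userName": "alice"},
     "sourceIPAddress": "203.0.113.50"},
]
for r in flag_cloudtrail_records(sample):
    print(r["eventName"], r.get("sourceIPAddress", "n/a"))
```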
Building Detection Logic That Survives Contact with Real Attackers
Detection rules written in isolation from actual attacker behavior age poorly. Rules that generate too many false positives get disabled. Rules that are too narrow miss novel attack patterns. The goal is detection logic that is specific enough to be actionable but resilient enough to catch behavioral variations.
Threshold-Based Detection
Threshold detection counts events over a time window and fires an alert when the count exceeds a limit. Authentication failures are the canonical example. Five failed logins from a single source in one minute is a reasonable threshold for a brute force alert. The challenge is tuning thresholds to the environment. A service account that runs an automated job may legitimately fail authentication under specific conditions. A threshold rule without exception handling for known service accounts will generate chronic false positives that train analysts to ignore the alert category.
Adaptive thresholds, where the baseline is computed from historical behavior for each user or source rather than a fixed global value, significantly reduce false positive rates. An alert for a user who has never logged in at 3 AM suddenly doing so is more meaningful than one from a fixed rule that fires on any login between midnight and 5 AM, regardless of whether that user routinely works late.
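A hedged sketch of the per-entity adaptive threshold idea, assuming you can pull each account's historical hourly failure counts; the multiplier and floor are illustrative starting points, not recommended values.

```python
import statistics

# Alert when the current hour's failed-logon count exceeds the account's own
# history by a wide margin, rather than a global fixed limit.
SIGMA_MULTIPLIER = 3.0   # illustrative; tune against your false-positive tolerance
ABSOLUTE_FLOOR = 5       # never alert below this count, to avoid noise on quiet accounts

def adaptive_threshold(history_per_hour: list[int]) -> float:
    """Compute an alert threshold from an account's historical hourly failure counts."""
    mean = statistics.fmean(history_per_hour)
    stdev = statistics.pstdev(history_per_hour)
    return max(ABSOLUTE_FLOOR, mean + SIGMA_MULTIPLIER * stdev)

def should_alert(current_count: int, history_per_hour: list[int]) -> bool:
    return current_count > adaptive_threshold(history_per_hour)

# A service account that routinely fails a few times per hour vs. a quiet user account.
svc_history = [3, 5, 2, 6, 4, 5, 7, 4]
user_history = [0, 0, 0, 0, 1, 0, 0, 0]
print(should_alert(8, svc_history))    # within the service account's normal band -> False
print(should_alert(8, user_history))   # far outside the user's baseline -> True
```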
Sequence-Based Detection
Sequence detection looks for ordered chains of events that individually appear benign but collectively indicate malicious intent. A common example is the sequence of reconnaissance, credential access, and lateral movement that precedes ransomware deployment. Specifically: a single host making LDAP queries to enumerate Active Directory, followed by a network logon to multiple hosts using the same credentials, followed by the creation of a scheduled task or service on those hosts, is a high-confidence indicator of pre-ransomware staging activity.
Building sequence detection requires a SIEM or detection platform capable of correlating events across sources over time windows that span hours or days rather than minutes. The VECT ransomware case is instructive. The wiper functionality activates under conditions that the ransomware operator may not have intended, which means the pre-deployment activity looks identical to standard ransomware staging. Sequence-based detection that catches the staging behavior stops the attack before the payload variant matters.
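A simplified sketch of that sequence logic: track per-host progress through the three stages described above within a sliding window, assuming upstream rules have already classified raw events into stage labels. Real implementations typically live in the SIEM's correlation engine; the field names and 24-hour window here are illustrative.

```python
from datetime import datetime, timedelta

# Stage labels for the pre-ransomware staging sequence described above.
STAGES = ["ad_enumeration", "lateral_logon", "remote_persistence"]
WINDOW = timedelta(hours=24)  # illustrative correlation window

def detect_staging(events: list[dict]) -> list[str]:
    """Flag hosts that progress through all three stages, in order, within the window."""
    alerts = []
    per_host: dict[str, list[tuple[datetime, str]]] = {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        per_host.setdefault(e["host"], []).append((e["timestamp"], e["stage"]))
    for host, seq in per_host.items():
        stage_idx, first_ts = 0, None
        for ts, stage in seq:
            if first_ts and ts - first_ts > WINDOW:
                break  # sequence did not complete inside the window
            if stage == STAGES[stage_idx]:
                first_ts = first_ts or ts
                stage_idx += 1
                if stage_idx == len(STAGES):
                    alerts.append(f"{host}: pre-ransomware staging sequence completed by {ts}")
                    break
    return alerts

start = datetime(2024, 1, 10, 9, 0)
events = [
    {"host": "wkstn-042", "stage": "ad_enumeration",     "timestamp": start},
    {"host": "wkstn-042", "stage": "lateral_logon",      "timestamp": start + timedelta(hours=2)},
    {"host": "wkstn-042", "stage": "remote_persistence", "timestamp": start + timedelta(hours=3)},
]
print(detect_staging(events))
```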
Behavioral Baselines and Anomaly Detection
Behavioral baseline detection establishes normal patterns for users, hosts, and services, then alerts on deviations. A user who downloads 500 MB of data from a file server on a Tuesday afternoon when they have never previously accessed that server is anomalous regardless of whether the access used valid credentials. This approach catches insider threats and compromised account abuse that threshold and signature detection miss.
Building behavioral baselines requires sufficient historical data and a stable environment. Rapid organizational change, such as a merger, a migration to a new platform, or a shift to remote work, disrupts baselines and temporarily increases false positive rates. Teams need to anticipate these disruptions and adjust their detection strategy accordingly, either by suppressing anomaly alerts during transition periods or by segmenting baselines by user population to isolate the affected group.
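A minimal sketch of the baseline idea, assuming per-user file access logs with hypothetical fields: learn which servers each user has historically touched, then flag first-time access to a server combined with a large transfer. The size cutoff is an illustrative placeholder.

```python
from collections import defaultdict

LARGE_TRANSFER_BYTES = 100 * 1024 * 1024  # illustrative 100 MB cutoff

class AccessBaseline:
    """Track which servers each user has historically accessed and flag first-time
    access combined with a large transfer. Fields and cutoffs are illustrative."""

    def __init__(self) -> None:
        self.known: defaultdict[str, set[str]] = defaultdict(set)

    def learn(self, user: str, server: str) -> None:
        self.known[user].add(server)

    def check(self, user: str, server: str, bytes_out: int) -> str | None:
        first_time = server not in self.known[user]
        self.known[user].add(server)
        if first_time and bytes_out > LARGE_TRANSFER_BYTES:
            return (f"ANOMALY: {user} pulled {bytes_out / 1_048_576:.0f} MB from "
                    f"{server}, a server they have never accessed before")
        return None

baseline = AccessBaseline()
# Seed the baseline from historical logs (normally weeks of data, not one event).
baseline.learn("bob", "fs-eng-01")
print(baseline.check("bob", "fs-eng-01", 520 * 1024 * 1024))      # known server -> None
print(baseline.check("bob", "fs-finance-02", 520 * 1024 * 1024))  # first-time + large -> alert
```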
The recent discovery of a nine-year-old Linux vulnerability through AI-assisted code scanning is a reminder that behavioral baseline detection can catch exploitation of obscure vulnerabilities that have no existing signatures. An old kernel bug being exploited in the wild will manifest as anomalous process behavior, unexpected privilege escalation events, or unusual system calls, all of which show up in logs before any signature is written.
Handling Deceptive and Weaponized Log Data
Attackers are aware that defenders analyze logs. Sophisticated threat actors deliberately manipulate the log environment to slow investigation and misdirect response. This is not a theoretical concern. It is documented attacker behavior in post-incident analysis.
Log clearing is the most common evasion technique. Windows Event ID 1102 records Security log clearing events, and Event ID 104 records System log clearing. Monitoring for these events and treating them as high-severity incidents regardless of context is a basic but effective control. A legitimate administrator clearing logs without a corresponding change management ticket is worth investigating. An attacker clearing logs after establishing persistence is trying to buy time.
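The detection itself is trivial to express; the value comes from routing it at high severity every time. A sketch, assuming normalized Windows events with hypothetical field names:

```python
# Event ID 1102 (Security log cleared) and 104 (other log cleared) should page
# regardless of who performed the action; legitimacy is checked during triage.
LOG_CLEAR_EVENT_IDS = {1102, 104}

def log_clearing_alerts(events: list[dict]) -> list[str]:
    return [
        f"CRITICAL: {e.get('user', 'unknown')} cleared the {e.get('channel', '?')} "
        f"log on {e.get('host', '?')}"
        for e in events
        if e.get("event_id") in LOG_CLEAR_EVENT_IDS
    ]

print(log_clearing_alerts([
    {"event_id": 1102, "user": "CONTOSO\\jadmin", "channel": "Security", "host": "dc01"},
]))
```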
Log injection is a more sophisticated technique where an attacker writes crafted entries to a log source to confuse parsing or introduce false indicators. If an attacker knows that your SIEM uses a regex to extract an IP address from a specific field in a web server log, they can craft a request that places a legitimate-looking IP address in that field to obscure the actual source of the request. Parsing validation and field-level integrity checks on ingested logs reduce exposure to this technique.
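One simple defensive measure is to validate extracted fields strictly and cross-check them against transport-level metadata before trusting them. A sketch, assuming web server logs where both the parsed header value and the socket-level peer address are available; the trusted proxy list is a hypothetical placeholder.

```python
import ipaddress

def validated_source_ip(parsed_field: str, socket_peer: str) -> str:
    """Prefer the transport-level peer address; only trust a parsed header value
    if it is a syntactically valid IP and the peer is a known, trusted proxy."""
    TRUSTED_PROXIES = {"10.0.5.10", "10.0.5.11"}  # illustrative proxy addresses
    try:
        candidate = str(ipaddress.ip_address(parsed_field.strip()))
    except ValueError:
        return socket_peer  # injected or malformed value; fall back to the socket peer
    return candidate if socket_peer in TRUSTED_PROXIES else socket_peer

# An attacker plants a legitimate-looking IP inside a request field.
print(validated_source_ip('198.51.100.23" OR 1=1', "203.0.113.9"))  # falls back to 203.0.113.9
print(validated_source_ip("198.51.100.23", "10.0.5.10"))            # trusted proxy -> header honored
```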
The warning about fake data breach alerts has a direct log analysis implication. If an attacker can convince an organization's security team that a specific IP address or domain is the source of a breach, they can manipulate the investigation to focus on a false lead while actual malicious activity continues undetected. Cross-validating log-derived indicators against multiple independent sources before acting on them reduces this risk.
Operationalizing Log Analysis at Scale
Detection logic is only as good as the process that surrounds it. A well-written detection rule that generates an alert no one investigates does not improve security posture. The operational scaffolding around log-based detection determines whether investment in detection engineering produces results.
Alert Triage Workflows
Every alert category should have a documented triage workflow that specifies what additional context an analyst needs to assess the alert, where that context comes from, and what the decision criteria are for escalating versus closing. Analysts who have to improvise during triage take longer, make more errors, and burn out faster. Documented workflows also make training new analysts faster and help ensure that institutional knowledge survives staff turnover.
For a failed authentication alert, the triage workflow might specify: pull the last 24 hours of authentication events for the affected user account, check whether the source IP has been seen before for this user, check whether the source IP appears in threat intelligence feeds, and look for concurrent activity from the account on other systems. Each step has a clear purpose and a clear next action based on the result.
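Encoding that workflow as a runbook, or even as a small enrichment script the analyst runs, keeps triage consistent. A sketch of the enrichment step, with the query function stubbed out as an assumption about whatever search backend your SIEM exposes; the query strings are illustrative, not a specific product's syntax.

```python
from datetime import datetime, timedelta, timezone

def triage_failed_auth(alert: dict, query) -> dict:
    """Gather the context the triage workflow above calls for. `query` is a stand-in
    for whatever search interface the SIEM exposes."""
    user = alert["user"]
    src_ip = alert["source_ip"]
    since = datetime.now(timezone.utc) - timedelta(hours=24)
    return {
        "recent_auth_events": query(f"event.category:authentication AND user:{user}", since),
        "ip_seen_before_for_user": bool(
            query(f"user:{user} AND source.ip:{src_ip}", since - timedelta(days=30))
        ),
        "ip_in_threat_intel": bool(query(f"threat.indicator.ip:{src_ip}", since)),
        "concurrent_activity_elsewhere": query(f"user:{user} AND NOT source.ip:{src_ip}", since),
    }

# Example run with a stub backend that returns no results.
context = triage_failed_auth(
    {"user": "alice", "source_ip": "203.0.113.9"},
    query=lambda q, since: [],
)
print(context)
```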
Log Retention and Forensic Readiness
Threat actors with extended dwell times depend on log retention limits to cover their tracks. If firewall flow data is retained for 30 days and an attacker has been in the environment for 45 days, the initial access event is gone before the investigation begins. Retention requirements should be driven by realistic dwell time expectations and regulatory requirements, not storage cost optimization.
Immutable log storage, where log data is written to a destination that prevents modification or deletion, is a standard practice for forensic readiness. Cloud object storage with object lock policies, WORM-compliant storage appliances, or dedicated SIEM backends with append-only storage all satisfy this requirement. The additional cost is justified by the investigation value during an active incident and the evidentiary value if prosecution is a possible outcome.
Threat Hunting as a Log Analysis Discipline
Proactive threat hunting using log data bridges the gap between what automated detection covers and what sophisticated attackers are doing that detection has not yet been tuned to catch. The Libredtail threat covered in recent ISC diary posts illustrates the type of odd, low-volume behavior that automated detection frequently misses but that becomes obvious when an analyst deliberately looks for anomalous patterns in DNS and web request logs.
Structured threat hunts based on specific hypotheses produce the most consistent results. A hunt hypothesis might be: SystemBC proxy tunnels are present in this environment. The hunt then systematically queries process logs for SystemBC indicators, network logs for the specific traffic patterns associated with its C2 communication, and registry logs for the persistence mechanisms it typically uses. A hypothesis-driven approach ensures that hunts produce either a confirmed positive, a confirmed negative, or a gap identification, all of which are useful outcomes.
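One way to keep hunts honest about their outcomes is to structure each hypothesis as a set of queries plus an explicit coverage check, so every hunt ends as a confirmed positive, a confirmed negative, or an identified visibility gap. The sketch below uses placeholder indicators rather than real SystemBC IOCs, and the query syntax is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class HuntHypothesis:
    name: str
    queries: dict[str, str]                 # data source -> query (illustrative syntax)
    results: dict[str, list] = field(default_factory=dict)
    missing_sources: list[str] = field(default_factory=list)

def run_hunt(h: HuntHypothesis, search, available_sources: set[str]) -> str:
    """Run each query where the data exists; classify the outcome explicitly."""
    for source, query in h.queries.items():
        if source not in available_sources:
            h.missing_sources.append(source)  # visibility gap, not a clean negative
            continue
        h.results[source] = search(source, query)
    if any(h.results.values()):
        return "confirmed positive: escalate to incident response"
    if h.missing_sources:
        return f"gap identified: no data for {', '.join(h.missing_sources)}"
    return "confirmed negative: hypothesis not supported in current data"

hunt = HuntHypothesis(
    name="SystemBC proxy tunnels present in this environment",
    queries={
        "process": "command_line:*<process indicator placeholder>*",
        "network": "destination.port:<C2 port placeholder> AND <beaconing pattern placeholder>",
        "registry": "registry.path:*Run* AND <persistence key placeholder>",
    },
)
print(run_hunt(hunt, search=lambda src, q: [], available_sources={"process", "network"}))
```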
Practical Takeaways for Immediate Implementation
Organizations looking to improve their log-based threat detection posture should focus on a few high-impact actions rather than trying to overhaul everything simultaneously.
- Audit your current log collection to verify that every expected source is delivering data at its expected rate, and build alerting for sources that fall silent or significantly below their baseline.
- Enable command-line auditing for Windows process creation events if it is not already active. This single change dramatically improves detection coverage for living-off-the-land techniques with no additional tooling required.
- Add DNS query logging if your current pipeline does not include it. DNS is a high-value detection surface that remains underutilized in most environments.
- Build triage workflows for your top five alert categories. Document the specific log queries and data sources an analyst needs to assess each alert type, and measure triage time before and after to validate the improvement.
- Define your log retention policy based on dwell time and forensic requirements, and verify that at least authentication, process execution, and network connection logs meet a minimum 90-day retention threshold.
- Run one hypothesis-driven threat hunt per month using log data as the primary source. Start with high-confidence hypotheses based on current threat intelligence, and use the results to identify detection gaps.
Log analysis done well is not glamorous work. It requires consistent attention, careful tuning, and a willingness to question whether the absence of alerts reflects a secure environment or a broken detection pipeline. The teams that ask that question regularly are the ones who find the attacks that everyone else misses.