The Attack Pattern That Keeps Winning
Ransomware attacks are climbing again, and the defining characteristic of the most damaging incidents is how long threat actors sit inside networks before deploying encryption. Recent campaigns targeting healthcare, logistics, and municipal infrastructure show dwell times ranging from two weeks to several months. During that window, attackers are not loud. They blend into the rhythm of normal operations, and conventional signature-based detection misses most of what they do.
The recent wave of ransomware activity shares a structural pattern: initial access through credential abuse or exploitation of a known vulnerability; lateral movement using living-off-the-land binaries such as wmic and certutil, often alongside legitimate administration tools like PsExec; quiet exfiltration of sensitive data; and finally deployment of the encryption payload once the attacker is confident they have reached enough systems. Each of those phases produces telemetry. The problem is that most teams have not trained their detection stack to treat that telemetry as a behavioral sequence rather than as isolated events.
This is where machine learning-based anomaly detection earns its place in the defensive stack. Deployed thoughtfully, with the right data inputs and clearly defined baselines, ML-based approaches can surface the subtle behavioral deviations that precede encryption by days or weeks.
What Behavioral Anomaly Detection Actually Measures
Before covering implementation, it helps to be precise about what anomaly detection is measuring. The term gets applied loosely, covering everything from simple threshold alerts to deep learning models trained on packet captures. For network anomaly detection specifically, the most operationally useful approaches focus on three categories of signal.
Traffic Volume and Timing Deviations
Ransomware staging and lateral movement change the volume and timing of internal traffic in characteristic ways. A workstation that normally generates 200MB of internal traffic per day suddenly pushing 4GB across SMB shares is a deviation. An endpoint initiating connections to 40 internal hosts within a two-hour window when its historical baseline shows five or fewer unique internal destinations per day is a deviation. ML models trained on rolling baselines catch these without requiring static thresholds that generate constant false positives.
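A minimal sketch of the rolling-baseline idea, assuming daily per-host aggregates (internal MB sent, unique internal destinations) have already been computed from flow data. The 28-day window, the seven-day minimum history, and the `min_std` floor are illustrative assumptions, and `rolling_deviation` is a hypothetical helper rather than any product's API.

```python
from statistics import mean, pstdev


def rolling_deviation(history, today, window=28, min_history=7, min_std=1.0):
    """Z-score of today's value against the trailing `window` days of history.

    history: daily values for one host, oldest first.
    Returns None when there is too little history to judge.
    """
    recent = history[-window:]
    if len(recent) < min_history:
        return None
    mu = mean(recent)
    # Floor the spread so a perfectly flat baseline does not divide by zero
    # or turn a tiny wobble into a huge score.
    sigma = max(pstdev(recent), min_std)
    return (today - mu) / sigma


# Hypothetical workstation: about 200 MB/day of internal traffic and
# three to five unique internal destinations per day for four weeks.
mb_per_day = [190, 210, 205, 195, 200, 198, 202] * 4
dests_per_day = [3, 4, 5, 3, 4, 5, 4] * 4

print(rolling_deviation(mb_per_day, 4000))   # very large: a 4 GB day
print(rolling_deviation(dests_per_day, 40))  # very large: 40 destinations
print(rolling_deviation(mb_per_day, 205))    # small: an ordinary day
```

Because the window trails the current day, the baseline shifts as legitimate behavior shifts, which is what lets this replace a single static threshold.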
Protocol and Service Anomalies
Attackers using legitimate tools still leave protocol-level fingerprints. A host that has never used RDP suddenly initiating RDP sessions to multiple servers warrants scrutiny. DNS query volume spikes from a single endpoint, unusual LDAP enumeration activity, and unexpected SMB signing negotiation patterns all represent deviations from learned baselines. Models that track per-host protocol behavior over time flag these deviations even when the tools being used are native to the operating system.
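The per-host protocol tracking described above can be sketched as a first-seen profile. This is a simplified illustration: it labels protocols by well-known destination port, whereas a production pipeline would more likely rely on protocol identification from a network sensor such as Zeek. `ProtocolProfile`, `PORT_LABELS`, and `label_for` are hypothetical names.

```python
from collections import defaultdict

# Rough port-to-protocol mapping; an assumption for illustration only.
PORT_LABELS = {3389: "rdp", 445: "smb", 389: "ldap", 53: "dns", 5985: "winrm"}


def label_for(port):
    return PORT_LABELS.get(port, f"port/{port}")


class ProtocolProfile:
    """Remembers which protocols each host used during the baseline period."""

    def __init__(self):
        self.seen = defaultdict(set)  # host -> protocol labels observed

    def learn(self, host, dst_port):
        self.seen[host].add(label_for(dst_port))

    def first_use(self, host, dst_port):
        """Return the protocol label if this host never used it before, else None."""
        label = label_for(dst_port)
        return None if label in self.seen.get(host, set()) else label


profile = ProtocolProfile()
for port in (443, 445, 53):  # baseline: web, file shares, DNS
    profile.learn("ws-042", port)

print(profile.first_use("ws-042", 3389))  # rdp
print(profile.first_use("ws-042", 445))   # None
```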
Communication Graph Changes
Network communication patterns form a graph: which hosts talk to which other hosts, over which ports, at what frequency. ML models that map this graph and track changes over time are well suited to detecting lateral movement. When a compromised host begins communicating with systems it has never previously contacted, that edge in the graph is new, and new edges in a mature network warrant investigation.
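A minimal sketch of new-edge detection over the communication graph, assuming flows have already been reduced to `(source, destination, port)` tuples. `new_edges` and `new_peer_counts` are hypothetical helpers; a real system would also age out stale edges and weight edges by frequency.

```python
def new_edges(baseline_flows, current_flows):
    """Return (src, dst, port) edges in current_flows never seen in the baseline."""
    known = set(baseline_flows)
    return sorted(set(current_flows) - known)


def new_peer_counts(baseline_flows, current_flows):
    """Per source host, count distinct destinations never contacted in the baseline."""
    known_peers = {}
    for src, dst, _port in baseline_flows:
        known_peers.setdefault(src, set()).add(dst)
    fresh = {}
    for src, dst, _port in current_flows:
        if dst not in known_peers.get(src, set()):
            fresh.setdefault(src, set()).add(dst)
    return {src: len(dsts) for src, dsts in fresh.items()}


# Hypothetical hosts: the workstation normally talks to one file server and a DC.
baseline = [("ws-042", "fs-01", 445), ("ws-042", "dc-01", 389)]
today = [("ws-042", "fs-01", 445), ("ws-042", "bkp-mgmt", 443), ("ws-042", "fs-07", 445)]

print(new_edges(baseline, today))
# [('ws-042', 'bkp-mgmt', 443), ('ws-042', 'fs-07', 445)]
print(new_peer_counts(baseline, today))  # {'ws-042': 2}
```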
Why Signature Detection Fails at This Problem
Signature-based detection requires that the attacker's tools or traffic match a known pattern. The shift toward living-off-the-land techniques directly defeats this requirement. When an attacker uses certutil.exe to decode a payload or wmic.exe to execute commands remotely, those are legitimate Windows binaries, so a signature keyed to the file itself has nothing to match. The Russian attackers who weaponized a WinRAR vulnerability against Ukrainian organizations combined a known flaw with custom tooling specifically designed to avoid signature detection. By the time signatures for their specific payloads were distributed, the campaigns had already achieved their objectives.
The same dynamic applies to the MSI-based malware delivery technique that resurfaced recently. Abusing Windows Installer components to execute malicious code is not new, but each generation of the technique uses slightly different implementation details that invalidate existing signatures. Anomaly detection does not care about those implementation details. It cares about whether the behavior observed on the network deviates from what is normal for that environment.
Building the Data Foundation
The quality of an anomaly detection model depends entirely on the quality and completeness of its training data. This is where many deployments fail. Security teams deploy an ML-based NDR platform, run it for two weeks, and wonder why the baseline is producing noisy results. Two weeks is rarely enough time to capture a full behavioral cycle for most enterprise networks.
A proper baseline should cover at minimum four to six weeks of normal operations, spanning weekdays and weekends, end-of-month financial processing cycles if applicable, patch windows, and any regular scheduled tasks that produce unusual-looking but legitimate traffic patterns. Backup jobs, vulnerability scanner activity, asset management polling, and software deployment systems all generate traffic that will look anomalous if the model has not learned them as part of normal behavior.
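A coverage check like the following can catch a baseline that is too short to trust. It is a sketch assuming you can list the calendar days for which telemetry was actually collected; it checks only length, weekends, and month-end, and `baseline_gaps` is a hypothetical helper. Patch windows and scheduled jobs are environment-specific and are left out.

```python
from datetime import date, timedelta


def baseline_gaps(days_observed, min_days=28):
    """Report coverage gaps in a baseline, given the set of dates with telemetry.

    min_days=28 mirrors the four-week lower bound; the month-end check assumes
    month-end processing matters in this environment.
    """
    gaps = []
    if len(days_observed) < min_days:
        gaps.append(f"only {len(days_observed)} days observed (want >= {min_days})")
    if not any(d.weekday() >= 5 for d in days_observed):  # 5 = Sat, 6 = Sun
        gaps.append("no weekend days observed")
    # A date is the last day of its month if the next day falls in another month.
    if not any((d + timedelta(days=1)).month != d.month for d in days_observed):
        gaps.append("no month-end day observed")
    return gaps


two_weeks = {date(2024, 3, 4) + timedelta(days=i) for i in range(14)}
print(baseline_gaps(two_weeks))
# ['only 14 days observed (want >= 28)', 'no month-end day observed']
```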
Data Sources to Prioritize
- NetFlow or IPFIX records: These provide volume, timing, and communication graph data without requiring full packet capture. They scale well and are available from most enterprise switches and routers.
- DNS query logs: DNS is involved in almost every stage of an attack. Unusually high query volumes, queries for newly registered domains, and DGA-pattern domains all surface through DNS log analysis.
- DHCP logs: Correlating IP addresses to hostnames and MAC addresses over time is essential for making sure that anomalies are attributed correctly to specific devices rather than to IP addresses that may have changed hands.
- Authentication logs: Failed authentication attempts, credential reuse across multiple systems, and accounts authenticating outside their normal hours are behavioral anomalies that complement network-layer signals.
- Endpoint telemetry: Process-level data from EDR agents, correlated with network events, dramatically improves attribution and reduces false positives.
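The DHCP attribution point is easy to get wrong when anomalies are keyed on IP address alone. A minimal sketch, assuming DHCP logs have already been parsed into `(ip, lease start, hostname, mac)` records and treating each lease as valid until the same IP is reassigned (lease expiry is ignored for brevity). `LeaseTable` is a hypothetical name.

```python
import bisect
from datetime import datetime


class LeaseTable:
    """Answer 'which device held this IP at this moment?' from DHCP lease records."""

    def __init__(self):
        self._starts = {}   # ip -> sorted lease start times
        self._holders = {}  # ip -> (hostname, mac) entries parallel to _starts

    def add(self, ip, start, hostname, mac):
        starts = self._starts.setdefault(ip, [])
        holders = self._holders.setdefault(ip, [])
        i = bisect.bisect_right(starts, start)  # keep both lists ordered by start
        starts.insert(i, start)
        holders.insert(i, (hostname, mac))

    def lookup(self, ip, when):
        """Return (hostname, mac) for the most recent lease starting at or before `when`."""
        starts = self._starts.get(ip, [])
        i = bisect.bisect_right(starts, when) - 1
        return self._holders[ip][i] if i >= 0 else None


leases = LeaseTable()
leases.add("10.0.5.23", datetime(2024, 5, 1, 8, 0), "ws-042", "aa:bb:cc:00:00:42")
leases.add("10.0.5.23", datetime(2024, 5, 3, 9, 30), "ws-117", "aa:bb:cc:00:01:17")

print(leases.lookup("10.0.5.23", datetime(2024, 5, 2, 14, 0)))  # ('ws-042', 'aa:bb:cc:00:00:42')
print(leases.lookup("10.0.5.23", datetime(2024, 5, 4, 10, 0)))  # ('ws-117', 'aa:bb:cc:00:01:17')
```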
Choosing the Right Model Architecture
Network anomaly detection draws on several classes of ML models, each with different strengths. For most security operations teams, the practical choice is between unsupervised approaches, supervised approaches, and hybrid systems that combine both.
Unsupervised Anomaly Detection
Unsupervised models like autoencoders, Isolation Forest, and clustering algorithms learn what normal looks like from unlabeled traffic data and flag observations that deviate significantly from learned patterns. These are well-suited to environments where labeled attack data is scarce, which describes most enterprise security operations. The tradeoff is that unsupervised models require more tuning to suppress alerts on legitimate anomalies like authorized penetration testing or new application deployments.
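For illustration, here is a k-nearest-neighbor distance score, a simpler unsupervised technique than the autoencoders or Isolation Forest named above, substituted here because it fits in a few standard-library lines. It assumes each host-day is summarized as a small feature vector of comparably scaled values; the features and `k=3` are assumptions.

```python
import math


def knn_scores(baseline, observations, k=3):
    """Anomaly score = mean Euclidean distance to the k nearest baseline points.

    No labels are needed: the baseline is simply a sample of normal behavior.
    Features should be on comparable scales; no scaling is done here.
    """
    if not baseline:
        raise ValueError("baseline must not be empty")
    scores = []
    for obs in observations:
        nearest = sorted(math.dist(obs, b) for b in baseline)[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Hypothetical host-day features: (log10 of MB sent internally,
# unique internal destinations, distinct destination ports).
baseline = [(2.3, 3, 2), (2.2, 4, 2), (2.4, 5, 3), (2.3, 3, 2), (2.1, 4, 2)]
today = [(2.3, 4, 2), (3.6, 40, 9)]

for features, score in zip(today, knn_scores(baseline, today)):
    print(features, round(score, 2))
```

The tuning burden mentioned above shows up here as the choice of features, `k`, and the score threshold at which a host-day becomes an alert.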
Supervised Classification
Supervised models are trained on labeled datasets containing both normal traffic and traffic labeled as malicious. They perform well when the attack patterns in the training data match what the model will encounter in production. The limitation is that novel attack techniques that differ meaningfully from training examples will be missed. For established attack patterns like credential stuffing, port scanning, and known lateral movement techniques, supervised models deliver good precision.
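To make the limitation concrete, here is a deliberately tiny supervised model: a nearest-centroid classifier trained on labeled examples. It stands in for the far richer classifiers a real platform would use, and the features (distinct destination ports and distinct destination hosts per minute), labels, and values are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {lbl: [s / counts[lbl] for s in vec] for lbl, vec in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    return min(centroids, key=lambda lbl: math.dist(vec, centroids[lbl]))


training = [
    ((2, 1), "normal"),
    ((3, 2), "normal"),
    ((120, 1), "port_scan"),
    ((200, 2), "port_scan"),
]
centroids = train_centroids(training)
print(classify(centroids, (150, 1)))  # port_scan: resembles the training data
print(classify(centroids, (3, 40)))   # normal: a host sweep was never labeled
```

The second call shows the weakness: a host sweeping 40 machines on a couple of ports looks nothing like the labeled port scans, so the model files it under the nearest known class.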
Hybrid Approaches
Production-grade NDR platforms typically combine both approaches. Unsupervised models establish and continuously update behavioral baselines. Supervised classifiers handle known-pattern detection. Alerts from both streams are correlated by a rules layer that suppresses known-good anomalies and prioritizes events that appear in multiple data sources simultaneously. This correlation step is where the real reduction in alert volume happens, and it is the component that most custom deployments underinvest in.
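The correlation layer can be sketched as follows, under stated assumptions: every detector emits an alert with an entity, a source name, and a score between 0 and 1, and known-good entities come from a suppression list. The max-plus-boost scoring rule and the 0.2 boost are illustrative, not how any particular NDR product weighs evidence.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    entity: str   # host or account the alert is about
    source: str   # detector or data source, e.g. "unsupervised", "auth"
    score: float  # 0..1 from the originating detector


def correlate(alerts, known_good, multi_source_boost=0.2):
    """Drop known-good entities, group the rest, and rank by corroborated score."""
    by_entity = {}
    for a in alerts:
        if a.entity in known_good:
            continue  # suppressed before it ever reaches an analyst
        by_entity.setdefault(a.entity, []).append(a)
    ranked = []
    for entity, group in by_entity.items():
        sources = sorted({a.source for a in group})
        # Each additional independent source raises the score, capped at 1.0.
        score = max(a.score for a in group) + multi_source_boost * (len(sources) - 1)
        ranked.append((min(score, 1.0), entity, sources))
    return sorted(ranked, reverse=True)


alerts = [
    Alert("ws-042", "unsupervised", 0.5),
    Alert("ws-042", "auth", 0.4),
    Alert("vulnscan-01", "unsupervised", 0.9),
    Alert("ws-300", "supervised", 0.6),
]
for score, entity, sources in correlate(alerts, known_good={"vulnscan-01"}):
    print(f"{entity}: {score:.2f} from {sources}")
```

Note that the scanner's 0.9 alert never surfaces, while ws-042 outranks a single stronger alert because two independent sources agree.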
A Real-World Detection Scenario
Consider a mid-sized logistics company running a hybrid environment with on-premises servers and cloud workloads. An attacker gains initial access through a phishing email that delivers a credential-harvesting payload. The credentials belong to a service desk account with elevated internal permissions. The attacker uses those credentials to authenticate via VPN during business hours to avoid timing-based alerts.
Over the following ten days, the attacker conducts internal reconnaissance using LDAP queries, maps network shares, and identifies backup servers. Each individual action, viewed in isolation, looks like normal administrative activity. Viewed as a sequence across ten days, the behavior is clearly anomalous: the service desk account has never previously run LDAP enumeration queries, has never accessed more than three file servers in a single week, and has never initiated connections to the backup management console.
An ML-based system tracking per-account communication graphs, protocol usage, and access patterns would flag the LDAP enumeration on day two as a deviation from that account's established baseline. The alert might be low severity on its own. By day five, when the same account is accessing an unusual number of file servers, the system correlates the two deviations and escalates. The backup server access on day eight triggers a high-severity alert because the account has no history of backup system interaction and because the pattern matches known pre-ransomware staging behavior in the model's supervised component.
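The escalation in this scenario can be sketched as a weighted sum of per-account deviations inside a sliding window. The event names, weights, 14-day window, and severity cut-offs are all hypothetical, chosen so the sketch reproduces the low, then escalated, then high-severity progression described above.

```python
from datetime import date

# Hypothetical deviation events for the service desk account (days 2, 5, and 8).
events = [
    (date(2024, 6, 4), "svc-desk", "first_ldap_enumeration"),
    (date(2024, 6, 7), "svc-desk", "unusual_file_server_count"),
    (date(2024, 6, 10), "svc-desk", "first_backup_console_access"),
]

# Illustrative weights; backup access weighs most because it resembles pre-ransomware staging.
WEIGHTS = {
    "first_ldap_enumeration": 1,
    "unusual_file_server_count": 2,
    "first_backup_console_access": 4,
}


def severity(total):
    return "high" if total >= 6 else "medium" if total >= 3 else "low"


def escalate(events, window_days=14):
    """Yield (day, account, severity) from the weighted deviations inside the window."""
    history = {}
    for day, account, kind in sorted(events):
        recent = [(d, k) for d, k in history.get(account, []) if (day - d).days <= window_days]
        recent.append((day, kind))
        history[account] = recent
        yield day, account, severity(sum(WEIGHTS[k] for _, k in recent))


for day, account, level in escalate(events):
    print(day, account, level)
```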
This is the detection window that anomaly detection creates: days or weeks before the encryption payload deploys, when there is still time to contain the intrusion.
The Cloud Logging Evasion Problem
Recent research into attacker abuse of cloud logging services for defense evasion adds a complication that network-layer anomaly detection partially addresses. When attackers suppress or manipulate log sources, SIEM-based detection that depends on those logs goes blind. Network-layer telemetry collected at the infrastructure level, specifically NetFlow records captured at the network devices rather than from host agents, is much harder for an attacker to manipulate without triggering additional network-layer anomalies.
Teams should architect their anomaly detection pipeline so that at least some telemetry streams are collected at layers the attacker cannot reach with compromised endpoint privileges. Exporting flow data directly from managed switches and routers to an isolated collection system helps keep the network-layer behavioral baseline intact even if an attacker suppresses Windows event logs and tampers with EDR agents.
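If flow export is the telemetry an attacker cannot easily reach, an exporter that suddenly stops reporting deserves attention in its own right. A minimal sketch of a silence check on the isolated collector, assuming it records the last time it received flow records from each exporter; `silent_exporters`, the exporter names, and the ten-minute threshold are assumptions, and the threshold should be set from your exporters' configured flow timeouts.

```python
from datetime import datetime, timedelta


def silent_exporters(last_seen, expected, now, max_silence=timedelta(minutes=10)):
    """Return expected exporters that have not sent flow records recently.

    last_seen: exporter id -> datetime of the last flow record received.
    expected: exporter ids (switches, routers, cloud flow sources) that should report.
    """
    return sorted(
        e for e in expected
        if e not in last_seen or now - last_seen[e] > max_silence
    )


now = datetime(2024, 6, 10, 12, 0)
last_seen = {
    "core-sw-1": now - timedelta(minutes=2),
    "edge-rtr-1": now - timedelta(hours=3),
}
expected = {"core-sw-1", "edge-rtr-1", "dist-sw-4"}
print(silent_exporters(last_seen, expected, now))  # ['dist-sw-4', 'edge-rtr-1']
```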
Tuning for Operational Reality
The fastest way to destroy confidence in an anomaly detection system is to let it generate alert fatigue. When analysts receive 300 anomaly alerts per day and 290 of them turn out to be legitimate, they stop engaging with the alerts. Effective tuning requires a structured suppression strategy.
Start by identifying and labeling known-good anomalies during the baseline period. Vulnerability scanners, backup jobs, patch management systems, and monitoring agents all produce traffic that deviates from typical host behavior. Tag those sources explicitly and configure suppression rules so that their traffic does not contribute to anomaly scores for other hosts.
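A minimal sketch of source-level suppression, assuming flows arrive as `(src, dst, port, bytes)` tuples and that known-good sources are kept in a dictionary with a written justification. It shows why suppression has to happen before scoring: a scanner that touches many hosts adds an inbound peer to every one of them. `SUPPRESSIONS`, `scoreable_flows`, and `inbound_peer_counts` are hypothetical names.

```python
# Hypothetical suppression catalog: known-good source -> justification.
SUPPRESSIONS = {
    "vulnscan-01": "Weekly authenticated vulnerability scan",
    "backup-01": "Nightly backup jobs pulling from file servers",
}


def scoreable_flows(flows, suppressions=SUPPRESSIONS):
    """Drop flows from known-good sources before any anomaly features are computed.

    Matching on source host only is a simplification; a real catalog might also
    match ports or time windows.
    """
    return [f for f in flows if f[0] not in suppressions]


def inbound_peer_counts(flows):
    """Per destination, count distinct sources (one input to an anomaly score)."""
    peers = {}
    for src, dst, _port, _bytes in flows:
        peers.setdefault(dst, set()).add(src)
    return {dst: len(srcs) for dst, srcs in peers.items()}


flows = [
    ("vulnscan-01", "ws-042", 445, 1200),
    ("vulnscan-01", "ws-117", 445, 1300),
    ("ws-042", "fs-01", 445, 50000),
]
print(inbound_peer_counts(flows))                   # scanner inflates two workstations
print(inbound_peer_counts(scoreable_flows(flows)))  # {'fs-01': 1}
```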
Implement confidence thresholds rather than binary alert triggers. A single anomalous event from a host with a long, stable baseline history should score differently than the same event from a host that was recently added to the network. Models that incorporate baseline age and stability into their confidence calculations produce more actionable alerts.
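One way to fold baseline age and stability into a confidence value is sketched below. The logistic squash centered on a 3-sigma deviation, the 42-day (six-week) maturity ramp, and the 1/(1 + cv) stability penalty are illustrative choices, not a standard formula, and `confidence` is a hypothetical helper.

```python
import math


def confidence(z_score, baseline_days, baseline_cv, full_trust_days=42):
    """Scale a raw deviation score by how far the host's baseline can be trusted.

    baseline_days: how long the host has been observed.
    baseline_cv: coefficient of variation (std / mean) of its history;
    noisier baselines earn less trust.
    """
    maturity = min(baseline_days / full_trust_days, 1.0)
    stability = 1.0 / (1.0 + baseline_cv)
    raw = 1.0 / (1.0 + math.exp(-(abs(z_score) - 3.0)))  # 0.5 at a 3-sigma deviation
    return raw * maturity * stability


# The same 8-sigma event on a long-lived, stable host and on a host added five days ago.
print(round(confidence(8, baseline_days=90, baseline_cv=0.05), 2))
print(round(confidence(8, baseline_days=5, baseline_cv=0.05), 2))
```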
Schedule regular review cycles, monthly at minimum, where analysts examine the top anomaly sources and determine whether each represents a genuine detection gap, a suppression gap, or a model drift issue. Networks change over time, and models trained on six-month-old baselines will gradually accumulate false positives as legitimate behavior evolves.
Integration With the Broader Security Stack
Anomaly detection in isolation catches anomalies. Anomaly detection integrated with threat intelligence, SIEM correlation, and endpoint telemetry catches attackers. The distinction matters operationally.
When an anomaly detection system flags unusual outbound DNS queries from a workstation, that alert gains significant weight if the destination domain appears on a current threat intelligence feed. When the same workstation also shows a new process injection event in EDR telemetry, the correlated alert is high confidence and warrants immediate response rather than a ticket in the queue.
Most organizations already have the data sources needed for this correlation. The gap is the pipeline that joins them. Invest in the integration layer: make sure each anomaly alert carries enough context (host identity, user account, application, and protocol) that analysts can correlate it against other data sources immediately, without manual enrichment steps.
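A minimal sketch of an alert envelope that carries those context fields and reports which ones are empty. The field names are hypothetical; map them to whatever schema your SIEM expects.

```python
import json
from dataclasses import asdict, dataclass

CONTEXT_FIELDS = ("host", "user", "application", "protocol")


@dataclass(frozen=True)
class AnomalyAlert:
    """Hypothetical alert record carrying the context an analyst needs up front."""

    host: str
    user: str
    application: str
    protocol: str
    detector: str
    score: float

    def missing_context(self):
        """List empty context fields; each one means a manual enrichment step later."""
        return [f for f in CONTEXT_FIELDS if not getattr(self, f)]


alert = AnomalyAlert(host="ws-042", user="svc-desk", application="", protocol="ldap",
                     detector="per-account-baseline", score=0.62)
print(alert.missing_context())  # ['application']
print(json.dumps(asdict(alert)))
```

Rejecting or flagging alerts with missing context at the pipeline boundary keeps the enrichment burden off the analyst.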
Practical Implementation Steps for Security Teams
- Audit your flow data collection: Verify that NetFlow or IPFIX records are being collected from all network segments, including cloud VPCs. Gaps in collection create blind spots that attackers will eventually find.
- Establish a baselining period: Run your anomaly detection platform in observation-only mode for four to six weeks before enabling alerting. Document known-good anomaly sources during this period.
- Build a suppression catalog: Maintain an explicit list of labeled known-good anomaly sources with the justification for each suppression. Review this catalog quarterly.
- Implement alert correlation: Configure your SIEM to correlate anomaly alerts with threat intelligence feeds and endpoint telemetry before presenting alerts to analysts.
- Define escalation tiers: Not every anomaly warrants the same response. Define confidence thresholds that map to response tiers, from auto-suppression for low-confidence single-source anomalies to immediate SOC engagement for high-confidence multi-source correlated events.
- Run tabletop exercises against your detection model: Simulate the pre-ransomware staging pattern described above and verify that your system generates correlated alerts before the simulated encryption phase. Adjust thresholds and suppression rules based on results.
- Protect your telemetry collection: Ensure that flow data collection systems are isolated from the endpoint privilege level that an attacker with compromised credentials could reach. This preserves visibility even when attackers attempt log suppression.
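The escalation tiers from the steps above can be expressed as an ordered rule table. The thresholds below are placeholders to be adjusted after tabletop exercises, and `TIERS` and `response_tier` are hypothetical names.

```python
# (minimum confidence, minimum distinct sources, tier), checked top to bottom.
TIERS = [
    (0.8, 2, "immediate SOC engagement"),
    (0.5, 1, "analyst queue"),
    (0.0, 2, "analyst queue"),  # weak but corroborated by more than one source
    (0.0, 1, "auto-suppress"),  # low-confidence, single-source
]


def response_tier(confidence, source_count):
    """Return the first tier whose confidence and source requirements are met."""
    for min_conf, min_sources, tier in TIERS:
        if confidence >= min_conf and source_count >= min_sources:
            return tier
    return "auto-suppress"


print(response_tier(0.9, 3))  # immediate SOC engagement
print(response_tier(0.2, 1))  # auto-suppress
```

Keeping the rules in a table rather than nested conditionals makes each threshold change reviewable, which suits the review cycles described above.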
What to Expect From the Technology and What to Provide Yourself
ML-based anomaly detection platforms, whether commercial NDR products or self-built pipelines around tools like Zeek, RITA, or Elastic's machine learning features, will handle the statistical modeling, baseline maintenance, and scoring. What they cannot provide is the operational context that makes their alerts actionable.
That context comes from your team: the suppression catalog that reflects your environment's legitimate anomalies, the integration with your threat intelligence subscriptions, the escalation procedures that define what happens when a high-confidence alert fires at 2 AM, and the regular review cycles that keep the model aligned with how your network actually behaves as it evolves.
Ransomware groups and other sophisticated threat actors have demonstrated repeatedly that they understand how to move through networks in ways that avoid triggering conventional detection. Behavioral anomaly detection, implemented with realistic data requirements and operational discipline, closes a meaningful portion of that detection gap. The dwell time advantage that attackers currently enjoy depends on defenders treating each event in isolation. Systems and processes that connect events into behavioral sequences take that advantage away.