In the weeks before a major international soccer tournament, a wave of fake FIFA merchandise and ticket-sales sites surfaced across the web. Security researchers flagged dozens of domains that had cleared enterprise URL filtering, passed through email gateways, and landed in inboxes at large organizations without triggering a single alert. The domains were freshly registered, used valid TLS certificates, loaded clean-looking HTML, and resolved from IP ranges with no prior abuse history. Every heuristic that most teams rely on said these URLs were safe. Users clicked, credentials were harvested, and payment card data was exfiltrated before the first IOC appeared on any public threat feed.
This scenario is not exceptional. It is the standard operating environment for phishing in 2026. Threat actors understand exactly how commercial URL filters work, and they build campaigns specifically designed to route around them. Understanding where detection breaks down, and how to layer techniques that catch what filters miss, is one of the most operationally valuable skills a security team can develop right now.
Why Single-Layer URL Filtering Keeps Failing
Most organizations depend on reputation-based URL filtering as their primary phishing defense. These systems maintain databases of known-bad domains and IP addresses and block requests that match. The model works well against established, widely-reported phishing infrastructure, but it has a structural blind spot: it is entirely reactive. A domain has to be observed, reported, analyzed, and categorized before it generates a block. The gap between domain registration and first appearance on a public blocklist is measured in hours at best, and often in days.
Attackers exploit this gap deliberately. Fresh domains registered through privacy-protecting registrars, hosted on cloud infrastructure with clean IP history, and equipped with Let's Encrypt TLS certificates present no signal for reputation systems to act on. The fake FIFA ticket sites followed this exact pattern. Some used typosquatted domains that were plausible enough to pass casual inspection. Others used subdomain structures on otherwise-legitimate hosting platforms, which added another layer of apparent credibility.
The deeper problem is that reputation filtering creates a false sense of coverage. When IT administrators see high block rates in their URL filtering dashboards, it is easy to conclude that the control is working well. Those block rates mostly reflect traffic to domains that are weeks or months old. The sites that cause actual breaches are the ones that have never been seen before.
Lexical Analysis: Reading the URL Itself
Before a URL is fetched, its text carries significant signal. Lexical analysis examines the structure, character composition, and entropy of a URL without making any network request. This makes it fast, scalable, and useful as a first-pass filter in high-volume environments.
Several features consistently distinguish phishing URLs from legitimate ones. Domain length is one of the simplest: phishing domains tend to be longer than legitimate ones because attackers embed keywords to add apparent credibility. A domain like secure-login-fifatickets-official-checkout.com scores immediately on length alone. Character-level entropy, specifically the presence of random-looking strings, also correlates with algorithmically generated domains used in phishing and malware infrastructure.
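As a rough illustration of the length and entropy features, the sketch below computes both for a hostname. The function names are hypothetical, and treating the final label as the TLD is a simplifying assumption; no thresholds are implied.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def length_and_entropy(hostname: str) -> dict:
    """Length and entropy of a hostname with its final label removed.

    Assumption: the last label is the TLD. Multi-label suffixes such as
    co.uk are not handled in this sketch.
    """
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    body = ".".join(labels[:-1]) if len(labels) > 1 else name
    return {"length": len(body), "entropy": round(shannon_entropy(body), 3)}


if __name__ == "__main__":
    print(length_and_entropy("secure-login-fifatickets-official-checkout.com"))
```

Both numbers are weak signals on their own and are better fed into a combined score than used as block conditions.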
Subdomain depth is another strong signal. Legitimate organizations rarely use more than two or three subdomain levels in external-facing URLs. Phishing sites frequently use structures like login.secure.account.verify.example.com, which fill the start of the hostname with reassuring keywords or a recognizable brand name, where users are more likely to focus their attention, and push the actual registered domain to the end. Detecting this pattern requires parsing the full hostname structure, something many filters do not do by default.
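A minimal sketch of subdomain depth counting follows. The function name and the suffix_labels parameter are hypothetical; a production version would likely consult the Public Suffix List rather than ask the caller how many labels the suffix has.

```python
from urllib.parse import urlsplit


def subdomain_depth(url: str, suffix_labels: int = 1) -> int:
    """Count the labels to the left of the registrable domain.

    Assumption: the registrable domain is one label plus a suffix of
    `suffix_labels` labels (1 for .com, 2 for .co.uk). The URL must
    include a scheme, otherwise urlsplit finds no hostname and the
    result is 0.
    """
    host = urlsplit(url).hostname or ""
    labels = [label for label in host.split(".") if label]
    return max(0, len(labels) - (suffix_labels + 1))


if __name__ == "__main__":
    print(subdomain_depth("https://login.secure.account.verify.example.com/"))
```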
Keyword presence analysis looks for high-value terms that phishers use to lend urgency or credibility: words like secure, verify, account, login, update, confirm, official, ticket, and checkout. When these appear in domains rather than in URL paths, they carry stronger phishing signal because legitimate services rarely embed these terms in their apex domain names. The implementation challenge is managing false positives, since some legitimate services do use these terms. Combining keyword presence with other features, rather than treating it as a standalone block condition, keeps precision acceptable.
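One way to sketch keyword presence, using the terms listed above, is shown here. The weights that count hostname hits more heavily than path hits are illustrative assumptions, and the substring match is deliberately crude (it will also fire on words like "accountant").

```python
from urllib.parse import urlsplit

# Terms taken from the paragraph above.
KEYWORDS = ("secure", "verify", "account", "login", "update",
            "confirm", "official", "ticket", "checkout")


def keyword_hits(url: str) -> dict:
    """Return the keywords found in the hostname and in the path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lower()
    return {
        "host": sorted(k for k in KEYWORDS if k in host),
        "path": sorted(k for k in KEYWORDS if k in path),
    }


def keyword_score(url: str, host_weight: float = 2.0,
                  path_weight: float = 1.0) -> float:
    """Weighted hit count. The weights are placeholders, not tuned values."""
    hits = keyword_hits(url)
    return host_weight * len(hits["host"]) + path_weight * len(hits["path"])


if __name__ == "__main__":
    print(keyword_hits("https://secure-login-fifatickets-official-checkout.com/"))
```

The score is meant as one input among several, consistent with the point above about not using keywords as a standalone block condition.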
Path and query string analysis extends lexical detection beyond the hostname. Extremely long query strings, base64-encoded parameters, and deeply nested directory paths all appear more frequently in phishing URLs than in legitimate ones. A URL whose query string is 400 characters of base64 deserves scrutiny even if the domain itself looks clean.
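The path and query checks might be sketched as below. The limits (200 characters, depth 6) and the 24-character minimum for a base64-looking value are arbitrary placeholders, and the regular expression only tests whether a value looks like base64, not whether it decodes to anything meaningful.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Heuristic: a long run of base64 or URL-safe base64 characters with
# optional padding. Long plain alphanumeric tokens will also match.
B64_LIKE = re.compile(r"^[A-Za-z0-9+/_-]{24,}={0,2}$")


def looks_like_base64(value: str) -> bool:
    return bool(B64_LIKE.match(value))


def query_features(url: str, max_query_len: int = 200,
                   max_path_depth: int = 6) -> dict:
    """Lexical features of the path and query string (no network access)."""
    parts = urlsplit(url)
    # parse_qsl percent-decodes values and turns a raw "+" into a space,
    # so unescaped standard base64 containing "+" will not match here.
    params = parse_qsl(parts.query, keep_blank_values=True)
    depth = len([seg for seg in parts.path.split("/") if seg])
    return {
        "query_length": len(parts.query),
        "long_query": len(parts.query) > max_query_len,
        "base64_params": [key for key, value in params if looks_like_base64(value)],
        "path_depth": depth,
        "deep_path": depth > max_path_depth,
    }


if __name__ == "__main__":
    print(query_features("https://example.com/a/b?data=" + "QUJD" * 100))
```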
Domain Registration Intelligence
Domain age is one of the most reliable phishing signals available, and it is one of the least used by smaller security teams. The overwhelming majority of phishing domains are used within the first 30 days of registration. Some are burned within hours of going live. Building domain age lookups into your detection pipeline is straightforward if you have access to a WHOIS or passive DNS data source.
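Assuming the creation date has already been obtained from a WHOIS or passive DNS source (the lookup itself is out of scope here), an age check could look like the following sketch. The bucket names are hypothetical; the 30-day boundary comes from the paragraph above and the 7-day boundary is an added assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def _as_utc(moment: datetime) -> datetime:
    # Treat naive datetimes as UTC so subtraction never mixes naive and aware values.
    return moment if moment.tzinfo is not None else moment.replace(tzinfo=timezone.utc)


def domain_age_days(created: datetime, now: Optional[datetime] = None) -> int:
    """Whole days between the creation date and now (never negative)."""
    current = _as_utc(now) if now is not None else datetime.now(timezone.utc)
    return max(0, (current - _as_utc(created)).days)


def age_bucket(created: Optional[datetime], now: Optional[datetime] = None) -> str:
    """Map a creation date to a coarse risk bucket."""
    if created is None:
        return "unknown"  # lookup failed or the record is redacted
    age = domain_age_days(created, now)
    if age <= 7:
        return "very_new"
    if age <= 30:
        return "new"
    return "established"


if __name__ == "__main__":
    print(age_bucket(datetime(2020, 1, 1)))
```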
Registration metadata reveals additional signals. Privacy-protected registration is common among legitimate sites and is not suspicious on its own, but the combination of privacy protection, registration through a registrar with a high abuse rate, and a creation date within the past week is a meaningful cluster of risk. Some registrars are significantly overrepresented in phishing infrastructure, and maintaining awareness of which registrars appear repeatedly in your incident data is worthwhile.
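The cluster described above could be expressed roughly as follows. The Registration record, the risk level names, and the registrar identifiers are all hypothetical; a real list would come from your own incident data.

```python
from dataclasses import dataclass

# Placeholder names only. Populate from registrars seen in your own incidents.
HIGH_ABUSE_REGISTRARS = frozenset({"example-registrar-a", "example-registrar-b"})


@dataclass
class Registration:
    privacy_protected: bool
    registrar: str
    age_days: int


def registration_risk(reg: Registration,
                      high_abuse=HIGH_ABUSE_REGISTRARS) -> str:
    """Classify by how many of the three signals fire together."""
    signals = [
        reg.privacy_protected,
        reg.registrar.lower() in high_abuse,
        reg.age_days <= 7,  # created within the past week
    ]
    fired = sum(signals)
    if fired == 3:
        return "high"
    if fired == 2:
        return "elevated"
    return "baseline"  # a single signal, such as privacy protection, is normal


if __name__ == "__main__":
    print(registration_risk(Registration(True, "example-registrar-a", 2)))
```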
Newly Observed Domains (NOD) feeds are available from several threat intelligence providers and offer a practical mechanism for flagging URLs whose apex domains have not previously appeared in DNS telemetry. Routing traffic destined for NOD-flagged domains through an additional inspection layer, rather than blocking outright, is a reasonable policy that catches a large proportion of phishing attempts while limiting disruption to legitimate new services.
Certificate Transparency logs provide another registration-time signal. When an attacker registers a domain and immediately obtains a TLS certificate, that certificate appears in CT logs. Monitoring CT logs for domains that match your organization's brand names, executive names, or product terminology gives you near-real-time awareness of typosquatting and impersonation infrastructure, often before it is used in an actual campaign.
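Assuming certificate hostnames are already arriving from some CT log monitor (that feed is not shown), brand matching could be sketched with a plain edit-distance comparison. The function names, the own_domains allowlist, and the distance threshold are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def brand_matches(cert_names, brands, own_domains=(), max_distance=1):
    """Return (hostname, brand) pairs that contain or nearly spell a brand."""
    hits = []
    for name in cert_names:
        host = name.lower().lstrip("*.")  # drop a wildcard prefix
        if any(host == d or host.endswith("." + d) for d in own_domains):
            continue  # our own infrastructure
        tokens = [t for label in host.split(".") for t in label.split("-") if t]
        for brand in brands:
            if any(brand in t or edit_distance(t, brand) <= max_distance
                   for t in tokens):
                hits.append((host, brand))
    return hits


if __name__ == "__main__":
    print(brand_matches(["examp1ebank-secure.net"], ["examplebank"]))
```

Short brand names will produce many near matches at distance 1, so the threshold probably needs to scale with brand length in practice.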
Content-Based Detection and Page Rendering Analysis
Lexical and registration signals tell you something is suspicious. Content analysis tells you what the page is actually doing. Fetching and rendering suspicious URLs in an isolated environment, then analyzing the resulting content, is the most accurate detection method available and also the most resource-intensive.
Visual similarity analysis compares the rendered appearance of a suspicious page against a library of legitimate brand pages. Attackers frequently clone the HTML, CSS, and images of well-known services to make credential harvesting pages convincing. Perceptual hashing algorithms like pHash compute a compact fingerprint of a rendered page screenshot and allow rapid comparison against stored fingerprints of known-legitimate pages. A high visual similarity score between a newly observed URL and, for example, a bank login page is a strong phishing indicator even if every other signal looks clean.
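To illustrate the fingerprint comparison step, the sketch below uses a simple average hash rather than the DCT-based pHash named above, because it fits in a few lines of standard-library code. It assumes the screenshot has already been reduced to an 8x8 grid of grayscale values, and all function names are hypothetical.

```python
def average_hash(pixels) -> int:
    """64-bit average hash of an 8x8 grayscale grid (a list of 8 rows of 8 ints).

    Each bit is 1 when the pixel is brighter than the grid mean. This is
    the simpler aHash variant, not the DCT-based pHash.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int, nbits: int = 64) -> float:
    """1.0 for identical fingerprints, 0.0 when every bit differs."""
    return 1 - hamming(a, b) / nbits


if __name__ == "__main__":
    grid = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
    print(similarity(average_hash(grid), average_hash(grid)))
```

A suspicious page would be compared against stored fingerprints of known-legitimate pages, with a high similarity on an unaffiliated domain treated as a strong indicator.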
HTML structure analysis looks at the page's form elements, link destinations, and JavaScript behavior. Phishing pages typically contain login forms that submit to third-party domains or to URL-encoded data destinations. Checking whether a form's action attribute resolves to the same domain as the page itself is a simple but effective heuristic. Pages that import resources from many unrelated domains, or that contain obfuscated JavaScript, warrant additional scrutiny.
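The form action heuristic can be sketched with the standard-library HTML parser. The class and function names are hypothetical, and the comparison here is on exact hostnames, which is stricter than comparing registrable domains and will flag legitimate cross-subdomain logins.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class FormActionParser(HTMLParser):
    """Collect the action attribute of every form tag."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing or empty action means the form posts to the page itself.
            self.actions.append(dict(attrs).get("action") or "")


def cross_domain_forms(page_url: str, html: str) -> list:
    """Return resolved form targets whose host differs from the page host."""
    parser = FormActionParser()
    parser.feed(html)
    page_host = (urlsplit(page_url).hostname or "").lower()
    flagged = []
    for action in parser.actions:
        target = urljoin(page_url, action)  # resolves relative actions
        host = (urlsplit(target).hostname or "").lower()
        if host and host != page_host:
            flagged.append(target)
    return flagged


if __name__ == "__main__":
    sample = '<form action="https://collector.example.net/post.php"></form>'
    print(cross_domain_forms("https://login.example.com/", sample))
```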
Brand logo and keyword detection within page content adds another layer. A page that contains the Nike or PayPal logo but resolves to a domain with no affiliation to those organizations is an obvious mismatch. OCR-based text extraction from rendered screenshots can surface brand names that are embedded in images rather than HTML text, which is a technique some phishing kits use to evade text-based scanners.
The operational challenge with content-based analysis is latency. Fetching, rendering, and analyzing a page takes several seconds at minimum, which is too slow for inline web gateway inspection in most environments. The practical approach is to use content analysis asynchronously: flag URLs based on faster signals, queue flagged URLs for content inspection, and update your filter's assessment of those URLs as results come back. For email links, where the click may happen minutes or hours after delivery, this pipeline has enough time to reach a verdict before most users act on the message.
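The flag, queue, and update flow described above might be structured like this. It is a single-threaded sketch: InspectionPipeline, fast_check, and deep_check are hypothetical names, and in a real deployment the deep check would run in separate workers against a sandboxed renderer.

```python
from collections import deque


class InspectionPipeline:
    """Fast checks answer inline; slow content checks update verdicts later."""

    def __init__(self, fast_check, deep_check):
        self.fast_check = fast_check   # cheap signals, e.g. lexical or domain age
        self.deep_check = deep_check   # slow content analysis
        self.verdicts = {}
        self.pending = deque()

    def submit(self, url: str) -> str:
        """Return an immediate verdict and queue suspicious URLs."""
        if url in self.verdicts:
            return self.verdicts[url]
        if self.fast_check(url):
            self.verdicts[url] = "suspicious"
            self.pending.append(url)
        else:
            self.verdicts[url] = "allow"
        return self.verdicts[url]

    def process_pending(self) -> None:
        """Drain the queue and replace provisional verdicts with final ones."""
        while self.pending:
            url = self.pending.popleft()
            self.verdicts[url] = "block" if self.deep_check(url) else "allow"


if __name__ == "__main__":
    pipeline = InspectionPipeline(lambda u: "login" in u, lambda u: ".bad/" in u)
    print(pipeline.submit("https://shop.bad/login"))
    pipeline.process_pending()
    print(pipeline.verdicts)
```

A click-time lookup would read from the same verdict store, so a URL flagged at delivery can be blocked by the time the user follows it.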
Machine Learning Approaches in Production
Machine learning models trained on URL features have become a standard component of commercial phishing detection products, and the underlying techniques are accessible enough that security teams with data science capability can build and deploy them internally. The value of ML in this context is its ability to combine dozens of weak signals into a reliable composite score, handling cases where no single heuristic would reach a threshold for action.
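To make the idea of a composite score concrete, here is a toy logistic combination of binary signals. The feature names, weights, and bias are hand-picked placeholders for illustration; a production model would learn them from labeled data.

```python
import math

# Hypothetical weights for illustration only.
WEIGHTS = {
    "long_host": 0.8,
    "deep_subdomains": 1.0,
    "keyword_in_host": 1.2,
    "young_domain": 1.5,
    "cross_domain_form": 1.3,
}
BIAS = -3.0


def risk_score(features: dict) -> float:
    """Logistic combination of weak signals, returned as a value in (0, 1)."""
    z = BIAS + sum(w * float(features.get(name, 0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(round(risk_score({"young_domain": 1}), 3))
    print(round(risk_score({name: 1 for name in WEIGHTS}), 3))
```

With these placeholder weights no single signal pushes the score past 0.5, while several together do, which is the behavior the paragraph above describes.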
Feature engineering for URL classification draws on all the signal categories already described: lexical features, registration metadata, DNS behavior, IP reputation, and content features when available. Random forest classifiers and gradient boosting models perform well on this feature set and are interpretable enough that analysts can understand why a particular URL received a high risk score. Neural approaches, including character-level convolutional models that treat URLs as raw character sequences, capture structural patterns without requiring manual feature engineering and generalize well to previously unseen attack patterns.
The practical limitation of ML-based detection is model drift. Phishing techniques evolve continuously, and a model trained six months ago may perform significantly worse on today's campaigns. Gremlin Stealer's recently documented evolution, which involved hiding malicious payloads inside resource files to evade static analysis tools, illustrates how quickly evasion techniques advance. The same adaptive pressure applies to phishing infrastructure. Models need regular retraining against current campaign data, and teams need monitoring in place to detect when model performance degrades in production.
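Production monitoring for degradation could start with something as simple as comparing recall on a recent labeled window against a baseline. The function names and the 0.05 tolerance below are assumptions, and the labels would have to come from analyst-confirmed verdicts.

```python
from typing import Optional, Sequence


def recall(labels: Sequence[int], predictions: Sequence[int]) -> Optional[float]:
    """Share of true phishing URLs (label 1) that the model flagged."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return tp / (tp + fn) if (tp + fn) else None


def has_drifted(baseline_recall: float, labels: Sequence[int],
                predictions: Sequence[int], tolerance: float = 0.05) -> bool:
    """True when recall on the recent window falls below baseline minus tolerance."""
    current = recall(labels, predictions)
    if current is None:  # no phishing examples in this window
        return False
    return (baseline_recall - current) > tolerance


if __name__ == "__main__":
    print(has_drifted(0.95, [1, 1, 1, 1], [1, 1, 0, 0]))
```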
Training data quality is the other critical constraint. Models trained on unbalanced datasets, with far more benign URLs than phishing examples, tend toward false negatives in production. Models trained on stale phishing examples learn to recognize last year's campaigns rather than current ones. Maintaining a continuous pipeline that feeds fresh phishing URLs from threat intelligence sources, honeypot data, and user-reported suspicious links into the training set is operationally demanding but essential for sustained model performance.
Behavioral Signals and Redirect Chain Analysis
Modern phishing campaigns rarely send users directly to the credential harvesting page. Redirect chains serve multiple purposes: they route around URL filters that check only the initially clicked link, they add apparent legitimacy by passing traffic through known-good intermediate services, and they make attribution harder by inserting additional hops between the lure and the payload.
A URL that passes through a legitimate link-shortening service, then redirects to a compromised WordPress site, then redirects again to the actual phishing page will clear most static URL filters because the initial link points to a domain with an excellent reputation. Following the full redirect chain at inspection time, rather than evaluating only the submitted URL, is necessary to catch these campaigns. This requires a sandboxed HTTP client that follows redirects across domains, records each hop, and evaluates every destination in the chain.
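A sketch of the hop-recording logic follows. The fetch callable is a stand-in for whatever sandboxed HTTP client is used; it is assumed to return a status code and the Location header without following redirects itself. Only HTTP-level redirects are handled here, so meta refresh and JavaScript redirects would still need a rendering step.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url: str, fetch, max_hops: int = 10) -> list:
    """Return every URL in the redirect chain, starting with the submitted one.

    `fetch(url)` is a hypothetical callable returning (status, location),
    where location is the Location header value or None.
    """
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        nxt = urljoin(chain[-1], location)  # resolves relative Location values
        if nxt in seen:  # redirect loop
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def chain_hosts(chain: list) -> list:
    """Hostnames for each hop, so every destination can be scored separately."""
    return [(urlsplit(u).hostname or "").lower() for u in chain]


if __name__ == "__main__":
    fake = {"https://short.example/abc": (302, "https://login.example.net/verify"),
            "https://login.example.net/verify": (200, None)}
    print(follow_redirects("https://short.example/abc", lambda u: fake[u]))
```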
Redirect chain analysis also surfaces the use of open redirects on legitimate domains, a technique where attackers append a malicious destination URL as a parameter to a redirect endpoint on a trusted site. These are common in phishing campaigns targeting corporate users because the email link appears to point to a recognizable domain. Detecting open redirect abuse requires parsing redirect destinations and evaluating them independently rather than trusting the source domain.
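Parsing redirect destinations out of query parameters could be sketched as below. The function name is hypothetical, and the check only covers destinations passed as plain or percent-encoded URLs; encoded or nested forms would need extra handling.

```python
from urllib.parse import parse_qsl, urlsplit


def embedded_destinations(url: str) -> list:
    """Return (parameter, host) pairs for query values that are off-site URLs."""
    parts = urlsplit(url)
    source_host = (parts.hostname or "").lower()
    found = []
    # parse_qsl percent-decodes each value before we inspect it.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if value.startswith("//"):
            value = "https:" + value  # scheme-relative destination
        target = urlsplit(value)
        if target.scheme in ("http", "https") and target.hostname:
            host = target.hostname.lower()
            if host != source_host:
                found.append((key, host))
    return found


if __name__ == "__main__":
    print(embedded_destinations(
        "https://trusted.example.com/redirect?url=https%3A%2F%2Fevil.example.net%2Flogin"))
```

Each host returned here would then be evaluated on its own merits rather than inheriting the reputation of the source domain.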
User behavior analytics adds a post-click dimension to phishing detection. When a user clicks a link and the subsequent browsing session includes credential submission, rapid navigation away, or connection to domains associated with data exfiltration infrastructure, those behavioral signals can trigger incident response workflows even after the filter failed at initial inspection. Correlating endpoint telemetry with URL inspection events gives detection teams a second opportunity to catch phishing that cleared the perimeter.
Operationalizing Detection Across the Email and Web Surface
Detection techniques only generate value when they are integrated into the environments where phishing actually reaches users. Email and web browsing are the two primary delivery channels, and each has distinct integration requirements.
For email, URL inspection should occur both at delivery time and at click time. Delivery-time inspection runs every link through the available detection pipeline before the message is placed in the inbox. Click-time inspection, implemented through URL rewriting that routes clicks through a proxy inspection service, provides a second evaluation at the moment the user attempts to follow the link. This matters because some phishing sites are configured to serve benign content during the period when security tools are likely to inspect them, then switch to malicious content once real user traffic begins. Click-time inspection with fresh evaluation catches sites that attempt this kind of cloaking.
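The URL rewriting step can be illustrated as follows. The proxy endpoint and the u parameter are hypothetical, the regular expression is a rough link matcher for plain-text bodies, and real products typically also sign the rewritten link, which is omitted here.

```python
import re
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical click-time inspection endpoint.
PROXY = "https://clickguard.example.com/check"

# Rough matcher for http(s) links in plain text.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+")


def rewrite_links(body: str) -> str:
    """Replace each link with a proxy URL that carries the original as a parameter."""
    return LINK_RE.sub(
        lambda m: f"{PROXY}?u={quote(m.group(0), safe='')}", body)


def original_url(rewritten: str) -> str:
    """Recover the original destination at click time, before re-inspecting it."""
    return parse_qs(urlsplit(rewritten).query)["u"][0]


if __name__ == "__main__":
    print(rewrite_links("Visit https://example.org/a?b=1&c=2 now"))
```

At click time the proxy would decode the destination, run it through the detection pipeline again, and only then redirect or block.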
For web browsing, browser-based telemetry and DNS-layer filtering complement gateway inspection. DNS-layer filtering evaluates domain resolution requests against threat intelligence before any HTTP traffic is generated, providing fast, low-overhead blocking for known-bad infrastructure. Browser extensions or endpoint agents that perform client-side URL analysis can evaluate links in real time as users hover or navigate, adding a detection layer that operates independently of gateway visibility.
Security awareness training remains a necessary complement to technical controls. Users who understand how to read a full URL, recognize subdomain manipulation, and report suspicious links add meaningful detection capacity to the organization. The limiting factor is that even well-trained users make mistakes under time pressure or when confronted with convincing social engineering. Technical controls need to be layered specifically because human review is not a reliable last line of defense.
Handling the Cloaking Problem
Cloaking is the practice of serving different content to scanners than to real users. It is widespread in sophisticated phishing campaigns and represents one of the hardest problems in URL detection. A cloaked phishing site will return a blank page, a redirect to a legitimate site, or an error message when accessed from known scanner IP ranges, then serve the credential harvesting page to users arriving through normal browser sessions.
Defeating cloaking requires inspection infrastructure that resembles real user traffic as closely as possible. This means using residential or mobile proxy infrastructure for URL fetching rather than datacenter IP ranges that are easy to fingerprint. It means sending realistic browser headers, executing JavaScript, and respecting cookies in the same way a real browser session would. It means introducing realistic timing between requests rather than making simultaneous parallel requests that are characteristic of automated scanning.
Some detection platforms rotate their inspection infrastructure IP addresses frequently to prevent blocklisting. Even with these measures, determined attackers can often distinguish scanner traffic from real user traffic through subtle behavioral signals. The practical implication is that cloaking makes content-based detection probabilistic rather than certain, which reinforces the importance of using lexical, registration, and behavioral signals as primary detection layers rather than relying on content inspection to catch everything.
Integrating Threat Intelligence Into the Detection Pipeline
Commercial and open-source threat intelligence feeds provide lists of known phishing domains, URLs, and associated infrastructure. Integrating these feeds into URL filtering, DNS resolvers, and SIEM correlation rules adds detection coverage for campaigns that have been previously observed and documented. The caveat is that feed coverage for brand-new phishing infrastructure is limited during the critical window when active campaigns are targeting users.
Sharing intelligence bidirectionally with the threat intelligence community accelerates coverage for everyone. When your team identifies a phishing domain that is not yet on any public feed, submitting it to abuse reporting channels, phishing databases like PhishTank or OpenPhish, and your threat intelligence sharing communities reduces the window during which that domain affects other organizations. This contribution model only works if it is operationalized, meaning there is a defined workflow for submitting IOCs, not just a general intention to do so.
Internal threat intelligence derived from your own environment is often the fastest source of signal for attacks targeting your specific organization or industry. Tracking domains that closely resemble your brand, monitoring for certificate registrations that match your naming conventions, and correlating user-reported phishing submissions against broader campaign patterns all generate organization-specific intelligence that commercial feeds may not carry.
Measuring What Your Detection Actually Catches
Detection programs that are built and then left without performance measurement tend to degrade over time without anyone noticing. Establishing metrics for phishing URL detection is operationally necessary, not just administratively useful.
False negative rate, the proportion of phishing URLs that clear your filters and reach users, is the most important metric and also the hardest to measure directly because you do not know what you are missing. Proxy indicators include the rate of user-reported phishing submissions that were not blocked, the frequency with which threat intelligence feeds surface URLs that appeared in your logs without generating blocks, and the outcomes of red team or phishing simulation exercises that use campaign techniques matching current attacker behavior.
False positive rate matters for user experience and helpdesk load. A detection pipeline that blocks significant volumes of legitimate traffic will generate pressure to loosen thresholds, which degrades protection. Tracking blocked URL complaints, measuring the proportion of blocked URLs that are subsequently whitelisted, and auditing whitelist decisions to ensure they are not creating coverage gaps all contribute to maintaining a well-calibrated system.
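The proxy indicators for both error types reduce to simple set arithmetic over logs, sketched below. The function names and inputs are hypothetical, and the results are only as good as the confirmed-phishing and allowlist records behind them.

```python
from typing import Iterable, Optional


def missed_rate(confirmed_phishing: Iterable[str],
                blocked: Iterable[str]) -> Optional[float]:
    """False-negative proxy: share of confirmed phishing URLs that were not blocked."""
    confirmed = set(confirmed_phishing)
    if not confirmed:
        return None
    return len(confirmed - set(blocked)) / len(confirmed)


def retro_feed_hits(feed_urls: Iterable[str], allowed_urls: Iterable[str]) -> list:
    """URLs that a threat feed later listed but that your logs show were allowed."""
    return sorted(set(feed_urls) & set(allowed_urls))


def allowlisted_share(blocked: Iterable[str],
                      later_allowlisted: Iterable[str]) -> Optional[float]:
    """False-positive proxy: share of blocked URLs that were later allowlisted."""
    blocked_set = set(blocked)
    if not blocked_set:
        return None
    return len(blocked_set & set(later_allowlisted)) / len(blocked_set)


if __name__ == "__main__":
    print(missed_rate(["a", "b", "c", "d"], ["a", "b", "c"]))
```

Tracked over time, a rising missed rate suggests detection is falling behind, while a rising allowlisted share suggests thresholds are too aggressive.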
Regular testing against current phishing kit samples and live campaign infrastructure provides the most realistic performance measurement. Security teams with threat intelligence access can obtain samples of active phishing URLs and run them through their detection stack in a controlled environment to assess what gets caught and what does not. The results should drive tuning decisions rather than sit in a report.
Practical Priorities for Teams With Limited Resources
Not every organization can deploy sandboxed rendering infrastructure, maintain ML models, and subscribe to premium threat intelligence feeds simultaneously. Prioritizing investment based on what generates the most detection value per unit of effort is essential for teams operating under resource constraints.
Domain age monitoring and CT log watching are high-value, relatively low-cost additions to most environments. Both can be implemented using open-source tooling and free or inexpensive data sources. They catch a disproportionate share of phishing campaigns because most attacks rely on freshly registered infrastructure.
Email link rewriting with click-time inspection is available in most modern email security platforms and should be enabled if it is not already. The incremental cost is low and the detection benefit, particularly for cloaked sites that behave differently at click time than at delivery time, is significant.
User reporting pipelines, where suspicious links submitted by users flow into an analyst review queue and then into detection systems if confirmed malicious, provide detection coverage that no automated system fully replicates. Users encounter social engineering in contexts, such as personal email viewed on corporate devices or messaging platforms outside standard monitoring scope, that technical controls may not reach. Making it frictionless to report suspicious URLs, and ensuring reports receive timely action, turns user suspicion into organizational intelligence.
As AI-driven attack generation becomes more capable, with AI agents now being used to craft contextually convincing phishing lures at scale, the detection surface will continue to expand. The campaigns that caused the most damage historically were often characterized by high volume and low sophistication. The campaigns emerging now combine high volume with high sophistication, specifically because AI tooling reduces the effort required to personalize and refine attack content. Detection infrastructure built for yesterday's campaigns will need continuous adaptation to remain effective against what is already in production today.