Blind URL Inspection | Catch Never-Before-Seen Phishing Emails

Over a third of the phish we stop each day involve malicious links. These are often missed by email gateways, DMARC and cloud email suites, which lack our proprietary computer learning approach: blind URL inspection. Hear from Area 1’s Umalatha Batchu (Lead Software Engineer), Javier Castro (Principal Security Researcher), Torsten Zeppenfeld (Sr. Software Engineer – ML) and Yen Chang (Software Engineering – ML), about how we accurately determine if a never-before-seen URL is malicious.

What’s the root cause of most cyber breaches? Phishing attacks.

At Area 1, we see and stop many thousands of phishing attacks against our customers every day. Over a third of the attacks that we stop use emails containing malicious links. These links lead victims to phishing sites custom-built to capture username and password information (so-called “credential harvesters”), to sites with malicious downloads, or both. All of these sites have criminal purposes and are intended to perpetrate fraud against our customers.

Traditional security defenses (such as email gateways, cloud email suites and DMARC) often miss malicious links, whether in the body of an email or in a document attached to an email.

Why? Phishing websites are easy to create, and in some cases attackers create them automatically, en masse. Emails are then sent with links pointing to newly created phishing sites that no security vendor has seen before. So what does your security tool do when it sees such a link for the first time? The answer, unfortunately, is often nothing.

Phishing Sites May Appear Completely Legitimate

Some examples of phishing sites that we see on any given day include pixel-perfect forgeries of well-known sites, such as LinkedIn, Microsoft and Chase.

Yet not all phishing sites are as well crafted – in many cases very basic pages that purport to be internal IT systems can be extremely effective in a phishing campaign.

All of these sites have URLs that look “not quite right” in the address bar of a desktop or phone browser. But not every employee has the time or judgement to inspect URLs, especially when faced with a looming deadline or urgent task. It’s human nature to prioritize the task at hand, and checking URL authenticity can quickly fall to the bottom of the list during normal business activities. This led us to ask the question: “Can computers automatically do this for us, in a reliable way?”

How Blind URL Inspection Helps Catch Phishing 

To make accurate determinations on first-ever-seen URLs, Area 1 uses a proprietary computer learning approach that we call blind URL inspection.

Machine learning and artificial intelligence are overhyped terms in today’s technical environment. But beneath the bold public claims is real progress on recognition and pattern-matching problems. Sophisticated pattern-matching algorithms have migrated from academia into practical, applied scenarios. (Higher-level tasks, such as planning, decision making, and creativity, are still active areas of research.)

Machine-learning algorithms for pattern matching depend heavily on example data. While human beings are able to learn from just a few examples, building an accurate machine-learning model requires very large volumes of training data.

Area 1 Security monitors a sensor network and web crawler that together can observe hundreds of millions of URLs per day. In addition, we have a broad sampling of URLs from email traffic, again amounting to millions of samples per day. This large volume of URL data allows us to train sophisticated machine-learning models that have very high accuracy.

To implement blind URL inspection, we use neural networks – they are the most capable approach available today, given enough training data. The goal of any machine-learning model is to generalize from training samples – that is, to correctly categorize previously unseen examples that are similar enough to known examples to be matched accurately. Neural networks do the job well.
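As a rough illustration of how a URL can be handed to a neural network at all, the sketch below turns a URL string into a fixed-length sequence of integer character IDs, a common input format for character-level models. This is not Area 1's implementation: the vocabulary, the maximum length, and the name encode_url are assumptions made only for this example.

```python
import string

# Assumed vocabulary: lowercase letters, digits, and punctuation common in URLs.
VOCAB = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"

# ID 0 is reserved for padding and ID 1 for characters outside the vocabulary.
PAD_ID = 0
UNKNOWN_ID = 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}

# Assumed fixed input length; longer URLs are truncated, shorter ones padded.
MAX_LEN = 64


def encode_url(url, max_len=MAX_LEN):
    """Map a URL to a list of exactly max_len integer character IDs."""
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in url.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode_url("http://example.com/login")[:12])
```

Because every URL maps to a vector of the same length, a never-before-seen URL can be scored by a trained model in exactly the same way as a known one, which is the property that generalization relies on.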

What are some of the patterns that Area 1’s neural networks recognize? 

Figure 1: Key URL Attributes That Make URLs Suspicious

All of these attributes are clear to us as humans, but they can slip past our decision-making process when we have more important things to do.
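To make the idea of machine-checkable URL attributes concrete, here is a minimal sketch that extracts a few lexical signals from a URL using only the Python standard library. The specific attributes chosen, and the function name url_features, are illustrative assumptions; they are not a list of what Area 1's models actually use.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url):
    """Return a dict of simple, hypothetical lexical attributes of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is unusual for a login page.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    labels = [] if host_is_ip or not host else host.split(".")

    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip,
        # Many labels can hide a familiar brand name in front of the real domain.
        "num_host_labels": len(labels),
        "num_hyphens_in_host": host.count("-"),
        # Text before "@" in the authority is user info, not the destination host.
        "has_at_sign": "@" in parts.netloc,
        "uses_https": parts.scheme == "https",
        # "xn--" marks a punycode label, sometimes used for look-alike characters.
        "has_punycode_label": any(label.startswith("xn--") for label in labels),
    }


if __name__ == "__main__":
    print(url_features("http://login.example-bank.com.evil.test/signin"))
```

Hand-written checks like these only cover patterns someone thought of in advance. A model trained on large volumes of URLs can weigh many such signals together and pick up on combinations that no single rule would capture.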

Blind URL inspection is just one building block in our comprehensive anti-phishing solution. For more information about Area 1 Security and how we protect organizations from phishing, please visit our website or register for a demo.

Phil Syme

Cofounder and CTO at Area 1

Phil Syme has 22 years of information technology experience. His areas of expertise include system architecture, data processing at scale, and cloud technologies. Phil was previously a Chief Engineer at Next Century Corporation, co-founder and Principal Engineer at eSymmetrix, and co-authored two introductory programming books. He holds a B.S. in Mathematics and Computer Science from Carnegie Mellon University School of Computer Science.