Do you remember the intelligent robots in the movie A.I. Artificial Intelligence? Well, that same technological concept applies to monitoring digital threats! Here at Axur we apply state-of-the-art machine learning techniques to identify phishing cases. Among the millions of URLs that we collect every day, we quickly spot cases that could affect hundreds of thousands of unsuspecting people. The process behind it is worth a closer look.
Artificial intelligence? Machine learning? Is it all the same thing? No! It’s like this: artificial intelligence, which is actually a generic term, is a very large field. It encompasses machine learning, which is the process by which machines are taught to observe patterns for decision-making. Within machine learning there are two distinct types:
Axur's machine learning operation is simple in concept, though of course not easy to build. It all starts with a database, which goes through many rounds of testing and improvement before the resulting model is used to decide which URLs can be ruled out. Some datasets take months or more to prepare and can contain millions of records! But let’s start at the beginning.
First, data science: our machine learning team builds a database of detected URLs that have already been verified by the Digital Fraud Discovery team. Each row is labeled either true, for occurrences that are actually phishing, or false, for those that are legitimate.
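As a rough illustration of what such a labeled table could look like, here is a minimal sketch that loads URL rows with a true/false label. The column names `url` and `is_phishing`, the `LabeledUrl` class, and the sample rows are all assumptions made for this example; the article does not describe the actual schema.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabeledUrl:
    """One analyst-verified row: a URL and whether it is phishing."""
    url: str
    is_phishing: bool


# Hypothetical sample rows using reserved example domains and addresses.
RAW_CSV = """url,is_phishing
http://secure-login.example-bank.test/verify,true
https://www.example.com/about,false
http://192.0.2.10/account/update,true
"""


def load_labeled_urls(text: str) -> List[LabeledUrl]:
    """Parse CSV text into labeled rows; 'true' (any case) means phishing."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        LabeledUrl(row["url"], row["is_phishing"].strip().lower() == "true")
        for row in reader
    ]


dataset = load_labeled_urls(RAW_CSV)
phishing_count = sum(row.is_phishing for row in dataset)
```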
This database is used for the initial algorithm’s first “lesson.” The algorithm receives one part of the data to learn from, and the remainder is held back to test what it has learned. The entire process is done with programming languages specific to data science, using a hybrid of on-premises and cloud infrastructure to provide greater computing power.
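The idea of learning from one part of the data and testing on the rest can be sketched with a simple shuffled split. The 80/20 ratio, the fixed seed, and the function name `train_test_split` are assumptions for illustration; the article does not state how the data is actually divided.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into a training part and a test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled rows: (url, is_phishing).
rows = [("http://site-%d.test/" % i, i % 2 == 0) for i in range(10)]
train_rows, test_rows = train_test_split(rows)
```

Keeping the test part completely unseen during training is what makes the measured accuracy a fair estimate of how the model behaves on new URLs.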
The results are then reviewed by expert phishing analysts, who validate them and report any anomalies to the data science team.
Now comes one of the most important parts of the machine learning implementation process: the so-called feature engineering. This consists of identifying the characteristics that allow us to accurately differentiate phishing from legitimate cases. Some examples of features used in URL analysis are:
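To make the idea of a feature concrete, the sketch below turns a URL into a handful of measurable characteristics. These six features are common illustrative choices and are assumptions of this example, not Axur's actual feature list; the function names are hypothetical as well.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IP address, not a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Compute a few illustrative (assumed) features from a URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_dot_count": host.count("."),
        "host_hyphen_count": host.count("-"),
        "host_is_ip": host_is_ip(host),
        "has_at_sign": "@" in url,
        "uses_https": parts.scheme == "https",
    }


features = extract_features("http://192.0.2.10/account-update/login")
```

Each URL becomes one row of numbers and flags, which is the form a learning algorithm can actually work with.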
In all, we analyze over 80 features. Once all of these are available, it's time for testing, testing, and more testing. Statistical analysis of the test results shows us which combinations of features produce the greatest possible number of hits.
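A toy version of comparing feature combinations is sketched below. It scores every pair of boolean features with a deliberately naive rule ("flag the URL if any chosen feature is true") and ranks the pairs by hit rate. The rule, the feature names, and the four sample rows are stand-ins invented for this example; the article does not say which statistical methods or models are actually used.

```python
from itertools import combinations

# Hypothetical rows: boolean feature values plus the analyst-verified label.
ROWS = [
    ({"host_is_ip": True, "has_at_sign": False, "uses_https": False}, True),
    ({"host_is_ip": False, "has_at_sign": True, "uses_https": False}, True),
    ({"host_is_ip": False, "has_at_sign": False, "uses_https": True}, False),
    ({"host_is_ip": False, "has_at_sign": False, "uses_https": False}, False),
]


def hit_rate(feature_names, rows):
    """Fraction of rows where 'any chosen feature is true' matches the label."""
    hits = sum(
        any(feats[name] for name in feature_names) == label
        for feats, label in rows
    )
    return hits / len(rows)


def rank_combinations(rows, size):
    """Score every feature combination of the given size, best first."""
    names = sorted(rows[0][0])
    scored = [(hit_rate(combo, rows), combo) for combo in combinations(names, size)]
    return sorted(scored, reverse=True)


ranking = rank_combinations(ROWS, size=2)
best_score, best_combo = ranking[0]
```

With more than 80 features, trying every combination quickly becomes impractical, which is one reason real pipelines rely on statistical selection rather than brute force.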
A small percentage of all the occurrences identified as phishing is randomly sampled and sent to the team for analysis. This allows us to see if the machine is really getting it right. Currently, the hit rate of the algorithms used to validate phishing is higher than the hit rate achieved by humans. After all, to err is human! Our process can validate a gigantic volume of data in minutes.
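The random audit described above can be sketched as drawing a small sample of flagged URLs and computing the share that analysts confirm. The 5% sampling fraction, the function names, and the sample data are assumptions for illustration; the article only says "a small percentage."

```python
import random


def sample_for_review(flagged_urls, fraction=0.05, seed=None):
    """Pick a random subset of machine-flagged URLs for human review."""
    urls = list(flagged_urls)
    if not urls:
        return []
    # Always review at least one item, even for very small batches.
    k = max(1, round(len(urls) * fraction))
    return random.Random(seed).sample(urls, k)


def audited_hit_rate(verdicts):
    """Share of reviewed items that the analysts confirmed as phishing."""
    if not verdicts:
        raise ValueError("no verdicts to evaluate")
    return sum(verdicts) / len(verdicts)


# Hypothetical batch of URLs flagged by the model.
flagged = ["http://flagged-%d.test/" % i for i in range(200)]
review_batch = sample_for_review(flagged, fraction=0.05, seed=7)

# Hypothetical analyst verdicts for four reviewed items.
example_rate = audited_hit_rate([True, True, False, True])
```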
Want to learn more about how Axur's entire digital risk monitoring and response process works? Then check out our solution for phishing, which ensures that no fake pages can affect your brand for long. Who knows, maybe machine learning could be your ally!
Mateus Dalponte
PhD in Applied Physics and a member of Axur for 8 years, having started as a manager of detection, analysis, and fraud removal operations. Currently responsible for the Data Science and Machine Learning team, working on automation and economies of scale in digital risk detection.