How Machine Learning Can Help Identify Web Defacement Campaigns

By Federico Maggi, Marco Balduzzi, Ryan Flores, and Vincenzo Ciancaglini

Website defacement, the act of visibly altering the pages of a website, notably in the aftermath of a political event to advance the political agenda of a threat actor, has been explored in several of our research works. We broke down top defacement campaigns in a previous paper and, in another post, emphasized how machine learning in our security research tool can help Computer Emergency Readiness Teams (CERTs)/Computer Security Incident Response Teams (CSIRTs) and web administrators prepare for such attacks. The latter built on the analysis done in our most recent paper, Web Defacement Campaigns Uncovered: Gaining Insights From Deface Pages Using DefPloreX-NG. Here we expound on why machine learning (ML) was an ideal method for our analysis and how it helped us better understand how web defacers operate and organize themselves.

Facilitating machine learning techniques with DefPloreX-NG

In 2017, we presented DefPloreX, a machine learning toolkit that can be used for large-scale e-crime forensics. This year, we presented DefPloreX-NG, a version of the toolkit with enhanced machine learning algorithms and new visualization templates. In our recent web defacement paper, we used DefPloreX-NG to go beyond anecdotal evidence and analyze 13 million deface records spanning 19 years. Security analysts and researchers can also use it to identify ongoing and live web defacement campaigns, including new and so far unknown ones. The improved and expanded toolkit allows for efficient filtering of actionable intelligence from raw deface sites. It can automatically identify and track defacement campaigns and assign meaningful textual labels to each campaign’s attributes. In addition, it makes it easier to sort and search sites according to, for example, the threat actor or group responsible, motivation, type of content or propaganda, top-level domain (TLD), and category of defaced site (media outlet, etc.). The processes involved are heavily aided by machine learning.

Figure 1. Diagram of the automated analysis of deface pages using DefPloreX-NG

Historical and live dataset: For the research we used a dataset based on a unique collection of defacement records from five major reporting sites. These reporting sites provide feeds of defacement records aggregated from various sources such as sharing initiatives, CERTs, and victim organizations, among others. As a first step, we had to ensure that the datasets we used were trustworthy,^[1] meaning we had to prioritize data on actual site content over metadata. Relying on actual content (such as the web page, images, and the text of the defacement message) was key to drawing any meaningful conclusions.
Assigning attributes to the featureless dataset: We assigned attributes to the dataset (timestamp of the defacement event, category of the deface site, etc.). From the raw content (e.g., rendered HTML, images, or other media files), we extracted a set of characteristics that we could translate into useful features. These features capture visual characteristics such as images or dominant colors, geographical characteristics drawn from the languages used, and domain-related characteristics such as the ratio of links pointing to cross-origin domains, among others.
Clustering based on assigned features: Next, we grouped similar deface pages into clusters based on the features we extracted (see Figure 2). For this purpose we used ML-based data clustering. Similar pages will have similar features and thus will end up being clustered together. Clustering allowed us to organize web incidents into campaigns. Since our dataset contained millions of records, each represented by tens of features, we chose an algorithm that addresses any constraints in available memory and time.
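The clustering step can be illustrated with a minimal sketch. It is not the algorithm used in DefPloreX-NG, which this post does not name; it is a simple single-pass "leader" clustering, chosen here only because it reads each record once and keeps one centroid per cluster in memory. The function names, the three-value feature vectors, and the distance threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: list                      # running mean of member vectors
    members: list = field(default_factory=list)  # indices of member pages


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def leader_cluster(vectors, threshold):
    """Single pass: join the nearest cluster within `threshold`, else open a new one."""
    clusters = []
    for idx, vec in enumerate(vectors):
        best = min(clusters, key=lambda c: distance(c.centroid, vec), default=None)
        if best is not None and distance(best.centroid, vec) <= threshold:
            n = len(best.members)
            # Update the running mean without storing the member vectors.
            best.centroid = [(c * n + v) / (n + 1) for c, v in zip(best.centroid, vec)]
            best.members.append(idx)
        else:
            clusters.append(Cluster(centroid=list(vec), members=[idx]))
    return clusters


# Hypothetical feature vectors: [share of red pixels, cross-origin link ratio,
# one-hot flag for a detected language]. Real pages would have tens of features.
pages = [
    [0.90, 0.10, 1.0],
    [0.88, 0.12, 1.0],
    [0.10, 0.80, 0.0],
    [0.12, 0.79, 0.0],
]
clusters = leader_cluster(pages, threshold=0.2)
member_groups = [c.members for c in clusters]
print(member_groups)
```

With these sample vectors, the first two pages and the last two pages end up in separate clusters, which is the behavior the text describes: similar pages have similar features and are grouped together.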

Note: Feature engineering is central to any clustering problem. We identified features that could be extracted to represent a deface page.
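As one example of such a feature, the ratio of links pointing to cross-origin domains could be computed roughly as follows. This is a sketch using only the Python standard library; the class and function names, the sample HTML, and the URLs are hypothetical, and it compares hostnames only rather than full origins (scheme, host, and port).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def cross_origin_ratio(html, page_url):
    """Fraction of links whose host differs from the host of the page itself."""
    collector = LinkCollector()
    collector.feed(html)
    if not collector.links:
        return 0.0
    page_host = urlparse(page_url).netloc
    external = 0
    for link in collector.links:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(page_url, link)).netloc
        if host != page_host:
            external += 1
    return external / len(collector.links)


sample_html = """
<html><body bgcolor="#000000">
<img src="http://images.example.org/banner.png">
<a href="/index.html">home</a>
<a href="http://other.example.net/team">team</a>
<a href="contact.html">contact</a>
</body></html>
"""
ratio = cross_origin_ratio(sample_html, "http://victim.example.com/")
print(ratio)
```

Two of the four links in the sample page point to other hosts, so the feature value is 0.5. A value like this would be one entry in the tuple of features that represents a deface page.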

Figure 2. Sample deface page and features extracted for clustering

Clustering versus classification: We used unsupervised machine learning, i.e., data clustering. The lack of ground truth is the reason why we opted for data clustering as the core of our analysis system, with each deface page serving as an object represented as a tuple of numerical and categorical features.
Labels: After clustering, we labeled the clusters and visualized the campaigns in terms of various dimensions (e.g., length of time, actors, targets, topics, etc.). To provide analysts with an explainable and human-readable view of the clustered deface pages, we represented each cluster as a concise report that includes the time span (oldest and newest deface page) and a list of patterns that create a meaningful label of that cluster. These representations, enabled by the toolkit, allowed us not only to recognize defacement in a monitored benign page but also to tell various defacement campaigns apart.
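A cluster report of the kind described above might be assembled as in the following sketch. The record layout (a date plus a list of tokens per page), the `min_share` cutoff, and the sample values are assumptions for illustration, not the toolkit's actual data model.

```python
from collections import Counter
from datetime import date


def summarize_cluster(pages, min_share=0.6):
    """Build a short report: time span plus tokens shared by most pages."""
    dates = [p["date"] for p in pages]
    counts = Counter()
    for p in pages:
        # Count each token once per page so long pages do not dominate.
        counts.update(set(p["tokens"]))
    patterns = sorted(t for t, n in counts.items() if n / len(pages) >= min_share)
    return {
        "first_seen": min(dates),   # oldest deface page in the cluster
        "last_seen": max(dates),    # newest deface page in the cluster
        "size": len(pages),
        "label": patterns,          # recurring patterns used as a readable label
    }


# Hypothetical cluster of three deface pages.
cluster = [
    {"date": date(2015, 1, 9), "tokens": ["hacked", "by", "teamx", "#opexample"]},
    {"date": date(2015, 1, 11), "tokens": ["hacked", "by", "teamx", "greetz"]},
    {"date": date(2015, 1, 20), "tokens": ["owned", "by", "teamx", "#opexample"]},
]
report = summarize_cluster(cluster)
print(report)
```

Tokens that appear in at least 60 percent of the pages form the label, so an analyst sees the recurring team name and slogan rather than every word on every page.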

Sample findings based on common attributes

As mentioned, DefPloreX-NG helps analysts draw details from web defacement records, including characteristics of threat actor groups, from the TLDs they target to how they are organized and how they operate. Some of the specific findings we could draw from our analysis included the following:

Topics of messages by defacers evolved over time: To see how the messages left by defacers evolved over time, we used an off-the-shelf ML technique called topic modeling, which is widely used in news classification to determine the subject of a story. The topic modeling algorithm can sort a large amount of data (e.g., deface pages) into a small set of high-level concepts or topics. This showed us how the topics defacers cared about evolved, as reflected in the major terms mentioned in deface messages, which also tied back to real-world events at the time of defacement. For example, in some years, “pope,” “terror,” “country,” “marocain,” and “turk” were among the top terms in deface pages, coinciding with events such as the papal conclave in 2005 or the Turkish general election in 2007. An understanding of the most common topics also allowed us to make some inferences as to the motivations and affiliations of the various threat actors. As revealed by the “marocain” and “terror” keywords, many defacers seem to present themselves as online activists who support religious or sociopolitical ideologies.
Similar targets and cooperation: Campaigns that had similar targets often also overlapped in political agenda and/or motivation, revealing how the actors behind them were likely collaborating.
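To give a feel for how major terms can be tracked over time, the sketch below counts the most frequent terms per year. It is deliberately simpler than topic modeling: it is plain term counting, not a topic model, and it is not the technique used in our analysis. The stopword list, the function name, and the sample records are assumptions made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Small hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "this", "for", "was", "are", "your", "by", "to"}


def top_terms_by_year(records, k=2):
    """Return the k most frequent non-stopword terms for each year."""
    per_year = defaultdict(Counter)
    for year, text in records:
        words = re.findall(r"[a-z]+", text.lower())
        per_year[year].update(
            w for w in words if w not in STOPWORDS and len(w) > 2
        )
    return {
        year: [term for term, _ in counts.most_common(k)]
        for year, counts in sorted(per_year.items())
    }


# Hypothetical (year, deface message) pairs.
records = [
    (2005, "hacked by example crew pope pope conclave"),
    (2005, "pope message to the world"),
    (2007, "turk hackers hacked this site"),
    (2007, "turk turk election"),
]
top_terms = top_terms_by_year(records, k=1)
print(top_terms)
```

A topic model goes further by grouping co-occurring terms into topics, but even this simple count shows how the dominant vocabulary of deface messages can shift from one year to the next.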

Note: The analysis from DefPloreX-NG shows there are nine campaigns, with each campaign having participants that are either teams or defacers.

Figure 3. Overview of campaigns related to the Charlie Hebdo attack

Threat actor overlap: One interesting insight from our analysis was that threat actors behind attacks could be acting as lone wolves, but they also often join forces or cooperate on an ad-hoc basis. Cooperation with other groups can be identified, for example, when two defacement pages use many similar characteristics (such as font size, background color, similar color scheme). Such similarities are a strong attribution indicator, and in turn allow analysts to group defacements together and understand the relationships between groups and actors. We rely on this indicator for our automated approach in detecting and tracking campaigns.
Groups vs. single actors: After manually inspecting thousands of deface pages, we found that modern defacers were not simply lone “script kiddies” but tended to have team affiliations. Nearly half of the attackers (47 percent) behind defacement campaigns were affiliated with at least one group; the rest operated solo. Very often, names of the teams as well as their members appeared in the content of deface pages. Most of the campaigns (70 percent) were conducted as joint operations and were not the work of lone wolf attackers.
Duration vs. intensity: DefPloreX-NG can automatically label a campaign as long-term or aggressive based on its behavior over time. We found a contrast in how long-term and aggressive campaigns are conducted (see Figure 4). Each cell represents the number of attacks conducted by a campaign per year. Long-term campaigns conduct slower and longer attacks, while aggressive campaigns react to geopolitical events (such as terrorist attacks) and prefer massive attacks conducted a few days after the event.
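A rule of thumb for this kind of labeling could look like the sketch below, which works on per-year attack counts like the cells in Figure 4. The thresholds (`min_years`, `peak_share`), the function name, and the campaign data are hypothetical; the post does not describe the exact criteria DefPloreX-NG applies.

```python
def label_campaign(attacks_per_year, min_years=5, peak_share=0.8):
    """Label a campaign from its yearly attack counts.

    "aggressive": one year holds at least `peak_share` of all attacks.
    "long-term":  activity spans at least `min_years` calendar years.
    """
    active = {year: n for year, n in attacks_per_year.items() if n > 0}
    if not active:
        return "unlabeled"
    total = sum(active.values())
    span = max(active) - min(active) + 1
    if max(active.values()) / total >= peak_share:
        return "aggressive"
    if span >= min_years:
        return "long-term"
    return "unlabeled"


# Hypothetical campaigns: year -> number of defacements.
campaigns = {
    "campaign_a": {2008: 40, 2009: 35, 2010: 50, 2011: 45, 2012: 38, 2013: 42},
    "campaign_b": {2014: 3, 2015: 950, 2016: 12},
}
labels = {name: label_campaign(counts) for name, counts in campaigns.items()}
print(labels)
```

Here the campaign with steady activity over six years is labeled long-term, while the one with nearly all of its attacks in a single year is labeled aggressive. Finer granularity (days rather than years) would be needed to capture bursts in the days following an event.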

Figure 4. Long-term campaigns (top) and most intense and aggressive campaigns (bottom)

Defacers leave traces behind, and we have shown examples of how we used these traces and machine learning to automate the analysis of millions of cases by grouping individual defacement incidents into categories of similar activity, type, and threat actor responsible. We took a data-driven approach and employed machine learning capabilities to turn unstructured data into meaningful high-level descriptions. Without an automated system, going through 13 million records would have been extremely time- and resource-intensive given the large amount of processing power such a task would have required. We used machine learning in a security tool not only for detection but also to build intelligence from a host of details that can be used for research and other analyses.

To read more about DefPloreX-NG, how it can be used to analyze defacement campaigns, and how ML techniques aided our analysis, see our recent research paper.

^[1] The information volunteered by the actors behind defacements is not always reliable, since it may contain misleading details planted on purpose.
