How Machine Learning Techniques Helped Us Find Massive Certificate Abuse by BrowseFox
By employing machine learning algorithms, we were able to discover an enormous certificate signing abuse by BrowseFox, a potentially unwanted application (PUA) detected by Trend Micro as PUA_BROWSEFOX.SMC. BrowseFox is a marketing adware plugin that illicitly injects pop-up ads and discount deals. Although it uses a legitimate software process, threat actors may exploit the adware plugin by corrupting its ads so that they lead victims to malicious sites, where the victims unknowingly download malware. In our analysis, we determined that BrowseFox accounts for a large share of our dataset of 2 million signed files, that is, files that have been verified for validity and integrity.
Discovering the BrowseFox certificate abuse
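We identified the certificate signing abuse while preparing for our demonstration at the Black Hat Asia 2017 conference, where we showed how locality sensitive hashing (LSH) can be used for smart, dynamic whitelisting. Cryptographic hashes such as SHA1 or MD5 are completely unsuited to this task. They are designed so that changing even a single byte of a file produces an entirely different hash, so two builds of the same program look unrelated at the hash level.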
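The sketch below illustrates the avalanche behavior that makes cryptographic hashes a poor fit for whitelisting. It uses Python's standard `hashlib` module on a synthetic byte string, which stands in for a file and is not a real executable. The sketch flips a single bit and counts how many of the 160 SHA1 digest bits change. The helper name `bit_difference` is our own.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Synthetic stand-in for a file's contents (hypothetical data, not a real binary).
original = b"MZ" + bytes(range(256)) * 4
patched = bytearray(original)
patched[100] ^= 0x01  # flip a single bit, as a tiny patch might

d1 = hashlib.sha1(original).digest()
d2 = hashlib.sha1(bytes(patched)).digest()

print(d1.hex())
print(d2.hex())
print(f"{bit_difference(d1, d2)} of {len(d1) * 8} digest bits differ")
```

For a cryptographic hash, roughly half of the digest bits are expected to flip. The two digests therefore give no hint that the inputs are nearly identical.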
While analyzing a set of 2 million signed files using clustering based on Trend Micro Locality Sensitive Hashing (TLSH), we noticed that many of the clusters had a very particular and strange feature: the files in each cluster were signed by many different signers. This also happens with many legitimate pieces of software, but the clusters we associate with BrowseFox had another property. When we constructed a graph linking the clusters to their signers, the clusters formed an approximate bipartite clique, meaning that nearly every cluster in the group was connected to nearly every signer in it. We then identified the files associated with the clique and labelled approximately a quarter of a million of them as BrowseFox candidates. We checked the candidate samples on VirusTotal and found that 5,203 of the files' hashes were on the site. We then determined that all of these files are indeed BrowseFox. On further investigation, we found that these files had been signed by 519 different certificate signers. This appears to be BrowseFox's tactic: creating new signing entities to obtain valid certificates.
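The post does not describe exactly how the clique was extracted from the cluster-signer graph. One simple way to approximate it is greedy peeling: repeatedly remove the cluster or signer that touches the smallest share of the other side, and stop when every survivor is densely connected. The sketch below implements that heuristic. It is our own stand-in rather than the method used in the analysis. The function name `approximate_biclique`, the `min_fraction=0.8` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def approximate_biclique(edges, min_fraction=0.8):
    """Greedily peel a cluster-signer graph down to a densely connected core.

    edges: iterable of (cluster_id, signer_name) pairs, one per signed file.
    A node survives only if it links to at least `min_fraction` of the
    surviving nodes on the other side; the weakest node is removed first.
    """
    cluster_signers = defaultdict(set)
    signer_clusters = defaultdict(set)
    for cluster, signer in edges:
        cluster_signers[cluster].add(signer)
        signer_clusters[signer].add(cluster)

    clusters, signers = set(cluster_signers), set(signer_clusters)
    while clusters and signers:
        # Relative degree: the share of the other side that each node still touches.
        scores = [(len(cluster_signers[c] & signers) / len(signers), "cluster", c)
                  for c in clusters]
        scores += [(len(signer_clusters[s] & clusters) / len(clusters), "signer", s)
                   for s in signers]
        ratio, side, node = min(scores)
        if ratio >= min_fraction:
            break  # every survivor is densely connected: an approximate biclique
        (clusters if side == "cluster" else signers).discard(node)
    return clusters, signers


# Hypothetical data: three clusters that share four signers, plus two
# ordinary products that each carry a single vendor signature.
shared = [f"signer{i}" for i in range(4)]
edges = [(c, s) for c in ("c1", "c2", "c3") for s in shared]
edges += [("c4", "vendorX"), ("c5", "vendorY")]
clusters, signers = approximate_biclique(edges)
print(sorted(clusters), sorted(signers))
# → ['c1', 'c2', 'c3'] ['signer0', 'signer1', 'signer2', 'signer3']
```

Removing only the single weakest node per round is deliberate. Dropping every node below the threshold at once can wipe out the real clique while unrelated clusters are still inflating the signer count.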
As previously observed in Exploring the Long Tail of (Malicious) Software Downloads, signed files are not necessarily benign. In fact, the study established that many malicious software downloads are signed. Our BrowseFox findings further highlight how malicious actors abuse valid code signing certificates to distribute malware.
Out of the 2 million signed executable files in our dataset, we found that a massive 244,000 (about 12 percent) are BrowseFox malware or PUA files. We labelled a file as BrowseFox only if it met two strict conditions: first, it was signed by one of the 519 bad signers, and second, it falls within one of the BrowseFox clusters.
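The labelling rule is a logical AND of the two conditions. The minimal sketch below assumes that each file record already carries its final signer and the TLSH cluster it was assigned to. The `SignedFile` type, its field names, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignedFile:
    sha1: str                # file identifier
    signer: str              # final signer taken from the file's code signature
    cluster: Optional[str]   # TLSH cluster the file fell into, or None


def label_browsefox(files, bad_signers, browsefox_clusters):
    """Keep only files that satisfy both labelling conditions."""
    return [
        f for f in files
        if f.signer in bad_signers            # condition 1: one of the bad signers
        and f.cluster in browsefox_clusters   # condition 2: inside a BrowseFox cluster
    ]


# Hypothetical sample records; names and hashes are placeholders.
files = [
    SignedFile("h1", "BadSigner A", "bf-cluster-1"),   # both conditions hold
    SignedFile("h2", "BadSigner A", None),             # bad signer, but unclustered
    SignedFile("h3", "Legit Vendor", "bf-cluster-1"),  # clustered, but good signer
    SignedFile("h4", "Legit Vendor", "other"),         # neither condition
]
labelled = label_browsefox(files, {"BadSigner A"}, {"bf-cluster-1"})
print([f.sha1 for f in labelled])  # → ['h1']
```

Requiring both conditions trades some recall for precision. A BrowseFox file signed by a signer outside the bad-signer list, or one that lands outside the known clusters, would not be labelled by this rule.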
Figure 1. A screen capture of the signature information of a BrowseFox file as seen on VirusTotal
Figure 1 shows a sample BrowseFox file retrieved from VirusTotal. In the signature information, the root certificate holder is VeriSign, which provides code signing services. The entity or company that created the file, i.e., the final signer, is Sale Planet. The signature information also shows that this BrowseFox PUA was countersigned by legitimate signers.
The BrowseFox evolutionary tree
We have created a BrowseFox evolutionary tree for the 244,000 files to further show the scale of the certificate signing abuse.
Our evolutionary tree takes inspiration from phylogenetic trees in biology. It gives both a description and a visual representation of how closely one file is related to another. Height in the tree is measured in TLSH distance: a group's height represents the distance between its center TLSH hash and the hashes of the similar files grouped around it. Figure 2 shows an example of an evolutionary tree for a piece of software we clustered using TLSH.
Figure 2. Dropbox evolutionary tree
Figure 3. BrowseFox evolutionary tree
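In the evolutionary tree, a group's height is the TLSH distance from the group's center hash to the similar hashes around it. The post does not specify how that center is chosen. The sketch below assumes that the center is the medoid, meaning the member hash with the smallest total distance to the rest of the group. It also assumes that the height is the largest distance from the center to any member. The function `center_and_height` is hypothetical. A simple Hamming distance on equal-length strings stands in for a real TLSH distance.

```python
from typing import Callable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Toy stand-in for a TLSH distance: count mismatched positions."""
    return sum(x != y for x, y in zip(a, b))


def center_and_height(hashes: List[str],
                      distance: Callable[[str, str], int]) -> Tuple[str, int]:
    """Return the group's medoid hash and its largest distance to any member."""
    center = min(hashes, key=lambda h: sum(distance(h, o) for o in hashes))
    height = max(distance(center, h) for h in hashes)
    return center, height


group = ["aaaa", "aaab", "aabb", "abbb"]
print(center_and_height(group, hamming))  # → ('aaab', 2)
```

A medoid is used rather than an average because hashes cannot be meaningfully averaged, but any member can serve as a center.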
The BrowseFox evolutionary tree (shown in Figure 3) can help determine whether a file is possibly BrowseFox by checking whether it falls within any of the groups or clusters in the graph. If a file doesn't fall within the radius of any group, its distance to the nearest groups can still indicate whether it is BrowseFox. To check such a file, its metadata may be further analyzed in a sandbox. It can also be assessed against additional criteria, such as whether it was signed by one of the bad signers.
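The decision described above can be sketched as a triage step. It assumes that each group is summarized by a center hash and a radius. A file inside a radius is treated as a member. A file just outside the nearest group is flagged for further checks, such as sandboxing or a bad-signer lookup. Anything farther away is treated as unrelated. The function `triage`, the `review_margin` parameter, the group names, and the Hamming stand-in for TLSH distance are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Toy stand-in for a TLSH distance."""
    return sum(x != y for x, y in zip(a, b))


def triage(file_hash: str,
           groups: Dict[str, Tuple[str, int]],
           distance: Callable[[str, str], int],
           review_margin: int) -> Tuple[str, str]:
    """Place a file relative to known groups, given each group's (center, radius).

    Returns ("member", group) if the file lies within a group's radius,
    ("review", group) if it lies just outside the nearest group, and
    ("unrelated", group) otherwise.
    """
    # Gap to a group: how far beyond that group's radius the file lies.
    name, gap = min(((g, distance(file_hash, c) - r) for g, (c, r) in groups.items()),
                    key=lambda item: item[1])
    if gap <= 0:
        return "member", name
    if gap <= review_margin:
        return "review", name  # e.g. sandbox it or check for a bad signer
    return "unrelated", name


groups = {"browsefox-1": ("aaab", 2), "dropbox-1": ("zzzz", 1)}
print(triage("aabb", groups, hamming, review_margin=2))  # → ('member', 'browsefox-1')
print(triage("bbbb", groups, hamming, review_margin=2))  # → ('review', 'browsefox-1')
```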
The evolutionary tree can be used on anything that TLSH applies to, such as files created using standard software development approaches. Files that fit this paradigm, and can therefore be visualized with evolutionary trees, include PUAs, coinminers, hacktools, and many advanced persistent threats (APTs).
Under the hood: ML, TLSH technology
Two machine learning styles were integral to our discovery: unsupervised learning and instance-based learning. Unsupervised learning techniques enabled us to process a large number of unlabeled files, find distinctive patterns in them, and cluster them accordingly. Instance-based learning algorithms let us determine which of the unlabeled files are malicious and which are not. They do this by computing distance scores between the unlabeled files and known good and bad files. Instance-based learning is useful for applications such as identifying potentially unwanted programs (PUPs).
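The two learning styles can be illustrated with two small functions. The first is an unsupervised step that links any two unlabeled items within a distance threshold (single-linkage clustering via union-find). The second is an instance-based step that labels an item by its nearest known file (1-nearest-neighbor). This is a minimal sketch and not necessarily the clustering algorithm used in the analysis. The function names, the threshold, and the sample data are hypothetical, and Hamming distance stands in for TLSH distance.

```python
from typing import Callable, Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Toy stand-in for a TLSH distance."""
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(items: List[str], distance: Callable[[str, str], int],
                       threshold: int) -> List[List[str]]:
    """Unsupervised step: link any two items within `threshold` (single linkage)."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def nearest_label(item: str, labeled: List[Tuple[str, str]],
                  distance: Callable[[str, str], int]) -> Tuple[str, int]:
    """Instance-based step: 1-nearest-neighbor against known good/bad hashes."""
    ref, label = min(labeled, key=lambda pair: distance(item, pair[0]))
    return label, distance(item, ref)


unlabeled = ["aaaa", "aaab", "zzzz", "zzzy"]
print(threshold_clusters(unlabeled, hamming, threshold=1))
# → [['aaaa', 'aaab'], ['zzzz', 'zzzy']]
known = [("aaaa", "good"), ("zzzz", "bad")]
print(nearest_label("zzyy", known, hamming))  # → ('bad', 2)
```

The pairwise loop is quadratic, which is fine for a sketch. At the scale of millions of files, an index for nearest-neighbor search would be needed instead.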
These ML types rely on TLSH, which we used to cluster the 2 million signed executable files in which the BrowseFox code signing abuse was detected. TLSH is an instance-based machine learning scheme that Trend Micro developed and open-sourced on GitHub, and it is used to generate hash values for whitelisting purposes. Traditional and cryptographic hashes give no useful signal about how similar two inputs are. Locality sensitive hashing (LSH) techniques such as TLSH, by contrast, are designed so that similar files produce similar hashes, which increases the likelihood that similar files are grouped together.
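To make the contrast with cryptographic hashing concrete, the toy below keeps two ideas from TLSH's published design, as we understand it. The first is hashing short byte sequences from a sliding window into buckets with Pearson hashing. The second is encoding each bucket count as a 2-bit value relative to the quartiles of all counts. This is not TLSH. Real TLSH uses a wider window, hashes several byte triplets per window position, and adds a header (checksum, length, and quartile-ratio fields) with its own distance scoring. All names here are our own. Because a small edit moves only a few bucket counts, a patched file's digest should stay close to the original, while unrelated data lands far away.

```python
import random
from typing import List

N_BUCKETS = 128

# Fixed byte permutation for Pearson hashing, seeded so digests are reproducible.
_PERM = list(range(256))
random.Random(2017).shuffle(_PERM)


def _pearson(*data: int) -> int:
    """Pearson hash of a few bytes: repeated table lookups on h XOR byte."""
    h = 0
    for b in data:
        h = _PERM[h ^ b]
    return h


def toy_lsh_digest(data: bytes) -> List[int]:
    """Count byte trigrams per bucket, then encode each count as a 2-bit quartile code."""
    counts = [0] * N_BUCKETS
    for i in range(2, len(data)):
        counts[_pearson(data[i - 2], data[i - 1], data[i]) % N_BUCKETS] += 1
    ranked = sorted(counts)
    q1, q2, q3 = (ranked[N_BUCKETS * k // 4 - 1] for k in (1, 2, 3))
    return [0 if c <= q1 else 1 if c <= q2 else 2 if c <= q3 else 3 for c in counts]


def toy_lsh_distance(a: List[int], b: List[int]) -> int:
    """Sum of per-bucket code differences; identical inputs score 0."""
    return sum(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
variant = bytearray(original)
variant[2000] ^= 0xFF  # a one-byte "patch"
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

base = toy_lsh_digest(original)
print("one-byte change:", toy_lsh_distance(base, toy_lsh_digest(bytes(variant))))
print("unrelated data: ", toy_lsh_distance(base, toy_lsh_digest(unrelated)))
```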
TLSH's scalability and speed, its relatively higher resistance to attacks, and its better accuracy compared to other fuzzy hashing techniques such as ssdeep make it the most suitable approach for this purpose. In a paper titled Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis, the authors found that TLSH consistently outperformed ssdeep and is “very reliable in recognizing variants of the same software when the code changes.”
Our discovery of the BrowseFox certificate abuse underscores the current flaws in application control systems and the need for a better whitelisting process. TLSH allows good files to be recognized proactively by their hashes, and it can keep up with a software product's regular patches and version releases. Besides reducing false alarms, it supports different environments: it runs on Linux and Windows and is available as a Python extension. As part of a whitelisting system, TLSH can enable a better, smarter whitelisting process.
The 519 different certificate signers were detected during our October 2017 analysis.
The list of the 519 different certificate signers is in this appendix.