Introducing ‘Known Distributors’
Providing more context about file provenance and distribution
These days, many security operations center (SOC) teams are overwhelmed by huge volumes of alerts. Triaging these alerts takes too long, and many are never investigated at all. “Alert fatigue” leads analysts to take alerts less seriously than they should, resulting in missed threats and successful breaches.
One of VirusTotal’s main use cases is the automatic enrichment of security telemetry to support alert triage. VirusTotal is not only one of the largest and richest malware datasets in the world; over the years we have also aggregated all sorts of security-relevant data points for files, URLs, domains and IPs, including goodware indicators and provenance details. As a result, many SOCs use VirusTotal to discard false positives automatically, reducing alert fatigue and keeping their teams focused on relevant alerts enriched with superior context.
To make this use case more straightforward, today we are introducing Known Distributors, a new attribute for file objects that identifies which companies and products a given file belongs to. Before Known Distributors, the file object had multiple attributes describing a file’s origin, including:
Each one of these attributes was ingested from a different data source and had a different data format. For VirusTotal users it was difficult to spot the difference between the three since all had the same purpose. Now everything is unified under the same attribute, making its analysis and ingestion easier.
Please note that the old attributes listed above will be deprecated as of January 1st, 2022.
When to make use of this?
As mentioned above, many SOCs use VirusTotal to discard false positives automatically. This is where the Known Distributors property comes in handy.
Let’s say you find the following hash 6d17958c6527346036f35c6d9db2f5c8d820cbfbd043588304c7beddf7ea8641 among a list of IoCs collected from a hypothetical incident.
Querying VirusTotal’s API provides the following information about the sample:
curl --request GET \
  --url 'https://www.virustotal.com/api/v3/files/6d17958c6527346036f35c6d9db2f5c8d820cbfbd043588304c7beddf7ea8641?attributes=known_distributors' \
  --header 'x-apikey: <your API key>'
Note: For more information about these fields, check out our API documentation.
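For teams that enrich alerts from a script rather than from the command line, the same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names (`build_request`, `extract_distributors`, `fetch_distributors`) are hypothetical, and the code assumes the distributor names are returned as a `distributors` list nested under `data.attributes.known_distributors`, so check the API documentation for the authoritative field layout.

```python
import json
import urllib.request
from typing import List

# Same endpoint and query parameter as the request shown above.
VT_FILES_URL = (
    "https://www.virustotal.com/api/v3/files/{file_hash}"
    "?attributes=known_distributors"
)


def build_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for one file hash, authenticated via x-apikey."""
    return urllib.request.Request(
        VT_FILES_URL.format(file_hash=file_hash),
        headers={"x-apikey": api_key},
    )


def extract_distributors(response: dict) -> List[str]:
    """Return the distributor names from a file object, or [] if none.

    Assumption: names live at
    data -> attributes -> known_distributors -> distributors.
    """
    attributes = response.get("data", {}).get("attributes", {})
    known = attributes.get("known_distributors") or {}
    return list(known.get("distributors", []))


def fetch_distributors(file_hash: str, api_key: str) -> List[str]:
    """Query VirusTotal for one hash and return its known distributors."""
    request = build_request(file_hash, api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_distributors(json.load(resp))
```

Calling `fetch_distributors(file_hash, api_key)` with a valid API key performs the network request. `extract_distributors` is kept separate so the parsing can be exercised against a saved response without any network access.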
The resulting JSON makes it clear that Microsoft claims the file as part of Windows Server 2004. In VirusTotal’s web interface, the same information is presented to the user as follows:
When a file is distributed by multiple companies, it is shown as in the image below:
Where does all this data come from?
Known distributors data is ingested from the following data sources:
HashDB – An internal service that extracts files from base OS images and OS updates.
National Software Reference Library (NSRL) – A project supported by the U.S. Department of Homeland Security, federal, state, and local law enforcement, and the National Institute of Standards and Technology (NIST) which is designed to collect software from various sources and incorporate file profiles computed from this software into a Reference Data Set (RDS) of information.
VT Monitor – A VirusTotal service that periodically scans files coming from software publishers to mitigate false positives.
Trusted Source project – A set of partnerships with key software vendors to appropriately tag files that they distribute, including those that are not signed.
We are also actively working on partnerships with renowned software download portals to consolidate author and distribution metadata for the software that they list.
Why is this more relevant than ever?
ING’s global CISO recently pointed out that: “AI/ML can potentially solve issues with scalability in human analysis. Though it is working now […], we do see challenges with AI/ML models that trigger on wrong assumptions. This results in many false positives in the security detection process, which need to be investigated by humans.”
Indeed, at VirusTotal we have also seen how the inclusion of AI/ML in detection engines has led to more false positives and, most importantly, to an increasing lack of context. Many detections these days do not include any malware family or toolkit label, and since they are ML-powered, the analyst is given no additional information beyond a red flag, which in some cases might be misleading. By incorporating the Known Distributors details along with VirusTotal’s wealth of contextual information, security teams can overcome the shortcomings of noisy detection mechanisms that have yet to mature.
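As a rough illustration of how this could feed an automated triage rule, the sketch below combines the Known Distributors attribute with the per-engine verdict counts of a file object. It is only an example under stated assumptions: the function name `triage_priority`, the priority labels and the `max_malicious` threshold are all hypothetical, and the code assumes the attributes dictionary carries `known_distributors.distributors` and a `last_analysis_stats.malicious` count. Real policies should be tuned to each team’s risk tolerance.

```python
from typing import Any, Dict


def triage_priority(attributes: Dict[str, Any], max_malicious: int = 2) -> str:
    """Map a file object's attributes to a hypothetical triage priority.

    "low"    -> a known distributor claims the file and few engines flag it
    "review" -> signals conflict or are absent; a human should take a look
    "high"   -> no known distributor and at least one malicious verdict
    """
    known = attributes.get("known_distributors") or {}
    distributors = known.get("distributors", [])
    stats = attributes.get("last_analysis_stats") or {}
    malicious = stats.get("malicious", 0)

    if distributors and malicious <= max_malicious:
        # Likely a false positive: the file has known provenance and the
        # detections are within the (assumed) noise threshold.
        return "low"
    if distributors:
        # Known provenance but many detections: conflicting signals.
        return "review"
    return "high" if malicious > 0 else "review"
```

For example, a file claimed by a known distributor and flagged by a single engine would be rated "low" under these assumptions, while an unclaimed file with several malicious verdicts would be rated "high".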
We hope that these changes will ease the alert triage use case. Please let us know if you have any questions or suggestions to keep making VirusTotal better for everyone.