False Positives: Why Vendors Should Lower Their Rates and How We Achieved the Best Results
In pursuit of a high cyberthreat detection rate, the some developers of cybersecurity solutions neglect the subject matter of false positives, and unfairly so. Indeed, this is a very inconvenient matter that some developers tend to overlook (or try to solve with questionable methods) until there is a serious incident that could paralyze the work of their customers. Unfortunately, such incidents do happen. Regretfully, only then does the idea dawn on these developers that high-quality protection from cyberthreats involves not only prevention but also a low false-positive rate.
While the minimizing the false positive rate may seem simple enough, it has, as a matter of fact, a multitude of intricacies and snags that demand significant investments, technological maturity, and resources from cybersecurity developers.
The two main reasons of false positives are:
- software, hardware, and human errors that all stem from the developer of the product, and
- the diversity of legitimate (“clean”) software that is being inspected.
The latter reason needs to be clarified.
It’s no secret that programs are written globally by millions of people with a plethora of varied qualifications (from students to experts), using various platforms and adhering to different standards. Every author has his own unique style, which sometimes leads to a situation where the program code resembles a malicious code. This triggers protection technologies, especially those that are based upon low-level binary analysis using different approaches including machine learning.
Without taking into account this peculiarity, and without implementing special technologies to minimize the occurrence of false positives, cybersecurity developers risk ignoring the “first, do not harm” principle. This, in its turn, leads to disastrous consequences (especially for large corporate customers), which can be compared to damage caused by cyberattacks.
For more than twenty years, Kaspersky Lab has been working on processes for development and testing as well as on creating technologies that minimize the rate of false positives. We take pride in having one of the best results in the industry (see tests performed by AV-Comparatives, AV-Test.org or SE Labs) for false alarms, and we are glad to further expand on several specifics of our inner workings. I am sure that this information will allow users and corporate customers to have a more reasonable approach in selecting a cybersecurity solution. Additionally, cybersecurity developers will be able to improve and refine their processes to match the level of the world’s best practices.
We use a triple-tier quality-control system to minimize the rate of false positives, including:
- quality control at the design stage,
- quality control upon the release of a detection method, and
- quality control of released detection methods.
This system is being continuously improved with the help of various preventive measures.
Let us review each tier of the system in greater detail.
Quality control at the design stage
One of our fundamental principles in software development is that each technology, product, or process must contain mechanisms for minimizing the risk of false positives and consequential faults that result from them. Mistakes at the design stage turn out to be the most costly, as correcting them comprehensively may require rewriting an entire algorithm. This is why, with our years of experience, we have produced our own best practices that have allowed to decrease the rate of false positives.
For example, when developing or improving machine learning-based cyberthreat detection technology, we make sure that the technology has been learning from considerable collections of clean files with different formats. Our knowledge base for clean files (a whitelist) assists us with that. The contents of the whitelist have already exceeded 2 billion objects and are constantly collaboratively updated.
During our work, we also make sure that training and test collections of each technology are regularly updated with the most recent versions of clean files. Additionally, our products contain built-in features that minimize false positives for critical system files. Aside from that, at each detection, the product utilizes the Kaspersky Security Network (KSN) to consult the whitelist database and the certificate-reputation service to confirm that the detected file is not a clean one.
However, technologies and products aside, there is also a human factor.
A cybersecurity analyst, a developer of an expert system, or a data analyst might make mistakes at any stage. So, there is room for miscellaneous blocking checks by additional automated systems.
Quality control at the release of a detection method
Before the delivery to users, new methods of cyberthreat detection pass several more test stages.
The greatest protective barrier is the infrastructure system for false positive testing, which works with two collections.
The first collection, which is a critical set, comprises files that are taken from popular operating systems (released for different platforms with different localizations), updates of those systems, office applications, drivers, and our own products. This set of files is routinely supplemented.
The second collection contains a dynamically formed set of files, which includes the most popular files. The size of this collection is chosen by finding a balance between the volume of scanned files (as a consequence, the number of servers), the run time of this scan (hence, the time of delivery of detection methods to users), and the number of potentially affected computers in case of a false positive.
For the time being, the number of files in both collections surpasses 120 million (this is approximately 50 TB of data). Considering the fact that these files are scanned every hour after each release of the database updates, we may infer that the infrastructure checks over 1.2 PB of data for false positives each day.
More than 10 years ago, we were among the first ones in the field of cybersecurity to implement non-signature-based methods of detection that leveraged behavioral analysis, machine learning, and other promising generic technologies. These methods have proven their effectiveness, especially in overcoming sophisticated cyberthreats. However, they require particularly thorough testing for false positives.
For example, behavioral detection allows for the neutralization of a malicious application that has manifested some traits of a malicious behavior during its operation. In order to prevent a false positive for the behavior of clean files, we have created a “farm” of computers, which bring about various user scenarios.
The “farm” offers different combinations of operating systems and popular software. Before releasing each new non-signature-based detection method, we dynamically check it at this “farm” with standard and unique scenarios.
Last but not least, cybersecurity developers should also pay attention to test their web scanners for false positives. A website blocked by mistake can also disrupt the work of a customer, which is not acceptable.
To minimize the number of such incidents, we have developed automated systems to download up-to-date content daily from 10,000 of the most popular websites and scan this content to test for false positives. The most accurate results are achieved by using the most popular versions of common browsers and by using proxies in different geo locations to exclude location-dependent content.
Quality control of released detection methods
Detection methods that have been delivered to users are monitored day and night by the automated systems, which monitor the methods for any behavioral anomalies.
The thing is the dynamics of a detection that triggers a false positive often differs from the dynamics of a detection of a genuinely malicious file. Our genuine automated system monitors these anomalies, and if there is something suspicious, then the system will request an analyst to run an additional check for this detection. If suspicions are very strong, then our automated system turns off the detection method through KSN and immediately informs analysts about it. In addition, there are three teams of cybersecurity analysts on duty in Seattle, Beijing, and Moscow who work shifts around the clock to monitor the situation and quickly resolve emerging incidents. This is Humachine Intelligence in action.
In addition to detecting anomalies, the automated systems monitor performance data, errors in module operation, and potential problems based on diagnostic data received from users over KSN. This allows us to detect potential problems at early stages and eliminate them before their effect becomes noticeable for users.
In case the incident has occurred after all and cannot be closed by disabling an individual detection method, then urgent actions are taken to rectify the situation and allow the problem to be solved quickly and effectively. In this case, we may roll back the databases to a stable release that does not require any additional testing. To be honest, we have not resorted to this method in practice, as there has been no occasion for that thus far. In fact, we’ve only ever used it during our training exercises.
Speaking of training exercises…
Prevention is better than a cure
Not everything can be foreseen, and even if every eventuality were provided for, it would be good to know how certain measures would work in practice. Waiting for a real incident to happen isn’t necessary, as there is always the option of modeling.
Periodically, we conduct internal training exercises to confirm the “combat readiness” of our staff and the effectiveness of our methods for preventing false positives.
The training exercises are focused on full-blown imitation of diverse emergency scenarios in order to see if all of the systems and experts act according to plan. Several divisions of technical and service departments are simultaneously involved in the training exercises. These exercises are scheduled for a weekends and are based on a scrupulously thought-out scenario.
After training, we analyze each division for its performance, improve the documentation and implement changes for the involved systems and processes.
Sometimes during the training process, we discover new risks that had previously gone unnoticed. A more systematic discovery of those risks is achieved through brainstorming potential problems in the areas of technologies, processes and products. After all, technologies, processes, and products are constantly being developed, and any change brings about new risks.
Finally, we work systematically on eradicating root causes for all of the incidents, risks, and problems that were uncovered during our training exercises.
It goes without saying that all of the systems that are responsible for quality control are duplicated and are maintained day and night by the team of experts on duty. A fault in any one system will lead to transitioning over to a duplicate system while the fault itself is immediately addressed.
False positives cannot be avoided completely, but it is possible to lower their rate considerably to minimize their aftermath. This does require substantial investments, technological maturity, and resources from developers of cybersecurity solutions. Yet, these efforts provide a smooth experience for our users and corporate clients. These efforts are imperative and are within the scope of duties of each reliable developer.
Reliability is our creed. Instead of relying on one protection technology, we employ a multi-tier security approach. Protection against false positives is arranged in the same way – it is multi-tiered and duplicated multiple times. There is no other way since we are talking about the high-quality protection of our customers’ infrastructure.
At the same time, we succeed in finding and maintaining the optimal balance between the highest level of protection against cyberthreats and the the lowest level of false positives. This is confirmed by the results of independent tests in 2016: AV-Test.org, a German test laboratory, gave Kaspersky Endpoint Security eight awards at the same time, including Best Protection 2016 and Best Usability 2016.
In conclusion, I would like to note that high quality is not a result that ought to be achieved only once. This is a process that requires constant supervision and improvement, especially where the price of a possible mistake means the disruption of a customer’s business processes.