Doing big data threat analysis while preserving user privacy.
The anti-virus industry has changed a lot during the past 4-7 years. Like other companies, we used to be very file signature and file scanning oriented back in 2008 or so. As that obviously did not scale, we moved to detecting more useful attack patterns and started focusing on preventing infections in the first place, rather than trying to detect files dropped by already successful attacks.
Doing things the smart way requires good visibility, so we had to start doing big data. Or should we say big information: any fool can collect a massive amount of data; the trick is in converting big data into small, understandable information. However, in order to do data mining we have to collect data from our users, which can be problematic from a privacy point of view. This has led people to ask just what kinds of data we are collecting and how we are processing it.
So, in order to answer these questions, we wrote a white paper detailing the information collected by our Internet Security range of products. The white paper is readable here.
The basic principle of our data collection is to collect only what we need and anonymize it as early as possible. All data that can be sanitized in the client is stripped there, so that we get system data but do not taint our database with anything that looks like a user's personal data. Data that cannot be sanitized at the client, such as the client IP address, is stripped out by the first server that processes the data. We also run regular cleanups of our customer data to remove any user-related data that has escaped the multiple layers of import sanitization.
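To make the layering concrete, here is a minimal Python sketch of the idea. It is not our actual implementation: the field names (username, file_path, client_ip), the path patterns and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern: the user-name component of common home-directory paths.
_USER_PATH = re.compile(
    r"([a-z]:\\(?:users|documents and settings)\\|/home/|/users/)[^\\/]+",
    re.IGNORECASE,
)


def sanitize_on_client(record: dict) -> dict:
    """Layer 1: strip what can be stripped before the record leaves the machine."""
    clean = dict(record)
    clean.pop("username", None)  # never needed for threat analysis
    if "file_path" in clean:
        # Keep the path structure (useful system data), drop the account name.
        clean["file_path"] = _USER_PATH.sub(r"\1<user>", clean["file_path"])
    return clean


def strip_on_first_server(record: dict) -> dict:
    """Layer 2: the client cannot hide its own IP, so the first server drops it."""
    clean = dict(record)
    clean.pop("client_ip", None)
    return clean


def periodic_cleanup(stored: list) -> list:
    """Layer 3: re-apply both filters to stored rows to catch anything that escaped."""
    return [strip_on_first_server(sanitize_on_client(row)) for row in stored]


if __name__ == "__main__":
    raw = {
        "username": "alice",
        "file_path": r"C:\Users\alice\AppData\Local\Temp\dropper.exe",
        "client_ip": "203.0.113.7",
        "detection": "Trojan.Generic",
    }
    print(strip_on_first_server(sanitize_on_client(raw)))
```

The point of the design is that each layer is cheap and independent, so a miss in one layer can still be caught by the next, and the detection data itself passes through untouched.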
We continue our work on knowing everything that is needed to stop attacks, while still making sure that we do not end up knowing anything about our users.
On 30/04/14 At 07:33 AM