Securing Big Data and Hadoop
Big Data brings with it its own tools and frameworks, which are needed to manage the enormous volumes of data that are generated, analyzed, and correlated.
One of the frameworks that has found success in Big Data is Hadoop, which is managed by the Apache Software Foundation. Hadoop is used by a wide variety of organizations to manage and process large quantities of data across computer clusters using simple programming models.
Trend Micro also uses Hadoop in its own environments, and we saw opportunities to help improve the security model of Hadoop. We’ve worked with other Hadoop developers to improve three key areas of Hadoop:
Developing a Coprocessor API for HBase
HBase is a scalable, distributed database built on top of Hadoop and the Hadoop Distributed File System (HDFS). We worked with other developers to introduce a coprocessor API to HBase, which lets developers plug their own code into the HBase platform.
This allows Hadoop users to customize their installations with features that are not part of the original HBase feature set. While the coprocessor API is not a security feature in itself, it was essential for the second area where we contributed to Hadoop.
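To illustrate the idea behind a coprocessor API, here is a minimal Python sketch under stated assumptions. Real HBase coprocessors are Java classes loaded into the region servers that implement interfaces such as `RegionObserver`; the `Region`, `RegionObserver`, `load_coprocessor`, and `AuditObserver` names below are hypothetical toys that only model the pattern: the server calls registered hooks before and after each operation, so new behavior can be added without changing the server's own code.

```python
from typing import Dict, List


class RegionObserver:
    """Hypothetical hook interface, loosely modeled on HBase's RegionObserver."""

    def pre_put(self, row: str, value: str) -> None:
        """Called before a write; raising an exception aborts the write."""

    def post_put(self, row: str, value: str) -> None:
        """Called after a successful write."""


class Region:
    """Toy stand-in for an HBase region: a dict plus a chain of loaded observers."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.observers: List[RegionObserver] = []

    def load_coprocessor(self, observer: RegionObserver) -> None:
        self.observers.append(observer)

    def put(self, row: str, value: str) -> None:
        for obs in self.observers:
            obs.pre_put(row, value)   # every observer sees the write first
        self.data[row] = value
        for obs in self.observers:
            obs.post_put(row, value)  # then is notified once it has succeeded


class AuditObserver(RegionObserver):
    """Example extension: logs every write without modifying Region itself."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def post_put(self, row: str, value: str) -> None:
        self.log.append(f"put {row}={value}")


region = Region()
audit = AuditObserver()
region.load_coprocessor(audit)
region.put("row1", "hello")
print(audit.log)  # ['put row1=hello']
```

Because hooks run inside the write path, a `pre_put` hook that raises an exception stops the write before it reaches storage. That property is what lets the same mechanism enforce policy rather than merely observe it.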
Using the Coprocessor for Access Control
With this extension point in place, Trend Micro used the coprocessor API to add access control to HBase. Database administrators can now grant users finer-grained permissions, for example on individual tables or column families.
This may sound like a minor addition, but it is significant: it makes multi-tenant use of a Hadoop/HBase cluster much more secure, because each user's data is protected from access by other parties.
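The sketch below models how a coprocessor can enforce access control: a hook checks the caller's permissions before each read or write and rejects the request if no matching grant exists. It is loosely inspired by HBase's `AccessController` coprocessor, but `Region`, `AccessControlObserver`, `grant`, `check`, and the `READ`/`WRITE` action strings are hypothetical simplifications. In real HBase, permissions can also be scoped below the table level, and the caller's identity comes from an authenticated connection rather than a function argument.

```python
from typing import Dict, List, Set, Tuple


class AccessDeniedError(Exception):
    """Raised by a hook to reject an unauthorized operation."""


class AccessControlObserver:
    """Hypothetical ACL hook: allows an operation only if (user, table) holds the action."""

    def __init__(self) -> None:
        # Maps (user, table) -> granted actions, e.g. {"READ", "WRITE"}
        self.grants: Dict[Tuple[str, str], Set[str]] = {}

    def grant(self, user: str, table: str, *actions: str) -> None:
        self.grants.setdefault((user, table), set()).update(actions)

    def check(self, user: str, table: str, action: str) -> None:
        if action not in self.grants.get((user, table), set()):
            raise AccessDeniedError(f"{user} lacks {action} on {table}")


class Region:
    """Toy region for one table; every operation passes through the loaded observers."""

    def __init__(self, table: str) -> None:
        self.table = table
        self.data: Dict[str, str] = {}
        self.observers: List[AccessControlObserver] = []

    def put(self, user: str, row: str, value: str) -> None:
        for obs in self.observers:
            obs.check(user, self.table, "WRITE")  # pre-write hook
        self.data[row] = value

    def get(self, user: str, row: str) -> str:
        for obs in self.observers:
            obs.check(user, self.table, "READ")  # pre-read hook
        return self.data[row]


acl = AccessControlObserver()
acl.grant("alice", "tenant_a", "READ", "WRITE")
region = Region("tenant_a")
region.observers.append(acl)

region.put("alice", "r1", "secret")
print(region.get("alice", "r1"))  # secret
try:
    region.get("bob", "r1")  # bob has no grant on tenant_a
except AccessDeniedError as err:
    print(err)  # bob lacks READ on tenant_a
```

The check runs before the data is touched, so a tenant without a grant never sees another tenant's rows, even though both share the same region server.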
Adding Authentication to ZooKeeper
Another key part of the Hadoop ecosystem is ZooKeeper, which coordinates clusters that may be distributed across different networks and locations. Managing and implementing highly distributed applications is always complicated, and ZooKeeper simplifies this coordination for HBase and other distributed Hadoop components.
Because so many vital components rely on ZooKeeper, a security breach there would have significant consequences. We therefore improved the overall security of Hadoop by adding authentication to ZooKeeper as well.
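ZooKeeper protects each znode with an access control list (ACL), and a client session receives only the permissions attached to identities it has authenticated as. The toy model below sketches that check under simplifying assumptions: `ToyZooKeeper`, `Session`, and `digest_id` are hypothetical, identities use a format in the style of ZooKeeper's `digest` scheme, and the `/hbase/master` path and its contents are illustrative. Secure HBase deployments typically authenticate to ZooKeeper via SASL (for example with Kerberos) rather than digest passwords; the permission check is conceptually the same once the identity is established.

```python
import base64
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Single-letter permissions, matching ZooKeeper's "cdrwa" shorthand
CREATE, DELETE, READ, WRITE, ADMIN = "c", "d", "r", "w", "a"
Identity = Tuple[str, str]  # (scheme, id)


def digest_id(user: str, password: str) -> str:
    """Digest-style identity: user:base64(sha1("user:password"))."""
    raw = hashlib.sha1(f"{user}:{password}".encode()).digest()
    return f"{user}:{base64.b64encode(raw).decode()}"


@dataclass
class ZNode:
    data: bytes
    acl: List[Tuple[Identity, Set[str]]] = field(default_factory=list)


class Session:
    """A client session and the identities it has authenticated as."""

    def __init__(self) -> None:
        self.auth: Set[Identity] = {("world", "anyone")}  # every session matches world:anyone

    def add_digest_auth(self, user: str, password: str) -> None:
        self.auth.add(("digest", digest_id(user, password)))


class ToyZooKeeper:
    """In-memory znode store; only reads are permission-checked in this sketch."""

    def __init__(self) -> None:
        self.nodes: Dict[str, ZNode] = {}

    def create(self, path: str, data: bytes, acl: List[Tuple[Identity, Set[str]]]) -> None:
        self.nodes[path] = ZNode(data, acl)

    def get_data(self, session: Session, path: str) -> bytes:
        node = self.nodes[path]
        for identity, perms in node.acl:
            if identity in session.auth and READ in perms:
                return node.data
        raise PermissionError(f"no READ permission on {path}")


zk = ToyZooKeeper()
hbase_identity = ("digest", digest_id("hbase", "s3cret"))
zk.create("/hbase/master", b"master-host", [(hbase_identity, {READ, WRITE, ADMIN})])

trusted = Session()
trusted.add_digest_auth("hbase", "s3cret")
anonymous = Session()

print(zk.get_data(trusted, "/hbase/master"))  # b'master-host'
try:
    zk.get_data(anonymous, "/hbase/master")
except PermissionError as err:
    print(err)  # no READ permission on /hbase/master
```

Because the znode's ACL names only the `hbase` identity, an unauthenticated session cannot read it even though it can reach the server. Without authentication, ACLs would have no trustworthy identity to refer to.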
What It All Means – Enabling Secure Multi-user, Multi-tenant Usage
All of these additions to Hadoop were made with one goal in mind: to make multi-user, multi-tenant deployments of Hadoop more secure. Scalability is not just about building solutions that can accommodate large amounts of data; it is also about ensuring that, as the amount of data (and the number of users) grows, that data is stored safely and securely.
These technologies are part of the standard Hadoop distribution today, and organizations and developers use them to enhance the security of their Hadoop clusters. Our work not only improves the security of Hadoop in internal deployments; it also enables multi-tenancy, which makes Hadoop easier to offer as a service, driving down costs and encouraging adoption.