Improving Hadoop Security with Host Intrusion Detection (Part 1)
Over the years, the Hadoop development community has steadily added facilities to Hadoop and HBase that improve operational security. These features include Kerberos user authentication, encrypted data transfer between nodes in a cluster, and HDFS file encryption. Trend Micro has contributed several security features that were incorporated into the public Hadoop ecosystem (see our previous post, Securing Big Data and Hadoop, for details).
Although these security facilities are important, they are primarily focused on protecting Hadoop data. They do not give IT staff visibility into security events inside their Hadoop clusters. That’s where a good host intrusion detection system comes into the picture. We have been working on enhancing big data security by applying OSSEC, our open source host intrusion detection system (HIDS), to add security monitoring to Hadoop and HBase systems. In this post, we’ll go over the capabilities of OSSEC.
OSSEC provides several important security capabilities, including file integrity checking, system log analysis, and alert generation. OSSEC has an agent/server architecture. Agents monitor logs, files, and (on Windows systems) registries, then send the relevant logs back to the server in encrypted form over UDP. Intrusions on agent systems are usually detectable through file changes or logged security events.
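As a sketch of the agent side of this architecture, each agent's ossec.conf simply points at the server's address (the IP below is a placeholder, not a value from our deployment):

```xml
<!-- Agent-side ossec.conf fragment (sketch).
     192.168.1.10 is a placeholder for your OSSEC server's address. -->
<ossec_config>
  <client>
    <server-ip>192.168.1.10</server-ip>
  </client>
</ossec_config>
```

The agent must also be registered with the server (for example, via the manage_agents utility) so that its traffic can be authenticated and encrypted.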
Figure 1. Securing Hadoop with OSSEC
(Image originally from http://vichargrave.com/securing-hadoop-with-ossec/)
On the server, the logs are parsed with decoders and interpreted with rules that generate security alerts. OSSEC comes out of the box with a large number of decoders and rules that support a wide range of systems and events. OSSEC’s coverage can also be expanded by custom log decoders and security alert rules.
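To illustrate what a custom decoder and rule look like, here is a minimal, hypothetical pair for a Hadoop daemon log line. The log format matched here is an assumption for illustration; OSSEC reserves rule IDs 100000–119999 for local rules:

```xml
<!-- Hypothetical decoder (etc/local_decoder.xml): tags log lines
     emitted by Hadoop daemons. The matched string is an assumption. -->
<decoder name="hadoop">
  <prematch>org.apache.hadoop</prematch>
</decoder>

<!-- Hypothetical rule (rules/local_rules.xml): raise a level-7 alert
     when a Hadoop daemon logs a WARN-level message. -->
<group name="local,hadoop,">
  <rule id="100100" level="7">
    <decoded_as>hadoop</decoded_as>
    <match>WARN</match>
    <description>Hadoop daemon logged a warning.</description>
  </rule>
</group>
```

Once the decoder tags a line, any rule that references it via `decoded_as` is evaluated, and matching rules at a sufficient level produce entries in the server's alerts log.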
Hadoop File Integrity Checking
Hadoop and HBase systems rely on numerous configuration files and Java files to work properly. Unauthorized changes to any of these files can adversely affect a cluster. This is particularly true of the HDFS namenodes in a Hadoop system and the HMaster nodes in an HBase system. The former control HDFS operations, while the latter coordinate I/O between the HMaster and region servers.
OSSEC can detect changes to these important Hadoop files. When an OSSEC agent is started, it recursively scans user-specified directories and calculates MD5 and SHA1 hash values for each file it encounters. The file names and hashes are stored in a database on the OSSEC server. The agent repeats this operation at user-specified intervals (usually every few hours). When the server receives a hash value for a given file that differs from the hash previously stored, it generates a security alert. The OSSEC server records each security alert in its own alerts log file.
Normally, the configuration files for Hadoop and HBase systems are located in the /etc directory, while the Java files are located in /usr/bin and /usr/sbin. Out of the box, OSSEC performs file integrity checking on all files in these directories. However, if these files are stored elsewhere, it is a simple matter to cover those directories as well by modifying the agent configuration file, ossec.conf.
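A minimal syscheck fragment for ossec.conf might look like the following. The Hadoop and HBase paths in the second `directories` entry are assumptions and should be adjusted to match your installation layout:

```xml
<!-- Agent-side syscheck fragment (sketch). The /etc/hadoop/conf and
     /etc/hbase/conf paths are assumed; adjust to your layout. -->
<syscheck>
  <!-- Rescan interval in seconds (here, every 6 hours) -->
  <frequency>21600</frequency>
  <directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
  <directories check_all="yes">/etc/hadoop/conf,/etc/hbase/conf</directories>
</syscheck>
```

The `check_all` attribute tells syscheck to record hashes, file size, ownership, and permissions, so any of those attributes changing will trigger an alert on the server.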
In the second part of this entry, we will discuss how these tools can be used to quickly detect and graphically show potential intrusion into Hadoop/HBase systems.