PreReq Hadoop System Hadoop (Tested with 0.19.1) Hadoop (Tested with 0.20.x) HBase (Tested with 0.20.x) Cacti System Cacti (Tested with 0.8.7d) JRE (Pref JDK 1.6+) Installation Hadoop-cacti-jtg gathers data by accessing JMX information. JMX support is fairly easy to enable with hadoop. Hadoop conf/hadoop-metrics.properties Setup all metrics objects to use NullContextWithUpdateThread. The period must match the period of your polling. For example my monitoring station polls at 5 minute intervals. Thus I have set period to 300(seconds). dfs.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread dfs.period=300 jvm.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread jvm.period=300 HBase HBase provides detailed instructions for JMX: http://hadoop.apache.org/hbase/docs/current/metrics.html conf/hadoop-metrics.properties hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread hbase.period=300 jvm.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread jvm.period=300 rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread rpc.period=300 Enable JMX on Each Hadoop Node cd /conf vi jmxremote.access monitorRole readonly controlRole readwrite vi jmxremote.password monitorRole controlRole chmod 600 jmxremote.password chown $HADOOPUSER jxmremote.password vi hadoop.env ... export HADOOP_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false" export HADOOP_JMX_BASE="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.password.file=/opt/hadoop-0.19.1/conf/jmxremote.password" export HADOOP_JMX_BASE="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.access.file=/opt/hadoop-0.19.1/conf/jmxremote.access" # Command specific options appended to HADOOP_OPTS when specified export HADOOP_NAMENODE_OPTS="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.port=10001" export HADOOP_SECONDARYNAMENODE_OPTS="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.port=10002" export HADOOP_DATANODE_OPTS="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.port=10003" export HADOOP_BALANCER_OPTS="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.port=10004" export HADOOP_JOBTRACKER_OPTS="$HADOOP_JMX_BASE -Dcom.sun.management.jmxremote.port=10005" ... Restart Hadoop. If you have done this correctly you should notice your system is now listening on ports 10001-10005. At this point you should be able to use jconsole (found in your JDK bin) to connect to these ports to test. HBase The process above is similar for HBase. Each HBase component has its own variable like HBASE_REGIONSERVER_OPS inside hbase-env.sh. Change them and restart. Again look here: http://hadoop.apache.org/hbase/docs/current/metrics.html $HBASE_HOME/conf/hbase-env.sh Add the lines: JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false" JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.password.file=$HBASE_HOME/conf/jmxremote.passwd" JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.access.file=$HBASE_HOME/conf/jmxremote.access" export HBASE_MASTER_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.port=10101" export HBASE_REGIONSERVER_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.port=10102" $HBASE_HOME/bin/hbase Towards the end of the script, replace the lines: # figure out which class to run if [ "$COMMAND" = "shell" ] ; then CLASS="org.jruby.Main ${HBASE_HOME}/bin/hirb.rb" elif [ "$COMMAND" = "master" ] ; then CLASS='org.apache.hadoop.hbase.master.HMaster' elif [ "$COMMAND" = "regionserver" ] ; then CLASS='org.apache.hadoop.hbase.regionserver.HRegionServer' with the lines: (adding the "HBASE_OPTS=..." lines for "master" and "regionserver" commands) # figure out which class to run if [ "$COMMAND" = "shell" ] ; then CLASS="org.jruby.Main ${HBASE_HOME}/bin/hirb.rb" elif [ "$COMMAND" = "master" ] ; then CLASS='org.apache.hadoop.hbase.master.HMaster' HBASE_OPTS="$HBASE_OPTS $HBASE_MASTER_OPTS" elif [ "$COMMAND" = "regionserver" ] ; then CLASS='org.apache.hadoop.hbase.regionserver.HRegionServer' HBASE_OPTS="$HBASE_OPTS $HBASE_REGIONSERVER_OPTS" Cacti Cacti requires shell scripts (kickoff/wrappers), xml templates with the graph settings, and the compiled hadoop-cacti-jtg.jar which has the Java "Fetch code". Checkout the entire project and build it. This requires JAVAC, SVN and ANT. cd /tmp svn co http://www.jointhegrid.com/svn/hadoop-cacti-jtg/trunk/ hadoop-cacti-jtg cd hadoop-cacti-jtg ant jar cp dist/hadoop-cacti-jtg.jar /scripts/ cp -R scripts/* /scripts/ Testing cd /scripts/0_20 sh FSNamesystemStatus.sh \ service:jmx:rmi:///jndi/rmi://10.0.1.2:10001/jmxrmi \ controlRole \ hadoop:service=NameNode,name=FSNamesystemState Expected Result: CapacityTotal:2113748992 CapacityUsed:40960 CapacityRemaining:812830720 BlocksTotal:1 FilesTotal:9 numDeadDataNodes:0 numLiveDataNodes:1 JobClient Templates The JobClient Templates do NOT currently use JMX. They require a minimal hadoop instalation for operation. Download a Hadoop version matching your version cd /scripts/ wget http://mirror.candidhosting.com/pub/apache/hadoop/core/hadoop-0.19.2/hadoop-0.19.2.tar.gz tar -xzf hadoop-0.19.2.tar.gz ln -s hadoop-0.19.2 hadoop-0.19 ln -s hadoop-0.19/hadoop-0.19.2-core.jar hadoop-0.19/hadoop-0.19-core.jar We do the symlinking so that JobClient.sh is simple. We don't like complicated shell scripts. JobClient Testing Copy your hadoop-site.xml or your mapred-site.xml to /scripts/. (The only variable that really needs to be set is the mapred.job.tracker but taking the entire file is suggested) cd /scripts/0_19/jobclient sh JobClient.sh hadoop-site.xml mapTasks:1 maxMapTasks:2 reduceTasks:2 maxReduceTasks:2 jobsToCompleteSize:1 Importing the Graph Templates into Cacti Using your cacti web interface, go to the 'Import Template' section: Inside trunk/cacti_templates you will find folders 0_19 0_20...Inside these folders you will find multiple XML files hadoop_namenode.xml, hadoop_datanode.xml. Import each file. Yes! We actually support multiple verions, the entire project is branched internally. So one monitoring server can graph multiple clusters running different major versions of Hadoop! Usage Each graph takes three or four parameters. These would be the same values specified in the testing section above. 1 JMXURL 2 user 3 pass 4 JMXObjectName (not always required) In JMX controlRole is required to call JMX operations. The ONLY graph 'Live vs Dead DataNodes' calls the operations numLiveDataNodes() and numDeadDataNodes(). I always use the controlRole. Using JWalker JWalker is a command line JMX walk tool. You can use it like you would use jconsole to dump the properties of JMX. In particular your need this to determing the name of DataNode objects. sh JWalker.sh service:jmx:rmi:///jndi/rmi://10.0.1.2:10001/jmxrmi \ controlRole Support http://www.jointhegrid.com Edward Capriolo edlinuxguru@gmail.com Contribute Please do! Bugs, questions, comments, praise, condemnation Let me know! !!Screen Shots Accepted!! If you want to show off a graph from Hadoop-Cacti-JTG, you can sanitize hostnames (or not) and send it to me along with a comment. I will be sure to add it to the site!