Sunday, February 19, 2012

Hadoop Tools Universe

Tools and resources related to the Hadoop ecosystem

Overview Links
http://www.slideshare.net/joshwills/hadoop-and-machine-learning


Configuration Management
Puppet
http://hstack.org/hstack-automated-deployment-using-puppet/

Chef
http://blog.milford.io/2011/03/first-github-post-hadoop-chef-cookbook/


Coordination Service
ZooKeeper
http://zookeeper.apache.org/
http://www.quora.com/Why-is-Apache-ZooKeeper-used-along-with-Hadoop


Storage
Distributed schema-less storage
HDFS
Ceph

Append-only storage and metadata
Avro
RCFile
HCatalog

Mutable key-value storage and metadata
HBase


Integration
Tool access
FUSE
JDBC
ODBC

Data Ingestion
Flume
Sqoop


Data Prep/Feature Engineering
Languages/Environments
Pig Latin
HiveQL

Java/Scala APIs
Crunch (Cloudera)
Scoobi (NICTA)
Cascading (Concurrent)
Jaql (IBM)


Machine Learning
Apache Mahout
http://mahout.apache.org/

SystemML (IBM)

R-based Systems
Segue
RHIPE
RHadoop
Ricardo (IBM)