Tools and stuff related to the Hadoop ecosystem
Overview Links
http://www.slideshare.net/joshwills/hadoop-and-machine-learning
Configuration Management
Puppet
http://hstack.org/hstack-automated-deployment-using-puppet/
Chef
http://blog.milford.io/2011/03/first-github-post-hadoop-chef-cookbook/
Coordination Service
ZooKeeper
http://zookeeper.apache.org/
http://www.quora.com/Why-is-Apache-ZooKeeper-used-along-with-Hadoop
Storage
Distributed schema-less storage
HDFS
Ceph
Append-only storage and metadata
Avro
RCFile
HCatalog
Mutable key-value storage and metadata
HBase
Integration
Tool access
FUSE
JDBC
ODBC
Data Ingestion
Flume
Sqoop
Data Prep/Feature Engineering
Languages/Environments
Pig Latin
HiveQL
Java/Scala APIs
Crunch (Cloudera)
Scoobi (NICTA)
Cascading (Concurrent)
Jaql (IBM)
Machine Learning
Apache Mahout
http://mahout.apache.org/
SystemML (IBM)
R-based Systems
Segue
RHIPE
RHadoop
Ricardo (IBM)
Sunday, February 19, 2012