Pentaho Kettle for Big Data » Jens Bleuel about Kettle (PDI)

All of Pentaho’s big data capabilities will be available as open source in the new Pentaho Kettle 4.3 release: Big data capabilities include the ability to input, output, manipulate and report on data using the following Hadoop and NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems and MongoDB.

With regard to Hadoop, Pentaho Kettle makes available job orchestration steps for Hadoop, Amazon Elastic MapReduce, Pentaho MapReduce, HDFS File Operations, and Pig scripts. All major Hadoop distributions are supported including: Amazon Elastic MapReduce, Apache Hadoop, Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Enterprise, EMC Greenplum HD, HortonWorks Data Platform powered by Apache Hadoop, and MapR’s M3 Free and M5 Edition. Pentaho Kettle can execute ETL transforms outside the Hadoop cluster or within the nodes of the cluster, taking advantage of Hadoop’s distributed processing and reliability.

Download, how-to docs, videos and more at http://community.pentaho.com/BigData, see also Matt’s blog at http://www.ibridge.be/?p=207