Pentaho 5.2 Released at PentahoWorld 2014

A lot of exciting things happened at PentahoWorld 2014; you can read about them on Twitter under #PWorld2014. It is amazing how much momentum we have reached, and more than 400 attendees came to the conference. BTW: Did you know that Pentaho also celebrated its 10th anniversary last week?

PentahoWorld 2014

For me, as a "PDI/Kettle-addicted person" 😉, one of the major topics was the release of Pentaho 5.2 (BA & PDI). You can read about all the great new features over here: Upgrade Existing Pentaho Systems (the press release is not out yet at the time of writing, but it will appear soon over here: Pentaho Press Releases).

PDI 5.2

Here are the highlights of PDI 5.2:

Pentaho Data Integration 5.2 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data.

New Streamlined Data Refinery Feature

The Streamlined Data Refinery (SDR) is a simplified, ad hoc ETL refinery composed of a series of PDI jobs that take raw data, augment and blend it through the request form, and then publish it to the BA Server for report designers to use in Analyzer.

R Script Executor Step Improvements

The R Script Executor, Weka Forecasting, and Weka Scoring steps form the core of the Data Science Pack and transform PDI into a powerful predictive analytics tool. The R Script Executor step allows you to incorporate R scripts in your transformation so that you can include R-based statistical programming in your data flow. In PDI 5.2 you can now "plug and play" R scripts without extra customization. You can pass incoming field metadata to the output field metadata, use a more intuitive user interface to run scripts by rows or by batches, and test scripts.

New DI Server Administration Features

Porting content from one environment to another and performing general DI Repository maintenance are easier with the introduction of the new Purge Utility. The Purge Utility permanently purges the repository of versions of shared objects, such as database connection information, jobs, and transformations. You can also turn DI Repository versioning and comment capturing capabilities on and off.

Kerberos Security Support for CDH 5.1 and HDP 2.1

If you are already using Kerberos to authenticate access to a Cloudera Distributed Hadoop 5.1 or Hortonworks Data Platform 2.1 cluster, with a little extra configuration, you can also use Kerberos to authenticate Pentaho DI users who need to access those clusters.

New Marketplace Plugins

Pentaho Marketplace continues to grow with many more of your contributions. Pentaho Marketplace is a home for community-developed plugins and a place where you can contribute, learn, benefit from, and connect to others. New contributions include:

  • LookupTimeDimensionStep: Looks up and creates an entry on a data warehouse dimension time table and returns the ID.
  • Probabilistic Row Distributions: Contains a collection of Row Distribution plugins for PDI that use probabilistic methods for determining the distribution of rows.
  • PDI Groovy Console: Adds a Groovy console to the Help menu that has helper methods and classes that interact with the PDI environment.
  • Gremlin Script Step: Provides a Gremlin script step for graph pipeline processing.
  • Avro Output Plugin: Allows you to output Avro files. Avro files are commonly used in Hadoop because they allow for schema evolution and truly separate the write schema from the read schema.

Improved Upgrade Experience

Upgrading PDI is easier because it is no longer a manual process. You can now upgrade from 5.1.x to 5.2 using the same upgrade utility used for patch releases.

More Details and Download
