A lot of exciting things happened at PentahoWorld 2014; you can read a lot about it on Twitter under #PWorld2014. It is amazing what momentum we have reached, and that over 400 attendees came to this conference. BTW: Did you know that Pentaho also celebrated its 10th anniversary last week?
For me, as a "PDI/Kettle addicted person" 😉, one of the major topics was also the release of Pentaho 5.2 (BA & PDI). You can read about all the great new features here: Upgrade Existing Pentaho Systems. (The press release is not out yet at the time of writing, but it will appear soon here: Pentaho Press Releases.)
Here are the highlights of PDI 5.2:
Pentaho Data Integration 5.2 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data.
New Streamlined Data Refinery Feature
The Streamlined Data Refinery (SDR) is a simplified, ad hoc ETL refinery composed of a series of PDI jobs that take raw data, augment and blend it through the request form, and then publish it to the BA Server for report designers to use in Analyzer.
R Script Executor Step Improvements
The R Script Executor, Weka Forecasting, and Weka Scoring steps form the core of the Data Science Pack and transform PDI into a powerful predictive analytics tool. The R Script Executor step allows you to incorporate R scripts in your transformation so that you can include R-based statistical programming in your data flow. In PDI 5.2 you can now "plug and play" R scripts without extra customization: you can pass incoming field metadata to the output field metadata, use a more intuitive user interface to run scripts by rows or by batches, and test scripts.
New DI Server Administration Features
Porting content from one environment to another and performing general DI Repository maintenance is easier with the introduction of the new Purge Utility. The Purge Utility permanently purges the repository of versions of shared objects, such as database connection information, jobs, and transformations. You can also turn DI Repository versioning and comment capturing capabilities on and off.
Kerberos Security Support for CDH 5.1 and HDP 2.1
If you are already using Kerberos to authenticate access to a Cloudera Distributed Hadoop 5.1 or Hortonworks Data Platform 2.1 cluster, with a little extra configuration, you can also use Kerberos to authenticate Pentaho DI users who need to access those clusters.
New Marketplace Plugins
Pentaho Marketplace continues to grow with many more of your contributions. Pentaho Marketplace is a home for community-developed plugins and a place where you can contribute, learn, benefit from, and connect to others. New contributions include:
- LookupTimeDimensionStep: Looks up and creates an entry on a data warehouse dimension time table and returns the ID.
- Probabilistic Row Distributions: Contains a collection of Row Distribution plugins for PDI that use probabilistic methods for determining the distribution of rows.
- PDI Groovy Console: Adds a Groovy console to the Help menu that has helper methods and classes that interact with the PDI environment.
- Gremlin Script Step: Provides a Gremlin script step for graph pipeline processing.
- Avro Output Plugin: Lets you write Avro files. Avro files are commonly used in Hadoop; they allow for schema evolution and truly separate the write schema from the read schema.
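To give a feel for what "probabilistic row distribution" can mean, here is a minimal Python sketch of one possible method: each row is sent to a target chosen at random according to a set of weights. This is not the plugin's code and not necessarily one of the methods it implements; the function name `distribute_rows`, the weights, and the sample rows are made up for illustration.

```python
import random


def distribute_rows(rows, weights, seed=None):
    """Assign each row to one target, chosen at random in proportion to weights.

    Hypothetical helper for illustration only; it is not part of PDI.
    """
    rng = random.Random(seed)
    targets = list(range(len(weights)))
    buckets = [[] for _ in targets]
    for row in rows:
        # choices() returns a list, so k=1 plus [0] yields a single target index.
        target = rng.choices(targets, weights=weights, k=1)[0]
        buckets[target].append(row)
    return buckets


# Assumed example: three target step copies that should receive
# roughly 70%, 20% and 10% of the rows.
rows = [{"id": i} for i in range(10_000)]
buckets = distribute_rows(rows, weights=[0.7, 0.2, 0.1], seed=42)
shares = [len(bucket) / len(rows) for bucket in buckets]
print(shares)
```

Unlike a round-robin distribution, the split is only approximately proportional to the weights, and it gets closer as the number of rows grows.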
Improved Upgrade Experience
Upgrading PDI is easier because it is no longer a manual process. You can now upgrade from 5.1.x to 5.2 using the same upgrade utility used for patch releases.
More Details and Download
- Check out the Pentaho documentation, New Features in Pentaho Data Integration 5.2, to read about the details.
- Download the Community Edition (PDI 5.2 CE) from SourceForge.
- Download the Enterprise Edition (PDI 5.2 EE): Log into the Pentaho Customer Support Portal or, if you are not yet a customer, request a Free 30-Day Trial version.