This was presented at the Pentaho Community Meeting 2014 (#PCM14).
There are many reasons to collect usage statistics, for example:
- It can help improve the product in the most used areas and features (steps, job entries, database types etc.)
- It can help the user determine whether features they use are affected by a planned upgrade (the upgrade notes for each release list the affected steps, job entries etc.)
- When combined with usage statistics from development/test/production, it can also show whether some jobs/transformations are never used
Here is one solution, with a how-to:
Analyze the used steps, job entries and database types
- Download the solution analyze_trans_job
- Within PDI/Kettle, please open the job _analyze_trans_job/transformations_jobs/0_analyze_trans_job.kjb
- Look at the comment within the job; it gives you all the usage information.
- It is also possible to anonymize file names, transformation and step names: please see the option anonymize_names within the parameters.txt file.
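To illustrate the kind of analysis this involves, here is a minimal Python sketch. It is not the solution itself, which is built from PDI jobs and transformations. The sketch walks a directory, parses `.ktr` and `.kjb` files as XML, and counts step types, job entry types and database connection types. The XML element paths are assumptions about the file layout, and the function names `analyze_tree` and `anonymize`, as well as the SHA-256 hashing scheme, are hypothetical; the real `anonymize_names` option may work differently.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def anonymize(name: str) -> str:
    """Replace a name with a short, stable digest (hypothetical scheme)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


def analyze_tree(root_dir, anonymize_names=False):
    """Count step, job entry and database types in .ktr/.kjb files under root_dir."""
    steps, entries, databases = Counter(), Counter(), Counter()
    files = []
    for path in sorted(Path(root_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in (".ktr", ".kjb"):
            continue
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        files.append(anonymize(path.name) if anonymize_names else path.name)
        # Assumed layout: <transformation><step><type>...</type></step>
        steps.update(t.text for t in root.findall("./step/type") if t.text)
        # Assumed layout: <job><entries><entry><type>...</type></entry></entries>
        entries.update(t.text for t in root.findall("./entries/entry/type") if t.text)
        # Assumed layout: <connection><type>...</type></connection> under the root
        databases.update(t.text for t in root.findall("./connection/type") if t.text)
    return {
        "files": files,
        "steps": steps,
        "job_entries": entries,
        "databases": databases,
    }
```

Calling `analyze_tree("/path/to/files")["steps"].most_common(10)` would then list the ten most frequently used step types. Like the solution described here, the sketch only reads files from the file system.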
If you want to contribute to this solution, the jobs/transformations are hosted on GitHub.
Note: This is currently limited to the file system and does not support a repository or a repository export file.
Further information on the user statistics that can be gathered with the Pentaho Operations Mart can be found on the Pentaho Community Wiki.