Pentaho Report Designer and Excel Data Sources.

Posted under Pentaho, Report Designer, Tutorials, Visualisation

Last week I had a regular Excel/Access user approach me with a problem:

I have 40 sale campaigns. I need to put together a quick presentation (as a PDF) which will show 2 key metrics (New and Loss revenue) for each of the 40 sale campaigns. Do you know of a quick way I can produce this presentation?

I thought this would the perfect opportunity to show off Pentaho Report Designer (PRD) as an alternative to Mircosoft Excel, Access or other big BI vendors. Below is a step by step guide on how I accomplished this.

Scenario.

The above statement sums up the users requirements, visually the the report needed to look like this:

Report Structure

After I knew the structure of the report I got the user to export the data into an Excel Worksheet – this was going to be my data source – after this it took me only 5 minutes to create the presentation using PRD.

Data Source.

The end user was able to produce a raw extract of the data which the report needed to be based on. Here is a preview of the data:

Campaign Week_Ending New_09 New_10 Loss_09 Loss_10
1 04/04/2010 12313 23423 34534 3463
  • Campaign: Campaign identifier which duplicates for each week row
  • Week: Week ending date which is for 2010
  • New_09, Loss_09: Revenue for 2009
  • New_10, Loss_10: Revenue for 2010

To use an Excel worksheet as a data source make sure the Data tab is active, right click on the Data Sets icon and select the Table option:

Table Data Source (Data > Data Sets > Table)

After clicking the Table option a Table Datasource Editor window is displayed:

Table Datasource Editor Window

Step 1. Create a Query.

The first step is to create a query which will store the data set. Click on the Add a new query icon (Add Query) icon and then specify a name, in this example I’m naming the query dataset.

Table Datasource Editor New Query

Step 2. Select the Excel Worksheet.

The next step is to select the Excel worksheet which contains the data set. Click on the Table Icon icon located in the bottom left hand corner to open up a file browser window, select the Excel file which contains the data set and click OK. Your data set should now be visible (as my Excel worksheet has headers I have checked the Use First Row as Header option).

Table Datasource Editor Preview Data

At the moment you can only use .XLS files so if you are using Office 2007+ make sure you save the extension as .XLS instead of .XLSX and if your .XLS file contains formulas they will not be recognised by PRD so be sure to paste special as values.

Step 3. Manipulate the Data.

You can add empty rows and columns or remove rows and columns by using these icons:

Modify the Dataset

You are also able to change the column headers and data type by double clicking on the column header:

Table Datasource Editor - Modify Column Data Types and Names

In this example I didn’t need to make any modifications to the data however I did have trouble converting the Week_Ending field into a date format so in the Excel worksheet I converted it into a string using the TEXT() function – if anyone knows how to get this working please lleave a comment.

Once you are happy with the data set click the OK button. The data set should now be visible as fields under the Data Sets > Table > dataset icon under the Data tab:

Data Sets Fields

Building the Report.

The next step was to build the report, from the report structure (above) I knew I had to create one group Campaign, and within that group add 2 charts which represented the New and Loss Revenue for 2009 and 2010.

Step 1. Add the Group.




To get started add the Campaign field as a group by right clicking on the Groups item under the Structure tab and selecting the Edit Group .. option:

Edit Group Option

After clicking the Edit Group .. option an Add Group window will appear, specify a name for the group i.e. Campaign and using the arrows move the field over which defines the group which in this example is Campaign and click OK when done.

Add Group Window

The Campaign field will now be defined under the Attributes tab for the Group item:

Campaign Group Attributes

The next step is to make the Group Header and Details Header are visible on the report canvass. Click on the Groups > Group Header item under the Structure tab and and under the Attributes tab uncheck the hide-on-canvass option:

Group Header Visible

Repeat the above steps for the Groups > Details Body > Details Header item as well:

Details Header Visible

Step 2. Functions.

I will need to create three functions; one which will convert the Week_Ending field into a date and the other two will be display dynamic chart titles.

Converting the Date.

As I mentioned above I had some issues with letting PRD convert the Week_Ending field into a date from the Table Datasource Editor, so to get around this I created a formula which would take the Week_Ending field (as a string) and convert it into a Date.

To do this create a new function by right clicking on the Functions option under the Data tab and click the Add Function… option.

Add Function Menu Option

From the Add Function… window expand the Common folder and select the Open Formula option and click the Add button:

Add Function Open Formula

This will add an Open Formula function under the Functions option (seen under the Data tab) :

New Open Formula Created

To change the name of the function (named by default as FormulaExpression1) click on the function and in the area below change the value that is set for the Function Name, in this example I named the function New_Week_Ending:

New Function Name

To edit the formula of the function click on the Dot Dot Dot icon which is displayed when you click in the Formula value area, this will display a Formula Editor window:

Formula Editor Window

As my date is in the current format YYYY-MM-DD I needed to use the DATE function with a combination of LEFT, MID and RIGHT functions to convert it into a date. I won’t go into too much detail with this section as all these functions are pretty self explanatory and are well documented in the Formula Editor but the final formula looked like this:

=DATE(LEFT([Week_Ending];4);MID([Week_Ending];6;2);RIGHT([Week_Ending];2))

Date Formula inside Formula Editor

Click the OK button to save the New_Week_Ending formula. I have just created a function which will convert the Week_Ending (string) field into a New_Week_Ending (date) field so that the X-Y Chart TimeSeries Collector will now recognise it as a date.

Dynamic Chart Titles.

Each chart needs to have a dynamic title which will output the campaign and the measure which is represented i.e. 1 – New Revenue ($). As I have two charts for each campaign I will need to create two functions: one for the New Revenue chart and one for the Loss Revenue chart.

Following the steps outlined above to create two new functions (with the names sNewChart and sLossChart) using the Open Formula type and specify the formulas to be:

# New Revenue Charts (sNewChart)
=[Campaign] & " - New Revenue ($)"
# Loss Revenue Charts (sLossChart)
=[Campaign] & " - Loss Revenue ($)"

I have now created three functions:

Three New Functions

Step 2. Add the Group Header.

To add the Campaign as the Group Header drag and drop the Campaign field from the Data tab onto the Group Header area and apply some formatting:

Group Header with Campaign

Step 3. Add the Charts.

Each chart needed to look similar to this mockup (I have already created the Chart Title and New_Week_Ending functions):

Chart Structure




Drag and drop a chart object Chart Icon from the panel located on the left hand side onto the Details Header section and double click the object to open up the Edit Chart (properties) window . To create a X-Y Chart which uses a TimeSeries Collector make sure you Edit Chart (properties) window looks like this:

Edit Chart Properties

After setting up all the options the Edit Chart (properties) window now look like this (for the New Revenue measure):

Chart Properties for New Revenue Measure

  • Title
    • chart-title-field: Field or function which is the chart title i.e. sNewChart or sLossChart
  • Primary DataSource
    • Common

      • category-time-option: Value of the x-axis, this must match the amount of series you have in the value-column option i.e. New_Week_Ending (twice)
      • time-period-type: Type of period the above field is i.e. Week and New_Week_Ending is an increment of weeks
      • value-column: Field(s) which contain the y-axis, you can specify more than one as there can be multiple series i.e. New_09 and New_10 or Loss_09 and Loss_10
    • Series
      • series-by-value: Name of the each series specified for the above option i.e. 2009 and 2010
    • Group
      • reset-group: Reset the charts after a specific group(s) i.e. Campaign

There was alot of extra formatting on the charts so be sure to check out the available .PRPT file if you interested in what they look like.

After I created the chart for New Revenue I just copied and pasted the chart object and made changes to the properties to create the Loss Revenue chart:

Loss Revenue Chart Properties

The final report structure looks like this:

Report Structure Complete

There are a lot of options in the charting library of PRD so I suggest you pick up Will Gorman’s book Pentaho Reporting 3.5 for Java Developers which has a couple of chapters which cover the charting library in detail.

Step 4. Add the Page Break.

The last step was to add a page break after each Campaign group. To do this click on the Details Body item under the Structure tab and check the pagebreak-after option:

Page Break

Output.

After previewing the report the final output as a PDF looks like this (you can download the actual PDF file below):

Preview

Downloads.

Filename Type Notes
ExcelDataSource PRPT Released 8/04/2010
ExcelDataSourceSampleData XLS Released 8/04/2010
ExcelDataSourcePreview PDF Released 8/04/2010

Final Notes.

This post is very detailed as it covers some interesting areas but as this was an ad hoc request it took me under 5 minutes to whip together this report and have it in the user’s inbox. Do you have any examples of how you handled ad hoc requests with PRD? If so, leave a comment or send me an email.

One user converted n to go! Enjoy.

Posted by Prashant Raju on Apr 8, 2010 — 17 CommentsShare

17 Comments for “Pentaho Report Designer and Excel Data Sources.”

  1. Posted by Farouk TANGAO on Apr 8, 2010

    Good job, I really appreciate it

  2. Posted by Rinaldo on Apr 8, 2010

    Godd job, thnak you Prashant

  3. Posted by Anthy on Apr 12, 2010

    Cool, top work!

  4. Posted by Diddy on Apr 13, 2010

    Excellent tutorial! Thanks.

  5. Posted by Belouali on Apr 14, 2010

    Good work men keep on

  6. Posted by Rafael Veloso on Apr 14, 2010

    Nice! Thanks for sharing!

  7. Posted by Sue on Apr 20, 2010

    Great Tutorial!! thanks. Do you have any sample as to how to use xaction in PRD 3.5 ?

    Thanks

  8. Posted by Prashant Raju on Apr 20, 2010

    Sue

    No I don’t however the samples that come with Pentaho have some examples of using xactions.

    Prashant

  9. Posted by Sathya on Jul 14, 2010

    Good post!
    however, my requirement is to create a single chart which will take in time period as a parameter i.e day, week, month, year, and output a Time-series chart based on the parameter.
    Is there any way I can dynamically choose the x-axis (time period type) ?
    any help is appreciated!
    Thanks!

  10. Posted by omar on Aug 30, 2010

    hi prashant ,
    i have a question about an displaying an arabic text ,
    when i generate the report in PDF format , i have it but in reverse like this :

    عمر in PDF i have رم ع

  11. Posted by Aleya on Sep 2, 2010

    Hi, I want to ask you for this things, I have a templat for PDF.. but now.. I want to use original templat in Pentaho.. but.. the format of the report is pdf.. how can I convert it to pentaho format like (*.prpt), can you give me the answer?

  12. Posted by Prashant Raju on Sep 2, 2010

    Aleya

    You will have to manually replicate the format in Pentaho Report Designer (PRD).

    There is no way (that I know off) to import the format of a pdf document into PRD.

    Prashant

  13. Posted by Aleya on Oct 12, 2010

    Thanks prashant, i just wondering it will be wonderful if pentaho have a function to identify pdf and we able to edit it by the original report.

  14. Posted by Sophie Ortega on Dec 24, 2010

    Good work men keep on

  15. Posted by sou on Jun 10, 2011

    Hi,

    I need to display arabic text (stored into sql server) in my pdf pentaho report.
    But I get something like that “ØáÈ ÇáÓíÏ ÇáãíáæÏ
    ÈæÌáÇÈ”.

    Could you please help
    Thanks

  16. Posted by MC on Oct 28, 2011

    Hello:

    I am desperately looking for some installation related support with Pentaho! Any guidance, directions will be greatly appreciated.

    Best. Moniza.

    It seems like the installation is successful, but when I was trying to insert the license, it was giving me an error message.

  17. Posted by Avinash Kumar on Jun 4, 2012

    Nice one.. Great help!!


Post a Comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>