Using Variables (?) with Table Input Steps.

Posted under Data Integration, Open Source, Pentaho, Tutorials

Using arguments with Pentaho Data Integration (PDI) transformations are extremely handy when trying to pass through dynamic information. In this tutorial I will show how you can implement variables (denoted by ?) within SQL statements using Table input steps. Along with using variables I also cover how to copy and get rows from the result (memory), how to use system information and debugging steps. This tutorial was done with version PDI 3.2.0.

Scenario.

I have two separate data sources, the first data source (ds01) has information about my employees i.e. employee code, office location, created date. My second data source (ds02) has satisfaction survey responses for each employee which was provided by an external company i.e. employee code, question title, response. I would like to retrieve for each employee the question and response from the satisfaction survey but only for employees who were created in the last month.

Flow.

The flow of the Jobs and Transformations which will make up this ETL process are:

  • Job : Process employee satisfaction
    • Transformation : Get employee details
      • Get system data : Get last months date
      • Table input : Get employee data
      • Copy rows to result
    • Transformation : Get employee satisfaction
      • Get rows from result
      • Table input : Get satisfaction data
      • Excel output : Store results

Data sets.

The first data set ds01 (employee data) looks like this:

emp_code created_on
100000 12/02/2010
100001 05/05/2010

The second data set ds02 (satisfaction data) looks like this:

emp_code question_title response
100000 Rate your experience with the company 8
100001 Rate your experience with the company 5

Step 1. Create the parent Job.

The first step is to create the parent Job, Process employee satisfaction. Once PDI has started to create a New Job you can either click on the File > New > Job option or use the CTRL + ALT + N shortcut.

Create a new job

Save the Job (using the File > Save option or CTRL+S shortcut) and name the file Process employee satisfaction.

Save the job

Step 2. Create the Shared Database connections.

The next step is to create two shared database connections one for ds01 the other for ds02. While the Process employee satisfaction job is open make sure the View tab is visible (in the left hand side pane) and right click on the Database connections item and select the New option.

Database connections > New

The Database Connection window will appear. You will then to fill out the available options for your database connection, as an example ds01 uses a MySQL database so my options after they were filled out looked like this:

Database Connection window, ds01

  • Connection Name : Name of your connection i.e. ds01
  • Host Name : Host of your database server i.e. localhost
  • Database Name : Name of the database which holds your information i.e. employees
  • Port Number : Port which your database is listening on i.e. 3306
  • Username : User to access the database i.e. root
  • Password : Password to access the database

For ds02 my connection options looked like this:

Database connection window, ds02

  • Connection Name : Name of your connection i.e. ds02
  • Host Name : Host of your database server i.e. localhost
  • Database Name : Name of the database which holds your information i.e. satisfaction
  • Port Number : Port which your database is listening on i.e. 3306
  • Username : User to access the database i.e. root
  • Password : Password to access the database

After filling out the required fields you should always test your connection by clicking the Test button, you should receive a success message similar to the one below:

Connection test successful

Repeat the steps above for the second database connection ds02. When you are done you should have both ds01 and ds02 database connections listed under the Database connections folder:

Database connections

The last step is to share both the database connections so that they can be used throughout any other jobs and transformations which are created. Right click on the ds01 database connection and click the Share option:

Sharing database connections

Repeat the above step for ds02, you should now see both database connections in bold:

Shared database connections

Now that the database connections have been created and shared save the Job and restart Kettle. In the following steps you will create the transformations.

It is important that you restart Kettle so that the shared database connections can be used within the transformations.

Step 3. Create the Transformations.

1. Get employee details.

The first transformation that you will need to create will require the following steps:

  1. Get System Data : Get last month date
  2. Table input : Get employee data
  3. Copy rows to result

The first step is to create the Transformation, Get employee information. To create a new Transformation you can either click on the File > New > Transformation option or use the CTRL + N shortcut.

New Transformation

Save the transformation and name it Get employee information.

Get employee information saved

The first step will retrieve last month’s date using the Get System Data step. Drag and drop the Get System Data (under Inputs) step from the left hand side onto the workspace and double click on it to edit the options. After making the changes the Get System Data step now look like this:

Get System Data options

After clicking the OK button the workspace now looks like this:

Get system data - workspace

The next step, Get employee data uses a Table input (under Inputs) step which also utilises the previous steps (Get last month date) value as a variable in the WHERE clause. Drag and drop the Table input step from the left hand side pane onto the workspace and create a Hop between the Get last month date and the Table input step.

To create Hops click on the first step, hold down the SHIFT button and drag the arrow to next step.

Table input and hop created

Double click on the Table input step to change the options, after making the changes the options now look like this:

Table input step options - Get employee data

  • Step name : Name of the step i.e. Get employee data
  • Connection : The connection which the step is using i.e. ds01
  • SQL : SQL statement used to retrieve the information, in this example we are retrieving employee codes who were created in the last month (which has been replaced with a ? to make it dynamic)
  • Replace variables in script : Checked, we are using variables from the previous step which I have represented with a ? in the SQL statement
  • Insert data from step : Selected the previous step, Get last month date as this contains the variable needed in the SQL statement

You can read more into how to use variables in Table input steps on the Pentaho Wiki.

After clicking the OK button the workspace now looks like this:

Workspace after creating table input step

You can make sure that the variable replacement is working you can preview the output of the Get employee data step by clicking on the step and then the Preview icon preview (icon) or right clicking on the step and selecting the Preview option. This will open up a Transformation debug dialog window:

Debug dialog




Click the step which you would like to preview which in this example is Get employee data and click the Quick launch button, you should now be shown a preview of the data set from the Get employee data step:

Preview

The last step is a Copy rows from result step, this will copy the output from the previous step Get employee data to the result (memory). Drag and drop the Copy rows from result step (under Job) from the left hand side pane on to the workspace and create a Hop between the Get employee data and the Get rows from result step.

Copy rows to result options

Save the transformation.

2. Get employee satisfaction.

The second transformation you will need to create will require the following steps:

  1. Get rows from result
  2. Table input : Get satisfaction data
  3. Excel output

The first step is to create the Transformation, Get employee information. To create a new Transformation you can either click on the File > New > Transformation option or use the CTRL + N shortcut.

New Transformation

Save the Transformation and name it Get employee satisfaction.

Get employee satisfaction

The first step will retrieve the data which is stored in the result (memory) by the Copy rows to result step in the previous transformation using the Get rows from result step. Drag and drop the Get rows from result (under Jobs) step from the left hand side onto the workspace and double click on it to edit the options. After making the changes to the Get rows from result step the options now look like this:

Get rows from result options

Make sure you enter the exact name of the fields that you copied to the result which in this example was from the Table input step Get employee data, these were named emp_code.

The next step Get satisfaction data will be use Table input (under Scripts) step which will utilise the values of the previous step, Get rows from result in the WHERE clause. Drag and drop the Table input step from the left hand side pane onto the workspace and create a Hop between the Get rows from result and the Table input step:

Get rows from result -> Table input

Double click on the Table input step to change the options, after making the changes the options now look like this:

Get satisfaction data

  • Step name : Name of the step i.e. Get satisfaction data
  • Connection : The connection which the step is using i.e. ds02
  • SQL : SQL statement used to retrieve the information, in this example we are retrieving employee satisfaction results for each employee code (which has been replaced with a ? to make it dynamic) stored in the result (memory) from the Get rows from result step
  • Replace variables in script : Checked, we are using variables from the previous step which I have represented with a ? in the SQL statement
  • Insert data from step : Selected the previous step, Get rows from result as this contains the variable needed in the SQL statement
  • Execute for each row? : Check, in this example we need to run this query for each row stored in the result (memory)

Click OK your workspace will now look like this:

Get satisfaction data

The last step is to create an Excel Output step which will store the results from the Get satisfaction data step. Drag and drop the Excel Output step from the left hand side pane onto the workspace and create a Hop between the Get satisfaction data and Excel Output step.

Excel output workspace

Double click on the Excel Output step to change the options, after making the changes the options now look like this:

Excel output options

  • File tab
    • Filename : Name and location of where the Excel file will be stored i.e. /Users/prashantraju/Desktop/result
    • Extension : Extension of the file i.e. .xls
  • Content tab
    • Append : Append the results to the file, in this example the output will run for each row in the result (memory) so you will need to append the results to the file otherwise you will only see the last row from the ETL process
    • Header : Include the header in the output, in this example that is emp_code, emp_name, question_title and response
  • Fields tab
    • The fields which will be outputted into the Excel sheet, in this example it is only emp_code, question_title and response

Click OK to save the step and also save the Transformation.

Step 4. Complete the Job.

The last and final step is to complete the Job which we created in Step 1, the job will require four steps:

  1. START
  2. Transformation: Get employee information
  3. Transformation: Get satisfaction information
  4. Success

Open the Process employee satisfaction job and drag and drop a START step (under General), two Transformation steps (under General) and a Success step (also under General) onto the workspace and create a Hop between each of them so that your workspace now looks like this:




Job workflow

You will need to make changes to both Transformation steps so that they point to the Get employee information and Get employee satisfaction transformations which were created in Step 3. Double click on the first transformation and change the options so that they now look like this:

Get employee information - Transformation

Click OK to save the step.

Repeat the above steps but for the Get employee satisfaction step.

Get satisfaction information

Click OK to save the step and also save the Job. Now you are ready to test the Job.

Step 5. Test the Job.

To run the ETL process make sure you have the Process employee satisfaction job open and click on the Run icon (run) icon to run the Job, this will display a Execute a job window after changing the options i.e. log detail etc:

Execute a job

After clicking the Launch button and opening the result.xls file on my Desktop the output shows for each employee code the question and response:

Excel output

Downloads.

Filename Type Notes
Process employee satisfaction KJB Released 11/06/2010
Get employee information KTR Released 11/06/2010
Get employee satisfaction KTR Released 11/06/2010
SampleData SQL Released 11/06/2010

Final Notes.

This example may not be the best way to represent the use of Table inputs and variables but hopefully the detail and the inclusion of storing rows in results, using system variables and debugging steps can help you with your PDI learning curve (not that that there is a big one anyway!).

As always, enjoy!

Posted by Prashant Raju on Jun 12, 2010 — 9 CommentsShare

9 Comments for “Using Variables (?) with Table Input Steps.”

  1. Posted by fherdi on Jul 24, 2010

    Excelente ejemplo, me costo mas de 3 semanas en saberlo y ahora se de tu pagina caray excelente :-) seguire consultandola y muchas gracias por tu aportación espero poder aprender, saludos.

  2. Posted by Vaidehi on Sep 10, 2010

    Great article prashant!
    Such a handy tip to restart Kettle after sharing DB connections!

    I am trying to dynamically generate the SQL by passing information to the ‘Table Input’ plugin.
    E.g:
    select * from {$SQLSTATEMENT}
    I tried defining SQLSTATEMENT as a variable, argument, parameter but nothing works.

    Any suggestions?

  3. Posted by Prashant Raju on Sep 10, 2010

    Vaidehi

    Have you made sure that you checked the “Replace variables” option in the Table Input step.

    Prashant

  4. Posted by Aviral Sharma on Oct 27, 2010

    Great Job !!
    Very well explained with screenshots.

  5. Posted by Gueye on Apr 30, 2011

    Thank you very much.I was my problem!!!

  6. Posted by Lynn Yarbrough on May 10, 2011

    Great Job as always

  7. Posted by pbno on Aug 23, 2011

    Is there any way to use two Database connections in a transformation, without creating a job??

  8. Posted by PP on Sep 17, 2011

    HOW TO PASS MULTIPLE VALUES TO DYNAMIC SQL IN WHERE CLAUSE NOT THE ONLY ONE.. I HAVE SCENARIO WHERE I NEED TO PASS TO VALUES TO IT..CAN YOU PLEASE HELP

  9. Posted by muscle maximizer login on Jan 19, 2013

    I am really enjoying the theme/design of your website.

    Do you ever run into any browser compatibility issues? A small number of my
    blog visitors have complained about my blog not working correctly in Explorer but looks great in Chrome.
    Do you have any tips to help fix this issue?

    Get into Kyle Leon, along with the Somanabolic Muscle Maximizer;
    the product brings together the two problems completely and offers a personalized healthy guidebook for your body
    kind (Somatatype). Therefore that your training will be backed up
    by the right nutrients that your body demands, provided when your
    body needs that. This can speed up results, over a prepare that
    does not get your system type under consideration.

    This may also lessen muscle mass low energy, or perhaps painful throughout recuperation.

    Check out my webpage: muscle maximizer login


Post a Comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>