Using arguments with Pentaho Data Integration (PDI) transformations are extremely handy when trying to pass through dynamic information. In this tutorial I will show how you can implement variables (denoted by ?) within SQL statements using Table input steps. Along with using variables I also cover how to copy and get rows from the result (memory), how to use system information and debugging steps. This tutorial was done with version PDI 3.2.0.
Scenario.
I have two separate data sources, the first data source (ds01) has information about my employees i.e. employee code, office location, created date. My second data source (ds02) has satisfaction survey responses for each employee which was provided by an external company i.e. employee code, question title, response. I would like to retrieve for each employee the question and response from the satisfaction survey but only for employees who were created in the last month.
Flow.
The flow of the Jobs and Transformations which will make up this ETL process are:
- Job : Process employee satisfaction
- Transformation : Get employee details
- Get system data : Get last months date
- Table input : Get employee data
- Copy rows to result
- Transformation : Get employee satisfaction
- Get rows from result
- Table input : Get satisfaction data
- Excel output : Store results
- Transformation : Get employee details
Data sets.
The first data set ds01 (employee data) looks like this:
| emp_code | created_on |
|---|---|
| 100000 | 12/02/2010 |
| 100001 | 05/05/2010 |
The second data set ds02 (satisfaction data) looks like this:
| emp_code | question_title | response |
|---|---|---|
| 100000 | Rate your experience with the company | 8 |
| 100001 | Rate your experience with the company | 5 |
Step 1. Create the parent Job.
The first step is to create the parent Job, Process employee satisfaction. Once PDI has started to create a New Job you can either click on the File > New > Job option or use the CTRL + ALT + N shortcut.

Save the Job (using the File > Save option or CTRL+S shortcut) and name the file Process employee satisfaction.

Step 2. Create the Shared Database connections.
The next step is to create two shared database connections one for ds01 the other for ds02. While the Process employee satisfaction job is open make sure the View tab is visible (in the left hand side pane) and right click on the Database connections item and select the New option.
The Database Connection window will appear. You will then to fill out the available options for your database connection, as an example ds01 uses a MySQL database so my options after they were filled out looked like this:
- Connection Name : Name of your connection i.e. ds01
- Host Name : Host of your database server i.e. localhost
- Database Name : Name of the database which holds your information i.e. employees
- Port Number : Port which your database is listening on i.e. 3306
- Username : User to access the database i.e. root
- Password : Password to access the database
For ds02 my connection options looked like this:
- Connection Name : Name of your connection i.e. ds02
- Host Name : Host of your database server i.e. localhost
- Database Name : Name of the database which holds your information i.e. satisfaction
- Port Number : Port which your database is listening on i.e. 3306
- Username : User to access the database i.e. root
- Password : Password to access the database
After filling out the required fields you should always test your connection by clicking the Test button, you should receive a success message similar to the one below:

Repeat the steps above for the second database connection ds02. When you are done you should have both ds01 and ds02 database connections listed under the Database connections folder:

The last step is to share both the database connections so that they can be used throughout any other jobs and transformations which are created. Right click on the ds01 database connection and click the Share option:

Repeat the above step for ds02, you should now see both database connections in bold:

Now that the database connections have been created and shared save the Job and restart Kettle. In the following steps you will create the transformations.
It is important that you restart Kettle so that the shared database connections can be used within the transformations.
Step 3. Create the Transformations.
1. Get employee details.
The first transformation that you will need to create will require the following steps:
- Get System Data : Get last month date
- Table input : Get employee data
- Copy rows to result
The first step is to create the Transformation, Get employee information. To create a new Transformation you can either click on the File > New > Transformation option or use the CTRL + N shortcut.

Save the transformation and name it Get employee information.

The first step will retrieve last month’s date using the Get System Data step. Drag and drop the Get System Data (under Inputs) step from the left hand side onto the workspace and double click on it to edit the options. After making the changes the Get System Data step now look like this:
After clicking the OK button the workspace now looks like this:

The next step, Get employee data uses a Table input (under Inputs) step which also utilises the previous steps (Get last month date) value as a variable in the WHERE clause. Drag and drop the Table input step from the left hand side pane onto the workspace and create a Hop between the Get last month date and the Table input step.
To create Hops click on the first step, hold down the SHIFT button and drag the arrow to next step.

Double click on the Table input step to change the options, after making the changes the options now look like this:
- Step name : Name of the step i.e. Get employee data
- Connection : The connection which the step is using i.e. ds01
- SQL : SQL statement used to retrieve the information, in this example we are retrieving employee codes who were created in the last month (which has been replaced with a ? to make it dynamic)
- Replace variables in script : Checked, we are using variables from the previous step which I have represented with a ? in the SQL statement
- Insert data from step : Selected the previous step, Get last month date as this contains the variable needed in the SQL statement
You can read more into how to use variables in Table input steps on the Pentaho Wiki.
After clicking the OK button the workspace now looks like this:

You can make sure that the variable replacement is working you can preview the output of the Get employee data step by clicking on the step and then the
preview (icon) or right clicking on the step and selecting the Preview option. This will open up a Transformation debug dialog window:
Click the step which you would like to preview which in this example is Get employee data and click the Quick launch button, you should now be shown a preview of the data set from the Get employee data step:

The last step is a Copy rows from result step, this will copy the output from the previous step Get employee data to the result (memory). Drag and drop the Copy rows from result step (under Job) from the left hand side pane on to the workspace and create a Hop between the Get employee data and the Get rows from result step.
Save the transformation.
2. Get employee satisfaction.
The second transformation you will need to create will require the following steps:
- Get rows from result
- Table input : Get satisfaction data
- Excel output
The first step is to create the Transformation, Get employee information. To create a new Transformation you can either click on the File > New > Transformation option or use the CTRL + N shortcut.

Save the Transformation and name it Get employee satisfaction.

The first step will retrieve the data which is stored in the result (memory) by the Copy rows to result step in the previous transformation using the Get rows from result step. Drag and drop the Get rows from result (under Jobs) step from the left hand side onto the workspace and double click on it to edit the options. After making the changes to the Get rows from result step the options now look like this:
Make sure you enter the exact name of the fields that you copied to the result which in this example was from the Table input step Get employee data, these were named emp_code.
The next step Get satisfaction data will be use Table input (under Scripts) step which will utilise the values of the previous step, Get rows from result in the WHERE clause. Drag and drop the Table input step from the left hand side pane onto the workspace and create a Hop between the Get rows from result and the Table input step:

Double click on the Table input step to change the options, after making the changes the options now look like this:
- Step name : Name of the step i.e. Get satisfaction data
- Connection : The connection which the step is using i.e. ds02
- SQL : SQL statement used to retrieve the information, in this example we are retrieving employee satisfaction results for each employee code (which has been replaced with a ? to make it dynamic) stored in the result (memory) from the Get rows from result step
- Replace variables in script : Checked, we are using variables from the previous step which I have represented with a ? in the SQL statement
- Insert data from step : Selected the previous step, Get rows from result as this contains the variable needed in the SQL statement
- Execute for each row? : Check, in this example we need to run this query for each row stored in the result (memory)
Click OK your workspace will now look like this:

The last step is to create an Excel Output step which will store the results from the Get satisfaction data step. Drag and drop the Excel Output step from the left hand side pane onto the workspace and create a Hop between the Get satisfaction data and Excel Output step.

Double click on the Excel Output step to change the options, after making the changes the options now look like this:
- File tab
- Filename : Name and location of where the Excel file will be stored i.e. /Users/prashantraju/Desktop/result
- Extension : Extension of the file i.e. .xls
- Content tab
- Append : Append the results to the file, in this example the output will run for each row in the result (memory) so you will need to append the results to the file otherwise you will only see the last row from the ETL process
- Header : Include the header in the output, in this example that is emp_code, emp_name, question_title and response
- Fields tab
- The fields which will be outputted into the Excel sheet, in this example it is only emp_code, question_title and response
Click OK to save the step and also save the Transformation.
Step 4. Complete the Job.
The last and final step is to complete the Job which we created in Step 1, the job will require four steps:
- START
- Transformation: Get employee information
- Transformation: Get satisfaction information
- Success
Open the Process employee satisfaction job and drag and drop a START step (under General), two Transformation steps (under General) and a Success step (also under General) onto the workspace and create a Hop between each of them so that your workspace now looks like this:
You will need to make changes to both Transformation steps so that they point to the Get employee information and Get employee satisfaction transformations which were created in Step 3. Double click on the first transformation and change the options so that they now look like this:
Click OK to save the step.
Repeat the above steps but for the Get employee satisfaction step.
Click OK to save the step and also save the Job. Now you are ready to test the Job.
Step 5. Test the Job.
To run the ETL process make sure you have the Process employee satisfaction job open and click on the
(run) icon to run the Job, this will display a Execute a job window after changing the options i.e. log detail etc:
After clicking the Launch button and opening the result.xls file on my Desktop the output shows for each employee code the question and response:
Downloads.
| Filename | Type | Notes |
|---|---|---|
| Process employee satisfaction | KJB | Released 11/06/2010 |
| Get employee information | KTR | Released 11/06/2010 |
| Get employee satisfaction | KTR | Released 11/06/2010 |
| SampleData | SQL | Released 11/06/2010 |
Final Notes.
This example may not be the best way to represent the use of Table inputs and variables but hopefully the detail and the inclusion of storing rows in results, using system variables and debugging steps can help you with your PDI learning curve (not that that there is a big one anyway!).
As always, enjoy!















Posted by fherdi on Jul 24, 2010
Excelente ejemplo, me costo mas de 3 semanas en saberlo y ahora se de tu pagina caray excelente
seguire consultandola y muchas gracias por tu aportación espero poder aprender, saludos.
Posted by Vaidehi on Sep 10, 2010
Great article prashant!
Such a handy tip to restart Kettle after sharing DB connections!
I am trying to dynamically generate the SQL by passing information to the ‘Table Input’ plugin.
E.g:
select * from {$SQLSTATEMENT}
I tried defining SQLSTATEMENT as a variable, argument, parameter but nothing works.
Any suggestions?
Posted by Prashant Raju on Sep 10, 2010
Vaidehi
Have you made sure that you checked the “Replace variables” option in the Table Input step.
Prashant
Posted by Aviral Sharma on Oct 27, 2010
Great Job !!
Very well explained with screenshots.
Posted by Gueye on Apr 30, 2011
Thank you very much.I was my problem!!!
Posted by Lynn Yarbrough on May 10, 2011
Great Job as always