DSXchange

Posted: **Fri Mar 13, 2009 10:01 am**

Hi Guys

I did a parallel job which reads data from two tables residing in an Oracle 8 platform. I used the Dynamic RDBMS stage to read the two tables. Table 1 has around 2.5 Million records (15 columns) and table 2 has around 5.3 million records (10 columns). I use a lookup stage(Auto partition) to accomplish the lookup logic. I was not sure if a join would help improve some performance. After lookup I have a transformer stage to do some business transformation, this transformer runs sequentially in a parallel job since I have logic to generate sequential numbers like (1,2,3,4.....) for one of the column. and finally I write the data into a flat file. The job runs around 18 minutes to process 5.1 millions records from the two sources to one file.

Is there anyway I can tweak my job and reduce the runtime to less than 18 mins. I am hoping I find a way to bring it down to 10 mins at the most. Please advice.

Thanks

Posted: **Fri Mar 13, 2009 10:07 am**

Replace the lookup with the join, make sure you sort and hash partition on the joining keys.
Lose the sequential mode in the transformer that generates the running numbers and use vmcburney's post in FAQ to generate running numbers in parallel.

Posted: **Fri Mar 13, 2009 10:10 am**

Couple of options --

1.You can try joining the tables in the same query using better indexing (test on toad first if you are using it).

2. Try using join stage if your reference data is more than source.

3. Not sure if you can use surrogate key stage to generate your sequence number.

4. Use hash partition if you want to use Lookup stage.

Did you actually check what is taking more time in your job ?

Posted: **Fri Mar 13, 2009 11:23 am**

Ok I have used the Join stage (hash partitioned sorted on key fields) and made the transformer to parallel default per vmcburney's logic of generating surrogate key. The run time drops down to 15' from 18'30". The data speed reduces when moving from Join stage to the transformer and then to the file. In the transformer I have null handling logic, couple of concatenations and couple of timestamp conversions, total columns processed in the Transformer are around 70. Will this be the best time I can get for my job.

DSXchange

Performance Tuning needed in my Job

Performance Tuning needed in my Job