Tranformer stage proframance is not good.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
suresh_dsx
Participant
Posts: 160
Joined: Tue May 02, 2006 7:49 am

Tranformer stage proframance is not good.

Post by suresh_dsx »

Hi all,

As i got information from the forms ODBC stage not good for loading bulk data.

Data stage job running 10 hrs for 30 million records. i want to increase the performance of the job.
Below i have tried the possibility options from the forms to increase the performance but still talking lot of time.

Job name: Customer_job, The job contains 4 jobs.
Note: it is not a sequencer. it is a parallel job. With in the parallel job i have 4 jobs.

Code: Select all

Job design

Sequential file stage1-->ODBC stage1(Table1)

Sequential file stage2-->ODBC stage2(Table2)

Sequential file stage3-->ODBC stage3(Table3)

Sequential file stage4-->ODBC stage4(Table4)

Code: Select all

Seq file stage Properties:

Number of Readers per node -->2
Delimeter -->comma
Quote     -->double


ODBC Stage properties

Write method --> write
Write mode   -->   Truncate
Insert array size -->2000
partition type -->random
As i got information from the forms odbc stage is not that many features like native database stages.
My Database-->SQL server. we are using only ODBC stage.


I have tried possibilities to increase the performance. But still taking 10 hours to run the job.

1. Added in the source Number of Readers per node -->2
2. Increased Insert array size is 2000 in the ODBC stage.
3. Added partition type is random

4. I have spitted the job into 4 individual jobs and executed the jobs. It is taking same amout of time.
Any help greatly appreciated.


Thanks
Suri.
Last edited by suresh_dsx on Wed Aug 04, 2010 6:28 am, edited 2 times in total.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Fixed your... rather strange quoting of yourself for you. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I would increase the commit frequency, doing this every 2000 rows means you are doing 30000000/2000 = 15,000 commits. If your database has sufficient space, consider upping this number significantly.

Just for comparison, how long does the job run if you replace the ODBC write stages with a PEEK or COPY stage?
suresh_dsx
Participant
Posts: 160
Joined: Tue May 02, 2006 7:49 am

Post by suresh_dsx »

Thanks ArndW for suggestion.

Now the job performance is good.

Job name: Cust_information
Job design

Sequential file stage1-->copy stage-->ODBC stage1(Table1)
Sequential file stage2-->copy stage-->ODBC stage2(Table2)
Sequential file stage3-->copy stage-->ODBC stage3(Table3)
Sequential file stage4-->copy stage-->ODBC stage4(Table4)



I have like this many jobs. One of the jobs are having below design.

Job name: Product_information

Code: Select all

Job design -->The product_information parallel job contains 4 jobs.

Sequential file stage1-->Transformer stage-->copy stage-->ODBC stage1(Table1) 
Sequential file stage2-->copy stage-->ODBC stage2(Table2) 
Sequential file stage3-->copy stage-->ODBC stage3(Table3) 
Sequential file stage4-->copy stage-->ODBC stage4(Table4)

The issue with the transformer stage. Total execution time for the job is 8hrs.Toal source data is 20 million records.
Transformer stage contains the constraint. Many functions are involved in the in the constraint like Trim, NulltoEmpty, up case. I am not sure this is the reason; job is execution is very slow.

Code: Select all

(Trim(NullToEmpty(lnkRes.newvalue)) = '' Or UpCase(Trim(lnkRes.newvalue,' ','A')) = UpCase(Trim(prmnewvalue,' ','A')) Or Trim(prmnewvalue) = '') And 
(Trim(NullToEmpty(lnkRes.oldvalue)) = '' Or UpCase(Trim(lnkRes.oldvalue,' ','A')) = UpCase(Trim(prmoldvalue,' ','A')) Or Trim(prmoldvalue) = '')
Finally, I am unable to tune this job when the transformer stage present.
Any help greatly appricated.
Thanks
Suri
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Move all your TRIM(link.column) to stage varaibles, that way each distinct TRIM() is only executed once and not several times as is now the case. That will reduce the CPU load somewhat.

During execution, turn on the monitor and display the %CPU -> if the value is anywhere consistently below 80% then your job is not being limited by the transform stage.
Post Reply