Need performance improvement - Help plz

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Need performance improvement - Help plz

Post by DSDexter »

Hello,

My job reads from an Oracle DB, does data validation, and then loads the data into an Oracle table. There is also a reject link that catches the data-validation errors and directs them to another Oracle table. I have to process 5 million records.

I am using the Oracle Enterprise stage at the source, with the partition table property. The read is slow; I guess the Transformer is not able to process the incoming data efficiently. I validate almost 10 decimal, 10 integer, 5 date and 1 timestamp columns (using the IsValid function) and I think this is causing the bottleneck. I am not able to understand why the Transformer is working so slowly. The job takes almost 2 hours :shock: to complete. Any thoughts on this?
Thanks
DSDexter
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Replace your output Oracle load stage with either a Sequential File stage writing to /dev/null or a Copy stage with no output link. Re-run your job. How long does it take?
How many partitions does your database table have, and how many nodes are you running your PX job with?
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

ArndW,

Thanks for the reply.

I forgot to mention these statistics. I have already done this: I used a Copy stage, so the job was just the Oracle Enterprise stage feeding a Copy stage.

I read approx. 22000 rows/sec.

Secondly, I used the MOD function on one of the numeric columns to split the read four ways: mod(col,4)=0, and so on. Now the job is four Oracle Enterprise stages feeding four Copy stages, still with the partition table property.
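For clarity, each of the four Oracle stages carries a filter along these lines (NUM_COL here is just a placeholder for the numeric column I actually used):

Code: Select all

-- NUM_COL stands in for the actual numeric column
MOD(NUM_COL, 4) = 0   -- filter on the first Oracle stage
MOD(NUM_COL, 4) = 1   -- second stage
MOD(NUM_COL, 4) = 2   -- third stage
MOD(NUM_COL, 4) = 3   -- fourth stage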

I see an amazing 45K+ rows/sec once I funnel all the records from the Oracle stages. But the moment I use a Transformer, bang, the count drops to less than 1,000 rows/sec. I even tried four Oracle stages connected to four Transformers followed by a Copy stage... some improvement, but still not satisfactory.

These are the strategies I could think of... am I missing something?

The details you asked for:

My job runs on Sun Solaris.
Currently on two nodes.
The tables are not partitioned.
Thanks
DSDexter
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you monitor your system (vmstat is common to all UNIX implementations, so use that), does the %CPU used consistently go over 90% with samples of 10 seconds when using the MOD() function? How many CPUs does your system have? I would think that a single Transformer stage with more levels of parallelism would be more efficient than splitting the work across several Transformer stages.
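Something along these lines will do; the us/sy/id columns on the right are the ones to watch:

Code: Select all

# sample every 10 seconds, 30 samples; watch the us/sy/id columns
vmstat 10 30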
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

ArndW,

As you suspected, the CPU utilization is more than 90% most of the time. I am pasting the vmstat output:

Code: Select all

 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s1 s3   in   sy     cs     us sy id
 1 0 0 49556856 3932552 205 857 543 116 117 0 1 0 18 18 0 5660 7855  2598   25 14 0
 4 0 0 44076344 1154568 24 158 2 236 236 0 0 0 25 25 0 119321 18240  9771   94 6 0
 8 0 0 44076400 1155512 68 321 0 108 108 0 0 0 9  9  0 122518 26977  9617   93 7 0
 6 0 0 44072408 1152008 147 992 0 57 57 0 0 0  4  4  0 126174 20447  10104  93 7 0
 9 0 0 44066176 1146360 73 477 92 118 118 0 0 0 30 30 0 144170 30090 10196  91 9 0
 7 0 0 44068872 1148144 30 138 0 102 102 0 0 0 7  7  0 134965 22387  12581  94 6 0
The CPU utilization is well over 90% most of the time, and I also noticed that the CPU idle time is always zero. :shock:

In the current environment where I am running the jobs, I have two nodes with 4 CPUs.

Do I need to add more CPUs (looking at the CPU utilization and idle time figures)? Please suggest. Also, is there anything else I need to check?

Thanks
DSDexter
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Your choices are limited - reduce your CPU load or get more CPUs. What does vmstat show if you replace just your 4 MOD() functions with simple derivations and re-run the job? I would be surprised if just those 4 operations were consuming all that CPU time.
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

Without the 4 MOD() functions the CPU utilization is still close to 90% for the entire time the job runs. ArndW, when I monitor the statistics for the Transformer, its %CPU usage is very high (168). I think the Transformer is overloaded, so I added another Transformer that carries a few of the data validations. What I am trying to achieve here is to balance the load between the two Transformers in terms of the operations they carry out. I think this should work for me. Any thoughts on this?

Also, please suggest which of the following two approaches will work more efficiently.

Code: Select all

IF
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.NUM_UNITS))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.CS))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_TERM))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_LTV))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_NOTE_RATE))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.POOL_ISS_NOTE_RATE))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.AUX_NOTE_RATE))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.NET_NOTE_RATE))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_LOAN_AMT))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.POOL_ISS_RPB))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.CURR_RPB))) and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.LOAN_AGE)))  and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.REM_TERM)))  and
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.MONTHS_TO_AMORT)))  
THEN
 '0'
ELSE
 '-01722'
Or just a single check:

Code: Select all

IF
 IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.MONTHS_TO_AMORT)))  
THEN
 '0'
ELSE
 '-01722'
but repeated for each of the columns individually?

ArndW, I really appreciate your prompt replies... :)
Thanks
DSDexter
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Going to very high CPU utilization is not always a bad thing - it means you haven't wasted a lot of money on unused processing capacity.
The sample code you've listed is going to chew up a lot of CPU time; it can be optimized quite a bit.

As I thought, the MOD() function is not the culprit. Are you doing the MOD on a string that must be converted each time, or is the column already an integer?

I assume the data type for these columns is VarChar. Can you not put a handler on your input to make the default null value "0"? That would be more efficient than calling NullToZero() for every column in the Transformer stage.

Try converting your logic to "OR" instead of "AND". As written, ALL of the conditions have to be evaluated every time. If your data contains values where the string is not a valid decimal, using "OR" and testing for not(IsValid()) is going to be faster, particularly if you can order it so that the column with the most invalid values gets checked first.
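As a rough, untested sketch of what I mean, using the derivation you posted (ideally with the column that is most often invalid moved to the top of the list):

Code: Select all

IF
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.NUM_UNITS)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.CS)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_TERM)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_LTV)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_NOTE_RATE)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.POOL_ISS_NOTE_RATE)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.AUX_NOTE_RATE)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.NET_NOTE_RATE)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.ORIG_LOAN_AMT)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.POOL_ISS_RPB)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.CURR_RPB)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.LOAN_AGE)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.REM_TERM)))) or
 Not(IsValid('DECIMAL[38,10]',TrimLeadingTrailing(NullToZero(LNK_Read_Input2.MONTHS_TO_AMORT))))
THEN
 '-01722'
ELSE
 '0'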

Can you read these columns as DECIMAL from your source directly? That would save a lot of CPU cycles on conversion testing and also reduce the data volume transferred between stages.
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Thanks ArndW, but did we arrive at any solution for this, assuming we don't have more CPUs to add?