Performance

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
Lindies
Participant
Posts: 37
Joined: Thu Aug 05, 2004 5:11 am
Contact:

Performance

Post by Lindies »

Hi

Please suggest a way to make the job run faster on EE.This is the sql on a server job.Our database is Sybase IQ.SELECT
MAX(KEY) KEY,
Num,
case when substring(trim(col1),1,locate(trim(col1),' ',1)-1)='0' then ''
else substring(trim(col1),1,locate(trim(col1),' ',1)-1) end||' '||
case when substring(trim(col1),(locate(trim(col1),' ',-1))+1,length(trim(col1)))='0' then ''
else substring(trim(col1),(locate(trim(col1),' ',-1))+1,length(trim(col1))) end
AS col1
FROM table
GROUP BY Num, col1


I created a view to do the following:

key col1
-------------- ----------------
529193430 130076 6385 272
529193431 130076 6355 272
529193432 130076 5355 272
529193433 130076 5345 272
473221279 172808 555555555
481475905 173338 555555555
494442315 174095 555555555

results

key col1
-------------- ----------------
529193433 130076 272
473221279 172808 555555555
494442315 174095 555555555
481475905 173338 555555555
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Define "slow". And have you determined where the bottleneck is?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

How long does it take outside datastage ?

You appear to be trimming leading and trailing zero. Can't you use trim function itself for that ?

Or something like

Code: Select all

SubStr(trim(col1), 1, 2) = '0 ' 
SubStr(trim(col1), -1, -2) = ' 0'
I am not a Sybase expert. But you can try preparing an 'inner query' and rewrite that so you avoid tirm, substr etc multiple times.
Lindies
Participant
Posts: 37
Joined: Thu Aug 05, 2004 5:11 am
Contact:

Post by Lindies »

The existing job that need improvement does selects from a view that is created in Sybase IQ and the job takes 2 hours.The view does the whole removing the middle number in col1 and get the max (num) to remove dumplicate.


Doing this in datastage i first use stage variables to remove the middle number in col1 and the put the data in aggreagor stage to get the max and group by num and col1.It takes 4 to 6 hours.

My task is to increase performance now that we use EE bt the more i try to find new ways the slower if gets..I am even starting to think 2 hours s the best perfomance for rows 529193388.

Thanks for ur replies
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What happens if you double the number of processing nodes?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Lindies
Participant
Posts: 37
Joined: Thu Aug 05, 2004 5:11 am
Contact:

Post by Lindies »

Thanks,

I will try that now and reply when the job is done.
Lindies
Participant
Posts: 37
Joined: Thu Aug 05, 2004 5:11 am
Contact:

Post by Lindies »

Tried using 6 nodes,other developers couldnt log on to Datastage Designer and was told not to increase the nodes ever again. :( I will keep on trying other ways.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, one job running on 6 nodes brought your server to its knees? And 2 hours to process half a billion records is 'poor performance'? Me, I would be happy with the 2 hours, especially considering the amount of hardware you (apparently) have to throw at the problem. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Lindies
Participant
Posts: 37
Joined: Thu Aug 05, 2004 5:11 am
Contact:

Post by Lindies »

Thank you,i needed a second person to say this,i told our developers that this job is actually designed very well and is running fast for these records.Most people who complain about poor perfomance in Dsxchange they are talking less than 80 000 000 with simple tranformation.My case is even worse getting max value and grouping by.


I agree 100%.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Simple economics: supply and demand. Did 6 nodes make an improvement in elapsed time? Maybe you can schedule this job when there are no developers? Anyway in a production environment there should not be any developers, so you would throw as many resources at the job as the system can deliver comfortably.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

or you may try to find the optimal number of nodes by running it multiple times, if the system is running out of resources on 6 nodes you may want to try 4 nodes.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
Lindies
Participant
Posts: 37
Joined: Thu Aug 05, 2004 5:11 am
Contact:

Post by Lindies »

Our default nodes is 4.A senior developer who worked on EE longer than me will be helping me.I still doubt perfomance can be increase.I will post the results as soon we done.

Thanks
Post Reply