Job Performance

sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Job Performance

Post by sshettar »

Hi All,

I have a job that currently uses 11 Join stages to load a dataset, joining 14 tables. The job does need that many Join stages, because each join works on the result of the previous join and the join keys also come from the previous join's output.
The jobs here are parallel jobs, but they currently run on just 1 node; going forward, the plan is to use 4 nodes.
The jobs currently use Auto as the partitioning method on all the Join stages, with no explicit sorting.
Since we are now planning for 4 nodes, we need to sort and partition the data correctly. My concern is that I would have to sort and hash-partition the data at every Join stage because the keys differ from stage to stage, and I think that would hurt performance.
If it does, are there any hints on what I can do to improve performance?
Will I have to sort and partition at each Join stage?

Any help is highly appreciated

Thanks in advance
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Yes, as a matter of course, your data will need to be sorted and partitioned appropriately - that's always true.
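
To illustrate why the partitioning has to follow the join key, here is a minimal sketch in plain Python (not DataStage code). The table names, column names, and the `hash_partition`, `join_partition`, and `parallel_join` helpers are all made up for the example. Each "node" only joins the rows it holds, so both inputs must be hash-partitioned on the same key, and the data must be re-partitioned whenever the next join uses a different key.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, nodes):
    """Send each row to a partition chosen by hashing its key column."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % nodes].append(row)
    return parts


def join_partition(left, right, key):
    """Inner join of two row lists that sit on the same node."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def parallel_join(left, right, key, nodes):
    """Partition both inputs on the join key, then join node by node."""
    lparts = hash_partition(left, key, nodes)
    rparts = hash_partition(right, key, nodes)
    out = []
    for lp, rp in zip(lparts, rparts):
        out.extend(join_partition(lp, rp, key))
    return out


# Hypothetical data: the second join key (region_id) only exists
# after the first join, as in the cascaded joins described above.
orders = [{"order_id": i, "cust_id": i % 3} for i in range(6)]
customers = [{"cust_id": c, "region_id": c % 2} for c in range(3)]
regions = [
    {"region_id": 0, "region": "North"},
    {"region_id": 1, "region": "South"},
]

step1 = parallel_join(orders, customers, "cust_id", nodes=4)
# The key changes here, so step1 is re-partitioned on region_id.
step2 = parallel_join(step1, regions, "region_id", nodes=4)
```

Running the same two joins with `nodes=1` gives the same rows, which is the property that correct key-based partitioning is meant to preserve.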

Consider, if volumes are amenable to it, the use of a lookup instead of a join. Given the number of stages in this process, I'd also be keen to look at splitting the process down and landing datasets at intermediate points instead.

Oh, and... testing, testing, testing :D
Mark Winter
Nothing appeases a troubled mind more than good music
sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Post by sshettar »

Thanks Mark!
I will try to use the Lookup stage (wherever possible).
arnabdey
Participant
Posts: 50
Joined: Wed Jan 10, 2007 5:56 am

Post by arnabdey »

The Join stage itself sorts the incoming data on both links. Using so many lookups in cascade may also degrade performance, even if the volume of data in the reference datasets is low. So I feel the best thing is to break the job up into two, with 5-6 lookups in each, and use datasets as intermediate storage.
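
The trade-off between the two approaches can be sketched in plain Python (again, not DataStage code; `merge_join`, `lookup`, and the sample rows are hypothetical). A sort-merge join has to sort both inputs on the key, while a lookup holds the reference data in memory and streams the main input past it without sorting, which only works well when the reference data is small. The lookup below also assumes the reference keys are unique.

```python
def merge_join(left, right, key):
    """Sort-merge inner join: both inputs are sorted on the key first."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, j = [], 0
    for l in left:
        # Advance past right-hand rows whose key is too small.
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        # Emit every right-hand row that shares the current key.
        k = j
        while k < len(right) and right[k][key] == l[key]:
            out.append({**l, **right[k]})
            k += 1
    return out


def lookup(stream, reference, key):
    """In-memory lookup: no sort, but the whole reference set is held in memory."""
    table = {r[key]: r for r in reference}  # assumes unique reference keys
    return [{**row, **table[row[key]]} for row in stream if row[key] in table]


# Hypothetical sample data; the row with id 9 has no match.
stream = [
    {"id": 3, "v": "c"},
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 9, "v": "z"},
]
reference = [
    {"id": 1, "name": "one"},
    {"id": 2, "name": "two"},
    {"id": 3, "name": "three"},
]

joined = merge_join(stream, reference, "id")
looked_up = lookup(stream, reference, "id")
```

Both calls return the same matched rows; the join returns them in key order, while the lookup keeps the order of the incoming stream.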
Arnab
sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Post by sshettar »

Well, currently the job takes only about 4 minutes to complete. I checked with my lead, and he says the data volume this job deals with is very small and is not expected to grow much.

So I was thinking it would be better for now to just leave the job as it is, with the 11 Join stages and Auto partitioning. When I changed the partitioning to hash on all the Join stages, sorted the data accordingly, and ran on 2 nodes instead of 1, the job took longer than the original version with Auto partitioning and 1 node.
The job is taking about 6 to 7 minutes.

After going through a couple of sites, I learned that if the partitioning is left as Auto on the input links of the Join stages, Auto takes care of both the partitioning and the sorting. Please correct me if I am wrong.

Do you think that keeping the partitioning on Auto and using 2 nodes would solve the problem for me?

Thanks in advance
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

If you used auto partitioning over 2 nodes, you would get the same effect as you did with 2 nodes and hash partitioning, so no, this won't solve your 'problem'. This said, having now realised we are talking about a job that doesn't process a high volume, I'm not sure where the need to tune this process comes from. A job that takes 4 or 5 minutes shouldn't really be a focus of efforts I would surmise, unless it is expected to handle volumes which will grow significantly (something you have already ruled out).
Mark Winter