Lookup stage performance versus Merge stage performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Nagin
Charter Member
Posts: 89
Joined: Thu Jan 26, 2006 12:37 pm

Post by Nagin »

Hi,
I have a job that parses an XML file and looks up against a dataset (a table dump). If the key already exists, the job returns it; if not, it generates a new key and writes it to a dataset.
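
To make the key logic concrete, here is a minimal Python sketch of what the job does, not the actual DataStage design. The function name `assign_keys`, the dict-shaped reference data, and the rule "new key = highest existing key + 1" are all assumptions made for illustration.

```python
def assign_keys(incoming, reference):
    """Return (natural_key, surrogate_key) pairs for the incoming rows.

    `reference` maps natural key -> existing surrogate key (the "table dump").
    Keys not found there get a newly generated surrogate key.
    Assumption: new keys simply continue from the highest existing key.
    """
    known = dict(reference)  # copy so the caller's data is not modified
    next_key = max(known.values(), default=0) + 1
    result = []
    for natural_key in incoming:
        if natural_key not in known:
            # Key is not in the reference data: generate a new one.
            known[natural_key] = next_key
            next_key += 1
        result.append((natural_key, known[natural_key]))
    return result


pairs = assign_keys(["a", "c", "b", "c"], {"a": 1, "b": 2})
print(pairs)
```

Note that the second "c" reuses the key generated for the first one, which is why newly generated keys have to be visible to later rows.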

My data volumes are very large. Once the lookup dataset grew to nearly 3.5 GB, the job began failing for lack of temp space. So I thought that replacing the Lookup stage with a Merge stage would help with this, and with the performance of the job as well.
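
The reasoning behind the swap can be sketched in Python. This is only an illustration of the two general techniques (an in-memory keyed lookup versus a merge of two sorted inputs), not how the DataStage engine implements either stage; the function names and sample rows are hypothetical.

```python
def hash_lookup(stream, reference_rows):
    """Lookup-style: the whole reference set is loaded up front.

    Input order does not matter, but the resource cost grows with the
    size of the reference data.
    """
    table = dict(reference_rows)
    for key in stream:
        yield key, table.get(key)


def sorted_merge(stream_sorted, reference_sorted):
    """Merge-style: both inputs must already be sorted ascending by key.

    Only one reference row is held at a time, but the sort has to
    happen somewhere before this point. Assumes unique reference keys.
    """
    ref_iter = iter(reference_sorted)
    ref = next(ref_iter, None)
    for key in stream_sorted:
        # Advance the reference side until it catches up with the stream key.
        while ref is not None and ref[0] < key:
            ref = next(ref_iter, None)
        matched = ref is not None and ref[0] == key
        yield key, (ref[1] if matched else None)


reference = [("a", 1), ("b", 2), ("d", 4)]
stream = ["a", "c", "d", "d"]
print(list(hash_lookup(stream, reference)))
print(list(sorted_merge(stream, reference)))
```

Both functions return the same matches for sorted input; the difference is where the cost lands (holding the reference data versus sorting both inputs).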

But I don't see any improvement. In fact, the job with the Lookup stage runs 30 seconds faster. This is with a volume of a little over 1 million rows.

Are there any specific parameters I need to enable for Merge to perform faster?

Thanks for your help.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

What is the slow part of your job, the lookup/merge or parsing the XML file? If you have the new XML assembly that can be added to DataStage 8.5 you should get massive XML processing improvements. If you are parsing it using a sequential file stage you could try multiple readers.
Nagin
Charter Member
Posts: 89
Joined: Thu Jan 26, 2006 12:37 pm

Post by Nagin »

vmcburney wrote:What is the slow part of your job, the lookup/merge or parsing the XML file? If you have the new XML assembly that can be added to DataStage 8.5 you should get massive XML processing improvements. If you are parsing it using a sequential file stage you could try multiple readers.
Throughput after XML parsing and up to the Merge is roughly 7000-plus rows per second, but after the Merge it drops to 1100 rows per second.

Also, we are on 8.1 here. Do you know whether we can get the new XML assembly as a patch to 8.1?
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

As far as I know, the XML assembly patch is available only for 8.5.

Regards
Sreeni
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Relative performance between Lookup and Merge stages is irrelevant, because they perform different tasks.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Along the lines of different tasks: if sort and partition insertion has not been disabled, the engine likely inserted a sort/partition on each input to the Merge, if they weren't already in the job design. You wouldn't get sorts inserted for a Lookup stage, and typically you would see only Entire partitioning on the reference inputs. With the inserted sorts, your data won't flow out of the sort until the full stream has been sorted, and that can affect the rows/sec displayed in the monitor/performance statistics (the displayed value is essentially the average since the job began processing data, not an instantaneous value for the stage itself).
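
A rough Python sketch of why an average-since-start figure looks low downstream of a blocking sort. The numbers are made up for illustration (not measured from any job), and `cumulative_rate` is a hypothetical helper, not anything the monitor exposes.

```python
def cumulative_rate(rows_out_per_second):
    """Return the average rows/sec since the start, after each second.

    rows_out_per_second[i] is the number of rows a link emitted during
    second i. This mimics a monitor that reports total rows divided by
    elapsed time rather than an instantaneous rate.
    """
    total = 0
    rates = []
    for elapsed, rows in enumerate(rows_out_per_second, start=1):
        total += rows
        rates.append(total / elapsed)
    return rates


# Hypothetical link after a sort: nothing comes out for 6 seconds while
# the sort consumes its input, then rows flow at 7000/sec for 2 seconds.
after_sort = [0, 0, 0, 0, 0, 0, 7000, 7000]
rates = cumulative_rate(after_sort)
print(rates[-1])
```

In this made-up case the link really moves 7000 rows/sec once it starts emitting, yet the cumulative figure at the end is 14000 rows over 8 seconds, which is 1750 rows/sec.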

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.