Full outer join of more than two files in DataStage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Full outer join of more than two files in DataStage

Post by kaps »

All

We are looking to do full outer join of two files in DataStage. I can do this using Merge Stage. But reading some of the postings in this forum I think the performance is not good and some people say its ugly !
Can anyone tell me why the performace is not good when we use Merge stage ? If I do this using Aggregator will it be fast ?

I also want to know how can I do a full outer join of more than two files in DataStage without using Merge Stage ?

I appreciate any help !

Thanks
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Merge stage is kind of picky. It's really hard for me to say why it behaves in such a way, but I have seen it behave like that. You might want to try it and see what you achieve before you come to any conclusions for yourself. You never know, if might work good for you. I am not sure how you intend to perform a full outer join using Aggregator stage. One way you can achieve is, if any of your files doesn't have duplicates, then you can load of the files in a hashed file and then use as lookup and include all the columns in the output.
Kris

Where's the "Any" key?-Homer Simpson
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Performance is a relative thing and depends on many different factors. One man's poor is another man's just fine. It's simple enough to setup the Merge stage - give it a shot and see how it handles your two files on your system.

For multiple files, you could setup a series of Merge stages, I suppose. The first would merge two files, then land the results that so it could be read back in and merged with the next file. Lather, rinse, repeat. I'd probably use named pipes in that case to handle the 'landed' files or at least give that a try first.

The Aggregator can't be used for this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi,
Only one way of joining more then two files is to use "merge stage". I never heard of aggregator stage used for this scenario because the aggregator is used to aggregate function/totals etc.And it takes only one input stream( you can not use this stage). Well about the performance It depends..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Merge stage must read its source files. Therefore to join more than two files you need more than one job.

To do it without the Merge stage you would need to load the text files into temporary tables (UV tables would do) and use the database to effect the N-way full outer join. Hashed file lookups do not support full outer joins; only left outer joins.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply