All
We are looking to do a full outer join of two files in DataStage. I can do this using the Merge stage, but from reading some of the postings in this forum I gather that its performance is not good, and some people say it's ugly!
Can anyone tell me why the performance is not good when we use the Merge stage? If I do this using the Aggregator, will it be fast?
I would also like to know how I can do a full outer join of more than two files in DataStage without using the Merge stage.
I appreciate any help !
Thanks
Full outer join of more than two files in DataStage
The Merge stage is kind of picky. It's hard for me to say why it behaves that way, but I have seen it do so. You might want to try it and see what you get before coming to any conclusions. You never know, it might work well for you. I am not sure how you intend to perform a full outer join using the Aggregator stage. One alternative: if one of your files has no duplicates, you can load that file into a hashed file, use it as a lookup, and include all the columns in the output.
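To make the lookup idea concrete, here is a rough Python sketch, not DataStage code. A dict stands in for the hashed file, and the sample rows and function names are made up for illustration. Note that a plain lookup only keeps the rows coming down the stream; rows that exist only on the lookup side need a second pass if you want a true full outer join.

```python
# Hypothetical sample data: a dict plays the role of a hashed file keyed on the join column.
def lookup_join(stream_rows, reference):
    """Left outer join: every stream row is kept; the reference value is None on a miss."""
    out = []
    for key, value in stream_rows:
        out.append((key, value, reference.get(key)))
    return out


def unmatched_reference_rows(stream_rows, reference):
    """Rows found only on the reference side; a plain lookup never emits these."""
    seen = {key for key, _ in stream_rows}
    return [(key, None, ref_value)
            for key, ref_value in reference.items()
            if key not in seen]


stream = [(1, "a1"), (2, "a2")]
reference = {2: "b2", 3: "b3"}

left_outer = lookup_join(stream, reference)
full_outer = left_outer + unmatched_reference_rows(stream, reference)
```

The second function is the part a hashed file lookup does not give you for free, which is why the lookup approach by itself only gets you a left outer join.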
Kris
Where's the "Any" key?-Homer Simpson
Performance is a relative thing and depends on many different factors. One man's poor is another man's just fine. It's simple enough to set up the Merge stage, so give it a shot and see how it handles your two files on your system.
For multiple files, you could set up a series of Merge stages, I suppose. The first would merge two files, then land the results so that they could be read back in and merged with the next file. Lather, rinse, repeat. I'd probably use named pipes in that case to handle the 'landed' files, or at least give that a try first.
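The "lather, rinse, repeat" logic can be sketched outside DataStage like this. It is only an illustration in Python: the file contents, column names and the full_outer_join helper are invented, keys are assumed unique within each file, and column names are assumed not to clash. Each pairwise result is fed into the next join, which is the role the landed file (or named pipe) plays between Merge stages.

```python
from functools import reduce


def full_outer_join(left, right):
    """Full outer join of two {key: {column: value}} mappings on their keys."""
    left_cols = {c for row in left.values() for c in row}
    right_cols = {c for row in right.values() for c in row}
    merged = {}
    for key in left.keys() | right.keys():
        # Start with every column set to None, then fill in whichever side has the key.
        row = dict.fromkeys(left_cols | right_cols)
        row.update(left.get(key, {}))
        row.update(right.get(key, {}))
        merged[key] = row
    return merged


# Hypothetical contents of three input files, keyed on the join column.
file_a = {1: {"a": "a1"}, 2: {"a": "a2"}}
file_b = {2: {"b": "b2"}, 3: {"b": "b3"}}
file_c = {3: {"c": "c3"}, 4: {"c": "c4"}}

# (file_a JOIN file_b) JOIN file_c: each intermediate result feeds the next join.
result = reduce(full_outer_join, [file_a, file_b, file_c])
```

Every key from any of the three inputs ends up in the result, with None wherever a file had no row for that key.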
The Aggregator can't be used for this.
-craig
"You can never have too many knives" -- Logan Nine Fingers
The Merge stage must read its source files itself; it cannot be fed from another stage in the same job. Therefore, to join more than two files you need more than one job.
To do it without the Merge stage you would need to load the text files into temporary tables (UV tables would do) and use the database to effect the N-way full outer join. Hashed file lookups do not support full outer joins; only left outer joins.
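As a rough illustration of the temporary-table approach, here is a small Python sketch using an in-memory SQLite database as a stand-in for UV tables or whatever database you have to hand. The table names, column names and sample rows are all made up. The query builds the set of all keys with UNION and then left joins each table to it, rather than using FULL OUTER JOIN syntax, since not every engine supports that syntax. It assumes the key is unique within each file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical temporary tables, one per text file, loaded with sample rows.
conn.executescript("""
    CREATE TABLE file_a (k INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE file_b (k INTEGER PRIMARY KEY, b TEXT);
    CREATE TABLE file_c (k INTEGER PRIMARY KEY, c TEXT);
    INSERT INTO file_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO file_b VALUES (2, 'b2'), (3, 'b3');
    INSERT INTO file_c VALUES (3, 'c3'), (4, 'c4');
""")

# N-way full outer join on k: collect every key once, then left join each table.
query = """
    SELECT all_keys.k, file_a.a, file_b.b, file_c.c
    FROM (SELECT k FROM file_a
          UNION SELECT k FROM file_b
          UNION SELECT k FROM file_c) AS all_keys
    LEFT JOIN file_a ON file_a.k = all_keys.k
    LEFT JOIN file_b ON file_b.k = all_keys.k
    LEFT JOIN file_c ON file_c.k = all_keys.k
    ORDER BY all_keys.k
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Adding a fourth file means one more UNION branch and one more LEFT JOIN, so the pattern extends to any number of inputs.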
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.