Hi Team,
I have two files on a Unix file system that need to be joined. Should I use the Unix join command, or a DataStage Join stage that takes these files as input?
Please assist.
Thanks,
Sachin.
Which one is best: Unix join or DataStage Join?
The join operation in both UNIX and in DataStage is a very simple one which takes two inputs sorted on the join key and does a Group-Change comparison on them.
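To make the idea concrete, here is a minimal Python sketch of a merge join over two inputs that are already sorted on the key. It is only an illustration of the general technique, not the actual code of UNIX join or of the DataStage Join stage. The function name merge_join and the assumption that the key is the first field of each row are mine.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two row sequences that are ALREADY sorted on their first field.

    Assumption for this sketch: each row is a tuple and the join key is row[0].
    """
    key = itemgetter(0)
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)

    # (None, None) marks an exhausted input.
    lk, lrows = next(left_groups, (None, None))
    rk, rrows = next(right_groups, (None, None))

    while lrows is not None and rrows is not None:
        if lk < rk:
            # Left key is behind: move to the next key group on the left.
            lk, lrows = next(left_groups, (None, None))
        elif lk > rk:
            # Right key is behind: move to the next key group on the right.
            rk, rrows = next(right_groups, (None, None))
        else:
            # Same key on both sides: emit every left/right combination.
            right_rows = list(rrows)
            for lrow in lrows:
                for rrow in right_rows:
                    yield lrow + rrow[1:]
            lk, lrows = next(left_groups, (None, None))
            rk, rrows = next(right_groups, (None, None))
```

Each input is read once from top to bottom, which is why both sides have to arrive sorted on the key: a key that shows up out of order would simply be skipped.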
The UNIX join requires sorted data. If your data is not sorted, then you would need to sort it first.
If you were to read those files into DataStage you could sort there, which may make a difference on big files when using a parallel configuration with several nodes.
If the files are already sorted, then I'd use an External Source stage which calls the UNIX join and outputs straight to DataStage; that way you wouldn't need to write the join result to disk and then read it in DataStage.
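On the point about sorting across several nodes, the reason it can help on big files is that each node only sorts its own share of the rows. Here is a rough Python sketch of that idea, assuming rows are tuples keyed on the first field and that partitioning is done by hashing the key. The function names are made up for the example, and this is not how DataStage does it internally.

```python
import zlib


def partition_by_key(rows, num_partitions):
    """Split rows into partitions by a hash of the join key (row[0]).

    Rows with the same key always land in the same partition, so each
    partition can later be joined independently of the others.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is used here only because it is deterministic across runs.
        index = zlib.crc32(row[0].encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def sort_partitions(partitions):
    """Sort every partition on the join key; each sort is independent."""
    return [sorted(part, key=lambda row: row[0]) for part in partitions]
```

If both input files are partitioned with the same function and the same number of partitions, partition i of one file only ever needs to be joined with partition i of the other.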