Which one is best: Unix join or DataStage Join?

sachin1 · Post by **sachin1** » Sun Mar 11, 2018 6:04 am

Hi Team,

I have two files on a Unix file system that have to be joined. Should I go for the Unix join command, or should I use a DataStage Join stage with these files as input?

Please assist.

Thanks,
Sachin.

chulett · Post by **chulett** » Sun Mar 11, 2018 8:17 am

Seems to me, the typical answer in cases like this is "depends". In your shoes, if I really wanted to answer that question, I would try both. Compare and contrast with your data on your systems, then decide which one to stick with.

ArndW · Post by **ArndW** » Mon Mar 12, 2018 7:56 am

The join operation in both UNIX and in DataStage is a very simple one which takes two inputs sorted on the join key and does a Group-Change comparison on them.

The UNIX join requires sorted data. If your data is not sorted, then you would need to sort it first.
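As a rough sketch of that sort-then-join step: the file names, the comma delimiter, and the sample rows below are made up for illustration, and the join key is assumed to be the first field. Both sorts and the join run under the same locale so that the sort order matches what join expects.

```shell
# Hypothetical sample inputs; field 1 is the join key, comma-delimited.
workdir=$(mktemp -d)
printf '3,carol\n1,alice\n2,bob\n' > "$workdir/names.csv"
printf '2,london\n3,paris\n1,berlin\n' > "$workdir/cities.csv"

# join expects both inputs sorted on the join field with the same collation,
# so fix the locale for sort and join alike.
export LC_ALL=C

# Sort each file on field 1 only.
sort -t, -k1,1 "$workdir/names.csv"  > "$workdir/names.sorted"
sort -t, -k1,1 "$workdir/cities.csv" > "$workdir/cities.sorted"

# Inner join on field 1 of both files.
join -t, -1 1 -2 1 "$workdir/names.sorted" "$workdir/cities.sorted" > "$workdir/joined.csv"
cat "$workdir/joined.csv"
```

With real data the sort is usually the expensive part, so that is the step worth timing when comparing against a sort done inside DataStage.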

If you were to read those files into DataStage you could sort there, which may make a difference on big files when using a parallel configuration with several nodes.

If the files are already sorted, then I'd use an External Source stage which calls the UNIX join and outputs straight to DataStage; that way you wouldn't need to write the join result to disk and then read it back in DataStage.
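A minimal sketch of the kind of command such a stage could call, assuming pipe-delimited files keyed on the first field. The function name `join_sorted_to_stdout`, the paths, and the sample rows are hypothetical. The point is only that the join result goes to standard output, which is what the stage reads, and that `sort -c` can confirm the "already sorted" assumption before joining.

```shell
# Hypothetical wrapper: the function name, delimiter and paths are illustrative only.
join_sorted_to_stdout() {
    left=$1
    right=$2
    # sort -c only checks the order; it exits non-zero if the file is not sorted on the key.
    LC_ALL=C sort -c -t'|' -k1,1 "$left"  || return 1
    LC_ALL=C sort -c -t'|' -k1,1 "$right" || return 1
    # Write the joined rows to stdout instead of an intermediate file.
    LC_ALL=C join -t'|' -1 1 -2 1 "$left" "$right"
}

# Demo with made-up, pre-sorted inputs.
demo=$(mktemp -d)
printf '1|alice\n2|bob\n' > "$demo/left.txt"
printf '1|berlin\n2|london\n' > "$demo/right.txt"
join_sorted_to_stdout "$demo/left.txt" "$demo/right.txt"
```

In the stage itself, the command would be whatever you enter as the source program; the output columns then need to be described in the stage's column definitions to match the joined record layout.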