Hi All,
in my job design, i have a single input dataset & 3 reference data sets. With each reference data set, i have a separate join based on same 3 keys. Prior to join with input data set, all reference data set go through a remove duplicate stage,which are hash partitioned on same keys as used in join. The output of input data set & first refernce is fetched to second join with 2nd reference data set & so on.
in first join, i used below partioning:
input: Hash(on 3 keys)
1st RD stage Ouput: Same
2nd join
Output from 1st join : Same
2nd RD stage Ouput: Same
3rd join
Output from 2nd join : Same
3rd RD stage Ouput: Same
The issue i am facing is:
In my output, sometimes i get the match from the reference data sets & sometimes not for the same input. Tried disabling "Automatic Partioning " in admin but not fruitful.
What could be cause for this.
Any help would be appreciated.
& 1 more observation:
even if it is getting the match, all the extracted columns from the refernce data sets are not coming(some coming as BLANK) while input is having a valid value in those field.
Sorting at Join proved fruitful.
So, Does it mean that Join tries to match incoming record only with the first record(from reference data set) it founds within the same partition ?
Join data must be sorted on the join key(s). If the data isn't sorted, DataStage will put in a tsort operator unless you explicitly turn this off using an environment parameter such as $APT_NO_SORT_INSERTION.