I have the following scanario and trying to get the Target dataset as below: Prmiary link: db2 stg -> remove dup stg -> Tx -> Join (L-outer) -> Target Dataset Reference Link: db2 stg -> remove dup stg -> Tx -> Join
Primary Dataset has following data:
A B
--- ---
2 K
1 J
3 L
Reference Dataset:
A C
--- ---
1 33
1 78
2 65
1 44
3 21
2 15
Target Dataset: (required)
A B C
- -- --
1 J 33
1 J 78
1 J 44
2 K 65
2 K 15
3 L 21
Why did you remove the duplicates from the Reference stream if the below results are required? Try to get rid of the remove duplicate stage from the reference link or both.
In my actual job i was using the Remove dup stage using A and C as key because there were duplicate rows for A and C. (to better understand see below now):
How could you expect to get one-to-many if you remove duplicates on both input links to a Join? You could only ever get one-to-one.
It is perfectly OK to have duplicates on the inputs to a Join - or only on one input, if that's your requirement. Try discarding the Remove Duplicates stage on the right input to see what I mean.
Lookup stage is not really geared to returning multiple rows unless you have pre-loaded a Lookup File Set with duplicates permitted, and use that to provide rows for the reference input. Or unless you use a sparse lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.