FULL OUTER JOIN

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
nynali
Participant
Posts: 31
Joined: Thu May 03, 2007 11:52 pm
Location: Hyderabad

FULL OUTER JOIN

Post by nynali »

Hi,
I have two files X and Y and my requirement is to join them and I want two outputs from the join.
One will have all the rows from file X.
The other will have rows from file Y which did not get a hit with X.

As of now I am doing a join and then passing them to a transformer and then trying to get the two outputs.So please help me how to put constraints to get the exact outputs as mentioned above in the transformer.
nynali
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: FULL OUTER JOIN

Post by gateleys »

I would do it in 2 jobs.
JOB1: File1 is primary input and File2 is reference. The output link should have no constraint. This performs a File1 LEFT OUTER JOIN File2.

JOB2: Toggle the files. So, File2 is primary and File1 (hashed) is reference. In this case, the output link constraint is IsNull(RefLink.KeyCol).

gateleys
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I would use one job, with the Join stage performing a left outer join (with X as the left input). Downstream of the Join stage a Filter or Transformer stage would segregate the input into two streams, one receiving all rows (unconstrained), the other receiving only rows where the key column from Y is null.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rcanaran
Premium Member
Premium Member
Posts: 64
Joined: Wed Jun 14, 2006 3:51 pm
Location: CANADA

Post by rcanaran »

I have a similar situation. I have 3 files. (A, B an C). A is th driver and I do two outer joins in the join stage. The JOIN stage requires that the join columns be identically named. From the JOIN STAGE, STAG tab, PROPERTIES tab,PROPERTIESIn the columns available for target for the key , the help is

"Type: Input Column
Name of input column you want to join on. Columns with the same name must appear in both input data sets and have compatible data types."

In the OUTPUT tab there is only one source column for the join key and I cannot edit the derivation, only drag source to target.
the following transformer stage can only access the output link from the join, so only has ONE key and ONE streamed input. Can it reference the INPUT links of the JOIN stage?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please post your question in a new thread. It is NOT the same, and only vaguely similar, to the question asked - and already resolved - on this thread. There is an answer, but I'll not post it here.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply