A parallel job has a join stage that performs a left join based on 4 key columns on the left dataset and right dataset. The join must also include additional columns that have a null value in the left dataset.
Let us call our two datasets D1 and D2 and the key columns as C1,C2,C3,C4.
I am converting SAS logic into DataStage and the SAS code had check for the null within the join logic. When I filter it after the join stage in DS, the DS record count is different from the SAS record count. The record count in SAS is the same with or without the check for null key columns. So, it appears that all columns from the left dataset are selected (the join performs a left join) no matter null/no-null key values but I need to look into this further. But has anyone come across a join based on matching key columns on two tables and null key columns on one of the tables.
If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
-craig
"You can never have too many knives" -- Logan Nine Fingers
have you tried using a filter for D1.C3='' and D1.C4='' condition before or after you use the join stage.
In join stage u can have
D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 condition
chulett wrote:If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
I am using a left outer join and sorry that I did not mention it earlier.
chulett wrote:If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
I am using a left outer join and sorry that I did not mention it earlier.
You did when you said "left join". I was just emphasizing the point.
The problem is your example showed no join using the C3 & C4 columns. You showed joins between C1 & C2 and (as noted) simple filters on the other 'key' columns. If you want a left outer join between the two tables with all keys then what you should have shown us was this:
D1 inner join D2
when D1.C1 = D2.C1 and
D1.C2 = D2.C2 and
((D1.C3 = D2.C3) or D2.C3 ='') and
((D1.C4 = D2.C3) or D2.C4 ='')
Can this second join be looked as a union of inner join between D1 and D2 based on C1, C2, C3 and C4 and right outer join between C3 and C4?
Any help is appreciated.
pdntsap wrote:Does this mean that the left join based on two keys C1 and C2?
Honestly, I have no idea never having worked with SAS. I've been making some assumptions in this thread, seems to me you'd be better off asking someone you work with who knows what exactly that translates to.
pdntsap also wrote:The SAS code also implements the following on a different set of tables:
<snip>
Can this second join be looked as a union of inner join between D1 and D2 based on C1, C2, C3 and C4 and right outer join between C3 and C4?
Again, unless someone with SAS experience chimes in here, see if a local SAS expert can help you out. Once we know exactly what these statements translate to, it shouldn't be a problem to get to the SQL/DataStage equivalent.
-craig
"You can never have too many knives" -- Logan Nine Fingers