Need help in implementing join logic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Need help in implementing join logic

Post by pdntsap »

Hello,

A parallel job has a join stage that performs a left join based on 4 key columns on the left dataset and right dataset. The join must also include additional columns that have a null value in the left dataset.

Let us call our two datasets D1 and D2 and the key columns as C1,C2,C3,C4.

Code: Select all

D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 and D1.C3='' and D1.C4=''
The join has to be done this way due to business requirements. Any help in implementing the join logic is greatly appreciated.

Thanks.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A join stage does a join and you've included more than just join criteria in your specification. This part is the join:

Code: Select all

D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2
Whereas this part is a filter and something you would need to implement post-join:

Code: Select all

and D1.C3='' and D1.C4=''
Also note that if your values are NULL then checking for an empty string won't work... do an explicit null check.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Thanks Craig.

I am converting SAS logic into DataStage and the SAS code had check for the null within the join logic. When I filter it after the join stage in DS, the DS record count is different from the SAS record count. The record count in SAS is the same with or without the check for null key columns. So, it appears that all columns from the left dataset are selected (the join performs a left join) no matter null/no-null key values but I need to look into this further. But has anyone come across a join based on matching key columns on two tables and null key columns on one of the tables.

Thanks.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
-craig

"You can never have too many knives" -- Logan Nine Fingers
deeplind07
Participant
Posts: 31
Joined: Mon Jun 28, 2010 5:15 am
Location: pune

Post by deeplind07 »

have you tried using a filter for D1.C3='' and D1.C4='' condition before or after you use the join stage.
In join stage u can have
D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 condition
manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Post by manoj_23sakthi »

Hi,
Could u please correct my understanding if i am wrong

D1 and D2 datasets (c1,c2,c3,c4)

D1 left outer join D2

If u feel that c3 and c4 are nullable columns before join handle null and populate with default value "-" and join . It will works..

If my understanding is wrong please clarify more...

Thanks
Manoj
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

chulett wrote:If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
I am using a left outer join and sorry that I did not mention it earlier.
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

Deeplind - I tried filtering after the join stage and the DS record count was different from SAS record count.

I believe I will implement Manoj's suggestion of nul handling.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pdntsap wrote:
chulett wrote:If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
I am using a left outer join and sorry that I did not mention it earlier.
You did when you said "left join". I was just emphasizing the point.

The problem is your example showed no join using the C3 & C4 columns. You showed joins between C1 & C2 and (as noted) simple filters on the other 'key' columns. If you want a left outer join between the two tables with all keys then what you should have shown us was this:

Code: Select all

D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 and D1.C3=D2.C3 and D1.C4=D2.C4
The nulls you listed just become informational... there's no need for any special null handling in that case.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

If you want a left outer join between the two tables with all keys
I believe the the SAS logic does not do left outer join between the two tables with all keys . The SAS code does the following:

Code: Select all

D1 left join D2 
when D1.C1 = D2.C1 and
     D2.C1 = D2.C2 and
     D3.C3 = '' and
     D4.C4 = '' 
Does this mean that the left join based on two keys C1 and C2?


The SAS code also implements the following on a different set of tables:
(again let our tables be D1 and D2 and the key columns be C1, C2, C3, and C4)

Code: Select all

 D1 inner join D2
when D1.C1 = D2.C1 and 
     D1.C2 = D2.C2 and
     ((D1.C3 = D2.C3) or D2.C3 ='') and
     ((D1.C4 = D2.C3) or D2.C4 ='') 
Can this second join be looked as a union of inner join between D1 and D2 based on C1, C2, C3 and C4 and right outer join between C3 and C4?
Any help is appreciated.

Thanks.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pdntsap wrote:Does this mean that the left join based on two keys C1 and C2?
Honestly, I have no idea never having worked with SAS. I've been making some assumptions in this thread, seems to me you'd be better off asking someone you work with who knows what exactly that translates to.
pdntsap also wrote:The SAS code also implements the following on a different set of tables:
<snip>
Can this second join be looked as a union of inner join between D1 and D2 based on C1, C2, C3 and C4 and right outer join between C3 and C4?
Again, unless someone with SAS experience chimes in here, see if a local SAS expert can help you out. Once we know exactly what these statements translate to, it shouldn't be a problem to get to the SQL/DataStage equivalent.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pdntsap
Premium Member
Premium Member
Posts: 107
Joined: Mon Jul 04, 2011 5:38 pm

Post by pdntsap »

I will dig deeper into the SAS logic and thanks.
Post Reply