Need help in implementing join logic

pdntsap · Post by **pdntsap** » Mon Oct 17, 2011 7:31 am

Hello,

A parallel job has a join stage that performs a left join based on 4 key columns on the left dataset and right dataset. The join must also include additional columns that have a null value in the left dataset.

Let us call our two datasets D1 and D2 and the key columns as C1,C2,C3,C4.

Code: Select all

D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 and D1.C3='' and D1.C4=''

The join has to be done this way due to business requirements. Any help in implementing the join logic is greatly appreciated.

Thanks.

chulett · Post by **chulett** » Mon Oct 17, 2011 7:39 am

A join stage does a join and you've included more than just join criteria in your specification. This part is the join:

Code: Select all

D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2

Whereas this part is a filter and something you would need to implement post-join:

Code: Select all

and D1.C3='' and D1.C4=''

Also note that if your values are NULL then checking for an empty string won't work... do an explicit null check.

pdntsap · Post by **pdntsap** » Mon Oct 17, 2011 4:04 pm

Thanks Craig.

I am converting SAS logic into DataStage and the SAS code had check for the null within the join logic. When I filter it after the join stage in DS, the DS record count is different from the SAS record count. The record count in SAS is the same with or without the check for null key columns. So, it appears that all columns from the left dataset are selected (the join performs a left join) no matter null/no-null key values but I need to look into this further. But has anyone come across a join based on matching key columns on two tables and null key columns on one of the tables.

Thanks.

chulett · Post by **chulett** » Mon Oct 17, 2011 5:19 pm

If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.

deeplind07 · Post by **deeplind07** » Tue Oct 18, 2011 12:13 am

have you tried using a filter for D1.C3='' and D1.C4='' condition before or after you use the join stage.
In join stage u can have
D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 condition

manoj_23sakthi · Post by **manoj_23sakthi** » Tue Oct 18, 2011 4:59 am

Hi,
Could u please correct my understanding if i am wrong

D1 and D2 datasets (c1,c2,c3,c4)

D1 left outer join D2

If u feel that c3 and c4 are nullable columns before join handle null and populate with default value "-" and join . It will works..

If my understanding is wrong please clarify more...

Thanks
Manoj

pdntsap · Post by **pdntsap** » Tue Oct 18, 2011 6:38 am

chulett wrote:If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.

I am using a left outer join and sorry that I did not mention it earlier.

pdntsap · Post by **pdntsap** » Tue Oct 18, 2011 6:41 am

Deeplind - I tried filtering after the join stage and the DS record count was different from SAS record count.

I believe I will implement Manoj's suggestion of nul handling.

Thanks

chulett · Post by **chulett** » Tue Oct 18, 2011 7:05 am

pdntsap wrote:
chulett wrote:If your join key columns are null on one side or the other, then you'll need to look into some flavor of a outer join. An inner join will never return any records where join keys are null.
I am using a left outer join and sorry that I did not mention it earlier.

You did when you said "left join". I was just emphasizing the point.

The problem is your example showed no join using the C3 & C4 columns. You showed joins between C1 & C2 and (as noted) simple filters on the other 'key' columns. If you want a left outer join between the two tables with all keys then what you should have shown us was this:

Code: Select all

D1 left join D2 where D1.C1=D2.C1 and D1.C2=D2.C2 and D1.C3=D2.C3 and D1.C4=D2.C4

The nulls you listed just become informational... there's no need for any special null handling in that case.

pdntsap · Post by **pdntsap** » Tue Oct 18, 2011 10:20 am

If you want a left outer join between the two tables with all keys

I believe the the SAS logic does not do left outer join between the two tables with all keys . The SAS code does the following:

Code: Select all

D1 left join D2 
when D1.C1 = D2.C1 and
     D2.C1 = D2.C2 and
     D3.C3 = '' and
     D4.C4 = ''

Does this mean that the left join based on two keys C1 and C2?

The SAS code also implements the following on a different set of tables:
(again let our tables be D1 and D2 and the key columns be C1, C2, C3, and C4)

Code: Select all

 D1 inner join D2
when D1.C1 = D2.C1 and 
     D1.C2 = D2.C2 and
     ((D1.C3 = D2.C3) or D2.C3 ='') and
     ((D1.C4 = D2.C3) or D2.C4 ='')

Can this second join be looked as a union of inner join between D1 and D2 based on C1, C2, C3 and C4 and right outer join between C3 and C4?
Any help is appreciated.

Thanks.

chulett · Post by **chulett** » Tue Oct 18, 2011 8:50 pm

pdntsap wrote:Does this mean that the left join based on two keys C1 and C2?

Honestly, I have no idea never having worked with SAS. I've been making some assumptions in this thread, seems to me you'd be better off asking someone you work with who knows what exactly that translates to.

pdntsap also wrote:The SAS code also implements the following on a different set of tables:
<snip>
Can this second join be looked as a union of inner join between D1 and D2 based on C1, C2, C3 and C4 and right outer join between C3 and C4?

Again, unless someone with SAS experience chimes in here, see if a local SAS expert can help you out. Once we know exactly what these statements translate to, it shouldn't be a problem to get to the SQL/DataStage equivalent.

pdntsap · Post by **pdntsap** » Wed Oct 19, 2011 1:48 pm

I will dig deeper into the SAS logic and thanks.