Join Issue.

dsadm_ws · Post by **dsadm_ws** » Tue May 31, 2011 9:45 pm

Hi everyone,

I have an issue while I am trying to Inner Join two Datastes. It must be silly, but I am unable to figure it out.

I have a DS file with 8 columns,

AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)

I have created another dataset file, for every distinct AdminsterID, populate the Address_SK.

Compare the Address.LN1_ADR = AddressLine1,
Address.LN2_ADR = AddressLine2,
Address.CITY_NM = City,
Address.CNTY_CD = County,
Address.ST_CD = State,
Address.ZIP_5_CD = ZipCode,
Address.ZIP_PLUS_4_CD = Zip4

If you find a match then assign Address.Address_SK to Address_SK else
insert the new records and then assign.

This way populated both the Old_Address_SK & New_Address_SK into a Dataset.

Now trying to join the initial dataset, with the Address_SK dataset by using all the above columns.

Somehow, unable to join and records are getting dropped.

Can't understand what seems to be the problem. Can anyone please, throw some light on it.

Thanks!

Which comes in everyday

SURA · Post by **SURA** » Wed Jun 01, 2011 12:41 am

try without unicode

dsadm_ws · Post by **dsadm_ws** » Wed Jun 01, 2011 7:19 am

How would it differ when I am joining them? I am assigning the unicode to other Address Dataset as well.

dsadm_ws · Post by **dsadm_ws** » Wed Jun 01, 2011 11:57 am

When I try to use the query it yields expected result.

But when I try to ue the same logic in DataStage, it gives me less records.

My design:
Addr_DS

Main_DS--->Sort--->Trnsfmr--->Join--->Peek

Main_DS has the following records: 164 records

AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)

Sorting on : 157 records
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)

and also performing Hash partitioning on the sort fields.

Transformer: Same Partitioning 157 records
Applying TrimLeadingTrailing() on each field.

Join: Same Partitioning
Inner join on AddressLine1 ,AddressLine2, City, State, ZipCode, Zip4

It just gives me 78 records instead of 157.

Using the following query: 157 records.

Select C.AddressLine1, C.AddressLine2, C.City, C.State, C.ZipCode, C.Zip4, C.County, A.ADR_SK
From caa C, Address_test A
where C.AddressLine1 = A.LN1_ADR
and C.AddressLine2 = A.LN2_ADR
and C.City = A.CITY_NM
and C.State = A.ST_CD
and C.ZipCode = A.ZIP_5_CD
order by A.ADR_SK

Can anyone tell me what am I missing?? I would appreciate the help. Thanks!

jwiles · Post by **jwiles** » Wed Jun 01, 2011 3:34 pm

Try performing the TrimLeadingTrailing() before the partition/sort instead of after, for both sets of data.

Regards,

dsadm_ws · Post by **dsadm_ws** » Sat Jun 04, 2011 5:03 pm

Hi all, I have tried using the trim() before applying the partioning techniques. However, i get the desired result using a Look-up Stage. Bu, when I use a Join/Merge i am getting all duplicates. Can anyone helpme, as why this is happening.

I would really appriciate any help. Thanks!

jwiles · Post by **jwiles** » Sun Jun 05, 2011 11:16 pm

Comments:

1) Have you noticed that your joins are different? Compare the columns you're joining on for the SQL join to those used in your DS job.
2) You're partitioning on more columns than you're joining on. Don't do that. At most, partition on only the columns you're joining on, but no more. Or, just leave the partitioning set at "Auto" rather than Hash and Same. The engine usually does a good job at selecting the proper partitioning.

Regards,

dsadm_ws · Post by **dsadm_ws** » Mon Jun 06, 2011 9:02 am

Thnak you all for your inputs. I got the desired output. I was trying to join on Null values, which is why the issue.

I have noticed, while working with Join Stage, it does not give you the desired output if you have null values.

DSXchange

Join Issue.

Join Issue.

Re: Join Issue.