Join Issue.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsadm_ws
Premium Member
Premium Member
Posts: 65
Joined: Mon Apr 12, 2010 10:30 am

Join Issue.

Post by dsadm_ws »

Hi everyone,

I have an issue while I am trying to Inner Join two Datastes. It must be silly, but I am unable to figure it out.

I have a DS file with 8 columns,

AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)

I have created another dataset file, for every distinct AdminsterID, populate the Address_SK.

Compare the Address.LN1_ADR = AddressLine1,
Address.LN2_ADR = AddressLine2,
Address.CITY_NM = City,
Address.CNTY_CD = County,
Address.ST_CD = State,
Address.ZIP_5_CD = ZipCode,
Address.ZIP_PLUS_4_CD = Zip4

If you find a match then assign Address.Address_SK to Address_SK else
insert the new records and then assign.

This way populated both the Old_Address_SK & New_Address_SK into a Dataset.

Now trying to join the initial dataset, with the Address_SK dataset by using all the above columns.

Somehow, unable to join and records are getting dropped.

Can't understand what seems to be the problem. Can anyone please, throw some light on it.

Thanks!


Which comes in everyday
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Join Issue.

Post by SURA »

try without unicode
dsadm_ws
Premium Member
Premium Member
Posts: 65
Joined: Mon Apr 12, 2010 10:30 am

Post by dsadm_ws »

How would it differ when I am joining them? I am assigning the unicode to other Address Dataset as well.
dsadm_ws
Premium Member
Premium Member
Posts: 65
Joined: Mon Apr 12, 2010 10:30 am

Post by dsadm_ws »

When I try to use the query it yields expected result.

But when I try to ue the same logic in DataStage, it gives me less records.

My design:
Addr_DS

Main_DS--->Sort--->Trnsfmr--->Join--->Peek

Main_DS has the following records: 164 records

AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)

Sorting on : 157 records
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)

and also performing Hash partitioning on the sort fields.

Transformer: Same Partitioning 157 records
Applying TrimLeadingTrailing() on each field.

Join: Same Partitioning
Inner join on AddressLine1 ,AddressLine2, City, State, ZipCode, Zip4

It just gives me 78 records instead of 157.


Using the following query: 157 records.

Select C.AddressLine1, C.AddressLine2, C.City, C.State, C.ZipCode, C.Zip4, C.County, A.ADR_SK
From caa C, Address_test A
where C.AddressLine1 = A.LN1_ADR
and C.AddressLine2 = A.LN2_ADR
and C.City = A.CITY_NM
and C.State = A.ST_CD
and C.ZipCode = A.ZIP_5_CD
order by A.ADR_SK


Can anyone tell me what am I missing?? I would appreciate the help. Thanks!
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Try performing the TrimLeadingTrailing() before the partition/sort instead of after, for both sets of data.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
dsadm_ws
Premium Member
Premium Member
Posts: 65
Joined: Mon Apr 12, 2010 10:30 am

Post by dsadm_ws »

Hi all, I have tried using the trim() before applying the partioning techniques. However, i get the desired result using a Look-up Stage. Bu, when I use a Join/Merge i am getting all duplicates. Can anyone helpme, as why this is happening.

I would really appriciate any help. Thanks!
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Comments:

1) Have you noticed that your joins are different? Compare the columns you're joining on for the SQL join to those used in your DS job.
2) You're partitioning on more columns than you're joining on. Don't do that. At most, partition on only the columns you're joining on, but no more. Or, just leave the partitioning set at "Auto" rather than Hash and Same. The engine usually does a good job at selecting the proper partitioning.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
dsadm_ws
Premium Member
Premium Member
Posts: 65
Joined: Mon Apr 12, 2010 10:30 am

Post by dsadm_ws »

Thnak you all for your inputs. I got the desired output. I was trying to join on Null values, which is why the issue.

I have noticed, while working with Join Stage, it does not give you the desired output if you have null values.
Post Reply