Hi everyone,
I have an issue while I am trying to Inner Join two Datastes. It must be silly, but I am unable to figure it out.
I have a DS file with 8 columns,
AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)
I have created another dataset file, for every distinct AdminsterID, populate the Address_SK.
Compare the Address.LN1_ADR = AddressLine1,
Address.LN2_ADR = AddressLine2,
Address.CITY_NM = City,
Address.CNTY_CD = County,
Address.ST_CD = State,
Address.ZIP_5_CD = ZipCode,
Address.ZIP_PLUS_4_CD = Zip4
If you find a match then assign Address.Address_SK to Address_SK else
insert the new records and then assign.
This way populated both the Old_Address_SK & New_Address_SK into a Dataset.
Now trying to join the initial dataset, with the Address_SK dataset by using all the above columns.
Somehow, unable to join and records are getting dropped.
Can't understand what seems to be the problem. Can anyone please, throw some light on it.
Thanks!
Which comes in everyday
Join Issue.
Moderators: chulett, rschirm, roy
Re: Join Issue.
try without unicode
When I try to use the query it yields expected result.
But when I try to ue the same logic in DataStage, it gives me less records.
My design:
Addr_DS
Main_DS--->Sort--->Trnsfmr--->Join--->Peek
Main_DS has the following records: 164 records
AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)
Sorting on : 157 records
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)
and also performing Hash partitioning on the sort fields.
Transformer: Same Partitioning 157 records
Applying TrimLeadingTrailing() on each field.
Join: Same Partitioning
Inner join on AddressLine1 ,AddressLine2, City, State, ZipCode, Zip4
It just gives me 78 records instead of 157.
Using the following query: 157 records.
Select C.AddressLine1, C.AddressLine2, C.City, C.State, C.ZipCode, C.Zip4, C.County, A.ADR_SK
From caa C, Address_test A
where C.AddressLine1 = A.LN1_ADR
and C.AddressLine2 = A.LN2_ADR
and C.City = A.CITY_NM
and C.State = A.ST_CD
and C.ZipCode = A.ZIP_5_CD
order by A.ADR_SK
Can anyone tell me what am I missing?? I would appreciate the help. Thanks!
But when I try to ue the same logic in DataStage, it gives me less records.
My design:
Addr_DS
Main_DS--->Sort--->Trnsfmr--->Join--->Peek
Main_DS has the following records: 164 records
AdminsterID(VarChar-Unicode)
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)
Sorting on : 157 records
AddressLine1(VarChar-Unicode)
AddressLine2(VarChar-Unicode)
City(VarChar-Unicode)
State(VarChar-Unicode)
ZipCode(VarChar-Unicode)
Zip4(VarChar-Unicode)
County(VarChar-Unicode)
and also performing Hash partitioning on the sort fields.
Transformer: Same Partitioning 157 records
Applying TrimLeadingTrailing() on each field.
Join: Same Partitioning
Inner join on AddressLine1 ,AddressLine2, City, State, ZipCode, Zip4
It just gives me 78 records instead of 157.
Using the following query: 157 records.
Select C.AddressLine1, C.AddressLine2, C.City, C.State, C.ZipCode, C.Zip4, C.County, A.ADR_SK
From caa C, Address_test A
where C.AddressLine1 = A.LN1_ADR
and C.AddressLine2 = A.LN2_ADR
and C.City = A.CITY_NM
and C.State = A.ST_CD
and C.ZipCode = A.ZIP_5_CD
order by A.ADR_SK
Can anyone tell me what am I missing?? I would appreciate the help. Thanks!
Comments:
1) Have you noticed that your joins are different? Compare the columns you're joining on for the SQL join to those used in your DS job.
2) You're partitioning on more columns than you're joining on. Don't do that. At most, partition on only the columns you're joining on, but no more. Or, just leave the partitioning set at "Auto" rather than Hash and Same. The engine usually does a good job at selecting the proper partitioning.
Regards,
1) Have you noticed that your joins are different? Compare the columns you're joining on for the SQL join to those used in your DS job.
2) You're partitioning on more columns than you're joining on. Don't do that. At most, partition on only the columns you're joining on, but no more. Or, just leave the partitioning set at "Auto" rather than Hash and Same. The engine usually does a good job at selecting the proper partitioning.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.