lookup with duplicate values in key column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
eze_ke
Participant
Posts: 38
Joined: Wed Mar 30, 2005 11:42 am

lookup with duplicate values in key column

Post by eze_ke »

Hi

I have a source file In which there are four key columns which is going to do a lookup on lookup table.
every indvidual looup key has duplicate values in the table.
This four keys are in hiearchy
Source
Transaction_type
Transaction_sub_type
Asset_id
If asset Id is populated then there should be value in all the above keys (source, transaction_type and transaction_sub_type)
simlarly for transaction_sub_type has value then Source and transaction should have value.

I'm trying to find the solution to load the data.
But when I do lookup I think it is picking only the first record from the table.
Here I want all the records with that key even though it has duplicates.
If key has 500 duplicate values I want to read all the 500 records.

Some one please help me in resolving this Problem

Thanks in Advance
Eze_ke
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: lookup with duplicate values in key column

Post by kris »

Hi Eze_ke,

Why don't you go for join instead of lookup ?


Kris~
~Kris
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: lookup with duplicate values in key column

Post by kwwilliams »

Eze_Ke,

Tell me if I understand your problem correctly.

You have a source file with 4 key columns in which some of the key columns are not populated. Say the column names are A,B,C, and D. If D is not null then A,B,and C would be populated as well. If D is null and C is not then A and B would not be null as well.

This means that 1) your keys are truly dynamic and you will want to perform a join based upon the information present in the data. Or 2) your table that you are performing a lookup would also have null values when there are null values in the key for your source.

Is 1 or 2 true? I think the answer will yield some insight for us to provide to you.
eze_ke
Participant
Posts: 38
Joined: Wed Mar 30, 2005 11:42 am

Post by eze_ke »

Hi Keith,
My keys are dynamic and they have duplicate values.
Since there are duplicates in lookup table I'm not sure I'm picking all the duplicate records.
In my case I have read all the duplicate records in the table

In my job when I'm doing lookup It is picking up the first duplicate record.
So I want to know is there any way to read all the duplicate records from lookup.


Thanks
Eze_ke
reachthiru
Participant
Posts: 28
Joined: Mon Jan 09, 2006 1:31 pm

Post by reachthiru »

Hi Eze_ke,

As pointed out by Kris, in this case you have to go for join only. Lookup will return only the first matching record. So, you can't get all the duplicate records with Lookup.
With regards,
Thiru
eze_ke
Participant
Posts: 38
Joined: Wed Mar 30, 2005 11:42 am

Post by eze_ke »

Thanks reachthiru!

I have tested with join stage too.
Still I'm not confident as we are having duplicates in the table.
I need to read each and every duplicate record.
Since my individual keys are in hierarchy Which join method I need to use




Thanks
Eze_ke
reachthiru
Participant
Posts: 28
Joined: Mon Jan 09, 2006 1:31 pm

Post by reachthiru »

Hi,

I assume the following would work. But I am not very much confident about it.

Assume that your table names are M (main table), A (Asset ID), B (Transaction_Sub_type), C (Transaction Type) and D (Source).

What you can do is join M & A first, then join B with result, then C and finally D.

Hope it will work.
With regards,
Thiru
eze_ke
Participant
Posts: 38
Joined: Wed Mar 30, 2005 11:42 am

Post by eze_ke »

Hi reachthiru
I Implemented the join logic too. It didn't worked as there are duplicates in the key values.

What joing I need to use to read all the matching records from source even though the are duplicates values for the Key columns.


Thanks
Eze_ke
jtwilson
Participant
Posts: 9
Joined: Mon Sep 12, 2005 3:50 pm

Post by jtwilson »

Eze_ke,

Have you tried specifying "Multiple rows returned from link:" in the options tab of the Lookup stage? This option should allow the lookup to return the Cartesian product that's created when you have a one-to-many relationship between the primary and lookup data sets.

John
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

eze_ke wrote:Hi Keith,
My keys are dynamic and they have duplicate values.
Since there are duplicates in lookup table I'm not sure I'm picking all the duplicate records.
In my case I have read all the duplicate records in the table

In my job when I'm doing lookup It is picking up the first duplicate record.
So I want to know is there any way to read all the duplicate records from lookup.


Thanks
Eze_ke
How do you have one lookup with a dynamic key? Is the key truly dynamic or do you allow nulls in your key?? Does your lookup table have nulls in the keys as well? If nulls are in both then your key is not dynamic, but it does allow null values. Still trying to understand your problem. Looking at the postings for solutions you have attempted it appears that we may not understand fully your logic. A join would create a cartesian product if there were multiple keys there.
ceenu
Participant
Posts: 7
Joined: Fri Jan 27, 2006 10:44 pm

Re: lookup with duplicate values in key column

Post by ceenu »

Hi All,

Couple of clients asking Same question in the interview in case of Hashfile stage.
Please let me know it will load all duplicates or only first record.

Thanks
ceenu
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: lookup with duplicate values in key column

Post by kwwilliams »

ceenu wrote: Couple of clients asking Same question in the interview in case of Hashfile stage.
Thanks
ceenu
Hash files are server job and your are posting in the parallel job forum. To answer your question a hash file will eliminate duplicates. It is one of the most common ways to eliminate duplicates in a server job.
ceenu
Participant
Posts: 7
Joined: Fri Jan 27, 2006 10:44 pm

Re: lookup with duplicate values in key column

Post by ceenu »

Thanks,sorry about posting in parallel.
You mean it will pickup only first record

thanks
ceenu
sylvan_rydes
Participant
Posts: 47
Joined: Wed Apr 12, 2006 12:13 pm

Post by sylvan_rydes »

Hi All,

can anyone suggest the easiest way to get distinct value in a column.

Thanks,
Sylvan
sylvan rydes
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Extract using SQL (select DISTINCT column...).
Aggregator stage, selecting First or Last of grouping column.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply