Duplicate Lookup Values

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Duplicate Lookup Values

Post by parag.s.27 »

Hey all,

I have a lookup file that is fetched from an ERP system at run time and fed directly into a hashed file, which is then used as a lookup.

The problem is that the lookup file sometimes contains duplicate entries for a single value to be searched.

For example, in my master file "CC1464-SERV" is a value for which I have to find a lookup. In the lookup file I have:

1.) CC1464-SERV, WBSE.001.00012534
2.) CC1464-SERV, CCE.001.001001

Since there are duplicate lookup keys, DataStage picks a value at random: sometimes 1 and other times 2.

I cannot search for and remove duplicate values from the lookup file by hand, because it arrives at run time and contains thousands of values.

Can someone suggest any logic to always pick the first value, or to remove the duplicates at run time?

Thanks in advance
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Re: Duplicate Lookup Values

Post by balajisr »

parag.s.27 wrote: Can someone suggest any logic to always pick the first value, or to remove the duplicates at run time?
If you want to remove duplicates, which one should be retained: the first or the last?

If the last value is the one to retain, simply write the file to the hashed file. When the hashed file is loaded from the lookup source, duplicate rows are removed: writing a row to a hashed file under an existing key overwrites that row, so the last value for each key in the lookup file is what remains in the hashed file. The first value cannot be retrieved this way.
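
A minimal DataStage BASIC sketch of that behaviour, assuming a hypothetical hashed file named LKP_HASH in the project account:

Code: Select all

    * Minimal sketch: hashed file writes are destructive per key.
    * "LKP_HASH" is a hypothetical hashed file in the project account.
    Open "LKP_HASH" To HF Else Call DSLogFatal("Cannot open LKP_HASH", "OverwriteDemo")

    Rec = "WBSE.001.00012534"
    Write Rec On HF, "CC1464-SERV"   ;* first row written for this key

    Rec = "CCE.001.001001"
    Write Rec On HF, "CC1464-SERV"   ;* same key: destructively overwrites

    * Only CCE.001.001001 now remains under key CC1464-SERV -
    * the last value written wins.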

Or preprocess the file using the unix uniq command, either in a before-job subroutine or in the Filter option of the Sequential File stage.
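
One caveat with uniq: it only collapses identical adjacent lines, and here the two rows differ after the key. A sketch of the same preprocessing idea with awk instead, keeping strictly the first row per key, run from a hypothetical before-job routine (the file paths and routine name are assumptions):

Code: Select all

    * Keep only the first row per key (comma-separated field 1)
    * before the extract is loaded into the hashed file.
    * /data/lookup.txt and /data/lookup_dedup.txt are assumed paths.
    Cmd = "awk -F, '!seen[$1]++' /data/lookup.txt > /data/lookup_dedup.txt"
    Call DSExecute("UNIX", Cmd, Output, SysRet)
    If SysRet <> 0 Then Call DSLogFatal("Dedup failed: ":Output, "DedupLookup")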

If possible change your key.
parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Post by parag.s.27 »

Hi,

Actually, I mentioned in my post that I want to retain the first value...
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

parag.s.27 wrote: Hi,

Actually, I mentioned in my post that I want to retain the first value...
In that case, pass the lookup file through a sort and then a transformer. In the transformer, compare the current row's key value to the previous row's key value; if the current row's key equals the previous row's key, don't pass the record on.

Look up RowProcCompareWithPreviousValue() in the help. Note that you can only use this routine once in a job.
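
A sketch of that transformer logic with stage variables (svIsDup, svPrevKey, the link name In and the column KeyCol are illustrative, and the input must arrive sorted on the key):

Code: Select all

    Stage variables, evaluated top to bottom for each row
    (initialise svPrevKey to ""):

       svIsDup   = If In.KeyCol = svPrevKey Then @TRUE Else @FALSE
       svPrevKey = In.KeyCol

    Output link constraint (pass only the first row per key):

       Not(svIsDup)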

Rob W
Rob Wierdsma
Toronto, Canada
bartonbishop.com
parag.s.27
Participant
Posts: 221
Joined: Fri Feb 17, 2006 3:38 am
Location: India

Post by parag.s.27 »

Ya, I think I should make a separate job and include it in the sequence...
otherwise there seems to be no option...
Thanks & Regards
Parag Saundattikar
Certified for Infosphere DataStage v8.0
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Better yet, reverse sort it and pass it through to the hashed file. You don't need to make your design more complicated just to get the right record into your hashed file.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

parag.s.27 wrote: Hi,

Actually, I mentioned in my post that I want to retain the first value...
Balaji also gave a solution that went unnoticed:
preprocess the file using the unix uniq command in a before-job subroutine or in the Filter option of the Sequential File stage
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Make sure the hashed file is loaded with the first value (however that is defined). Usually, as others have noted, this can be accomplished by sorting the stream of data being loaded into the hashed file in reverse order; hashed file records are destructively overwritten, so it is the last-written version of any record that remains in the hashed file. Sorting in reverse order guarantees that this is your "first".
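
As a concrete sketch: if "first" simply means first in file order, reversing the extract before the load has the same effect, because the first occurrence of each key becomes the last row written. A hypothetical before-job step (the paths, the routine name, and the use of GNU tac are all assumptions):

Code: Select all

    * Reverse the extract line order so that the first occurrence of
    * each key is the last row written to the hashed file, and is
    * therefore the row that survives the destructive overwrite.
    * /data/lookup.txt and /data/lookup_rev.txt are assumed paths.
    Cmd = "tac /data/lookup.txt > /data/lookup_rev.txt"
    Call DSExecute("UNIX", Cmd, Output, SysRet)
    If SysRet <> 0 Then Call DSLogFatal("Reverse failed: ":Output, "ReverseLookup")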
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.