Incremental Lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
zaino22
Premium Member
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Incremental Lookup

Post by zaino22 »

I have two source files, one of which will be used as a Lookup file. I did not know what to call it so I named it the way I wish to use it, incremental lookup, but since it restarts from the top, so I know even incremental does not suit it, please let me know what is it called?

Lookup file has several duplicate matching values listed ("00" and "01" in the sample data below are those values) but different corresponding values that we need from Lookup file. Once both source, and Lookup files are sorted based on matching colum/key colum ("00"/"01" colum) , I should be able to match starting from top and pick the corresponding value and move down the list and get its corresponding value from the Lookup file. Once we reach to the end of matching value list in the Lookup file, and there is no more but source file still has some values to match, we should start from the beginning of the Lookup file list for matching values and get its corresponding values, until we run out of matching values in the Source file. It also means we will probably end up with several duplicate corresponding values in the Desired output file. I hope I am able to outline the pic. Here is the sample data, and desired output.

Please note, All data is in Char, and ASCII fixed length file.

Lookup File:
---------------
abc-00
def-00
uvw-01
xyz-01...

Main Source file:
--------------------

123R00
456R00
789R00
111R00
222R00
787R01
488R01
422R01
788R01...

Desire Output
-----------------
123R00abc
456R00def
789R00abc
111R00def
222R00abc
787R01uvw
488R01xyz
422R01uvw
788R01xyz...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try using a Merge stage, which consumes rows loaded from the Update input. However, your "restart from the top" requirement can not be implemented using this stage type. That would call for some kind of circularity for which, I suspect, you will need a Build stage (or a routine that loads the entire reference set into memory).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
zaino22
Premium Member
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

Updates:
I am able to do this by just using Join stage (with inner join).
2 files-->Join--->outputfile
I blame this to my communication, and no one else here at the forum, you guys have been eye openers, motivators, and very knowledgable people, who are quick in replying the queries from the hectic scheudle i'm sure you have. So always thankful for your input/help/hint.

For those who are still wondering how it happend: If there are more than one matches in a join condition, it takes all (from Lookup file), and if that happens to be in a sequence, guess what, it will take it in sequence (Job must be developed in Sequential mode, and pre-sort on colum which you want to be selected in sequence) and spit out the record with the colums you want to see in the record.
** Thanks All **
Tejas Pujari
Participant
Posts: 14
Joined: Thu Jul 10, 2008 7:37 am
Location: mumbai

Post by Tejas Pujari »

In look up stage use multiple records from reference option.

In join it will do cross join. Cartesian product.
zaino22
Premium Member
Premium Member
Posts: 81
Joined: Thu Mar 22, 2007 7:10 pm

Post by zaino22 »

Thank you Tejas! Will try with the 'Multiple rows returned from the link' option of Lookup stage.
Post Reply