Incremental Lookup

zaino22 · Post by **zaino22** » Mon May 16, 2011 5:52 pm

I have two source files, one of which will be used as a Lookup file. I did not know what to call it so I named it the way I wish to use it, incremental lookup, but since it restarts from the top, so I know even incremental does not suit it, please let me know what is it called?

Lookup file has several duplicate matching values listed ("00" and "01" in the sample data below are those values) but different corresponding values that we need from Lookup file. Once both source, and Lookup files are sorted based on matching colum/key colum ("00"/"01" colum) , I should be able to match starting from top and pick the corresponding value and move down the list and get its corresponding value from the Lookup file. Once we reach to the end of matching value list in the Lookup file, and there is no more but source file still has some values to match, we should start from the beginning of the Lookup file list for matching values and get its corresponding values, until we run out of matching values in the Source file. It also means we will probably end up with several duplicate corresponding values in the Desired output file. I hope I am able to outline the pic. Here is the sample data, and desired output.

Please note, All data is in Char, and ASCII fixed length file.

Lookup File:
---------------
abc-00
def-00
uvw-01
xyz-01...

Main Source file:
--------------------

123R00
456R00
789R00
111R00
222R00
787R01
488R01
422R01
788R01...

Desire Output
-----------------
123R00abc
456R00def
789R00abc
111R00def
222R00abc
787R01uvw
488R01xyz
422R01uvw
788R01xyz...

ray.wurlod · Post by **ray.wurlod** » Mon May 16, 2011 6:46 pm

Try using a Merge stage, which consumes rows loaded from the Update input. However, your "restart from the top" requirement can not be implemented using this stage type. That would call for some kind of circularity for which, I suspect, you will need a Build stage (or a routine that loads the entire reference set into memory).

zaino22 · Post by **zaino22** » Thu May 19, 2011 9:08 am

Updates:
I am able to do this by just using Join stage (with inner join).
2 files-->Join--->outputfile
I blame this to my communication, and no one else here at the forum, you guys have been eye openers, motivators, and very knowledgable people, who are quick in replying the queries from the hectic scheudle i'm sure you have. So always thankful for your input/help/hint.

For those who are still wondering how it happend: If there are more than one matches in a join condition, it takes all (from Lookup file), and if that happens to be in a sequence, guess what, it will take it in sequence (Job must be developed in Sequential mode, and pre-sort on colum which you want to be selected in sequence) and spit out the record with the colums you want to see in the record.
** Thanks All **

Tejas Pujari · Post by **Tejas Pujari** » Tue May 24, 2011 2:31 am

In look up stage use multiple records from reference option.

In join it will do cross join. Cartesian product.

zaino22 · Post by **zaino22** » Tue May 24, 2011 7:16 am

Thank you Tejas! Will try with the 'Multiple rows returned from the link' option of Lookup stage.