Page 1 of 1

Look up multiple columns

Posted: Wed Feb 08, 2006 2:44 am
by ThilSe
Hi,

I have a file(CodeFile) with 3 columns Code1,Code2,Code3.
Eg.
CodeFile
1,2,3
2,3,4

I need to lookup the Description for each column from the same table Code_Desc. The table has 2 fields (Code, Description).

The table has values
Code,Description
1,One
2,Two
3,Three
4,Four


now i need to create a target file by looking up the Code_Desc as follows
1,One,2,Two,3,Three
2,Two,3,Three,4,Four

Currently, I read the data from the table Code_desc using Db2 stage.
Then using Copy stage i am pass 3 reference links in the Look-up stage(One for each field)

Is there any other way of doing this?
What should I do if I need to lookup 17 fields instead of 3 fields (Do i need to have 17 reference links in this case?)?

Thanks in advance
Senthil

Posted: Wed Feb 08, 2006 3:31 am
by ray.wurlod
Yes you would need 17 Lookup stages. Each lookup is based on the key (read "primary key") and can therefore only return only one row.

Posted: Wed Feb 08, 2006 4:35 am
by mpouet
Hi,

May be something better for you would be to replace "," by a end of line (I think it is char(013)). You need first to flag each line individualy with Surrogate_Key_Generator stage, then you obtain a file like this :
Surrogate CodeFile
1 1
1 2
1 3
2 2
2 3
2 4

Instead of

CodeFile
1,2,3
2,3,4

Then you can use a LookUp on a single column.

1 1 One
1 2 Two
1 3 Three
2 2 Two
2 3 Three
2 4 Four

And to finish, use a stage variable to identify the columns who belong from the same line in the input file (same Surrogate) and separate the record with ",".
Be carrefull, I think you must partition by "Surrogate".

That way you can have as many column as you want.
Have fun.
Matthieu.

Posted: Wed Feb 08, 2006 5:15 am
by kumar_s
mpouet wrote:Hi,

May be something better for you would be to replace "," by a end of line (I think it is char(013)). You need first to flag each line individualy with Surrogate_Key_Generator stage, then you obtain a file like this :
Surrogate CodeFile
1 1
1 2
1 3
2 2
2 3
2 4

Instead of

CodeFile
1,2,3
2,3,4

Then you can use a LookUp on a single column.

1 1 One
1 2 Two
1 3 Three
2 2 Two
2 3 Three
2 4 Four

And to finish, use a stage variable to identify the columns who belong from the same line in the input file (same Surrogate) and separate the record with ",".
Be carrefull, I think you must partition by "Surrogate".

That way you can have as many column as you want.
Have fun.
Matthieu.
Great idea :idea: Matthieu.
In simple, Use a pivot stage to make in single column and then do a lookup. Later do a vertical pivot to get the original structure back... Hmm... may be worth trying.

Pivot is also a resource consuming stage as like Lookup.

Hi senthil,
Better do a test on both the approach and find the best option. Share to use if possible.
If you are not aware of vertical pivot do a serch on the same key word. :wink:

-Kumar

Posted: Thu Feb 09, 2006 1:59 am
by mpouet
Hi,

The two big good points with my method are :
- Your file can be "unstructured". It means you can have a variable number of fields. For example :
1,2,3,4
4,5
8,10,20,21,12,12,14,21
- You don't have to modify your developpment if some fields have to be added. That's a good point isn't it !

For information, we tried once to make a job with 40 LookUp (each with one line). In fact we needed 150 LookUp, but we cut the job in 4 jobs. The performance slowed down dramaticaly : 100 rows/sec.
As only one line was in each (150) LookUp. I managed it with the job parameters and obtained 19000 rows/sec.
Conclusion : avoid too much LookUp...

Bye
Matthieu

Posted: Mon Feb 13, 2006 6:56 am
by r
hi,
i am new to datastage.
i was trying out this solution provided by Matthieu & i'm facing a problem. plz help me out.

to convert the transformed file back to its original form, i used 2 stage variables, StgPrev & stgCurr for storing previous & current values of the Surrogate key & the following derivation to get the output in a single column. i used "," to seperate rows having s=different Surrogate key

If StgPrev = StgCurr then DSLink25.code:"|" : DSLink25.code_desc Else "," :DSLink25.code:"|" : DSLink25.code_desc

& in the output file set the record delimiter as "|".
i am getting the following output
1|One|2|Two|3|Three|,2|Two|3|Three|4|Four|

when i use char(013) i get the following output & wc -l gives 0
1|One|2|Two|3|Three|
2|Two|3|Three|4|Four|
How can I specifiy Unix newline so that i may get the correct value of wc -l & the required output.

Thanks & Regards
r

Posted: Mon Feb 13, 2006 7:16 am
by ArndW
Instead of using the CHAR(13) {which is <cr>}, use CHAR(10) {which is <lf>}.

Posted: Tue Feb 14, 2006 6:12 am
by r
when i use line feed the output is as follows: 1|One|2|Two|3|Three2|Two|3|Three|4|Four|

when i use carriage return, the output is as follows:
1|One|2|Two|3|Three|
{spaces}2|Two|3|Three|4|Four|

the problem still exists.

Regards
r

Posted: Tue Feb 14, 2006 7:04 am
by ArndW
r wrote:when i use line feed the output is as follows: 1|One|2|Two|3|Three2|Two|3|Three|4|Four|

when i use carriage return, the output is as follows:
1|One|2|Two|3|Three|
{spaces}2|Two|3|Three|4|Four|

the problem still exists.

Regards
r
How are you viewing your file? If you put a line feed in there it might not show correctly. If your "wc -l" doesn't show 2 then something went wrong with inserting CHAR(10). I just answered your technical question - but what is your goal and what is the problem?

Posted: Wed Feb 22, 2006 3:38 am
by r
hi,

i have to perform multiple lookup. i have a source file of the form:
key
origin country
trans country
destination country
All the 3 country fields contain country codes

The reference file contains country code & country name.
i perform a lookup using the above method. But after the lookup all rows get shuffled up due to partitioning. i am using hash partitioning on the primary input & entire partitioning on the reference. For eg.

Reference file
US United States
IN India
GB Great Britain
JP Japan

Source file:
1|US|GB|IN
2|JP|IN|GB

Using the method described in this post, after the lookup & after sorting on the surr_key & key i may get something like this:
Surr_key|key|origincountry|trans country|dest country
0|1|US|United States
0|1|IN|India
0|1|GB|Great Britain
1|2|IN|India
1|2|GB|Great Britain
1|2|JP|Japan

Here if i perform vertical pivot it will shuffled up values for origin, trans, destination countries.

The only way i get back the original order is using SEQUENTIAL execution mode for the lookup but using defeats the whole purpose of partitioning & fast execution.

Please suggest a soluiton.

Thnaks n Regards
r

Posted: Wed Feb 22, 2006 12:20 pm
by kumar_s
What is the partiton key used?
What is the order of the vlaues after the pivoting is performed. Write into a file and check if it is in desired order. I hope Pivoting operation is performed in Sequential mode.