Look up multiple columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Look up multiple columns

Post by ThilSe »

Hi,

I have a file(CodeFile) with 3 columns Code1,Code2,Code3.
Eg.
CodeFile
1,2,3
2,3,4

I need to lookup the Description for each column from the same table Code_Desc. The table has 2 fields (Code, Description).

The table has values
Code,Description
1,One
2,Two
3,Three
4,Four


now i need to create a target file by looking up the Code_Desc as follows
1,One,2,Two,3,Three
2,Two,3,Three,4,Four

Currently, I read the data from the table Code_desc using Db2 stage.
Then using Copy stage i am pass 3 reference links in the Look-up stage(One for each field)

Is there any other way of doing this?
What should I do if I need to lookup 17 fields instead of 3 fields (Do i need to have 17 reference links in this case?)?

Thanks in advance
Senthil
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes you would need 17 Lookup stages. Each lookup is based on the key (read "primary key") and can therefore only return only one row.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Hi,

May be something better for you would be to replace "," by a end of line (I think it is char(013)). You need first to flag each line individualy with Surrogate_Key_Generator stage, then you obtain a file like this :
Surrogate CodeFile
1 1
1 2
1 3
2 2
2 3
2 4

Instead of

CodeFile
1,2,3
2,3,4

Then you can use a LookUp on a single column.

1 1 One
1 2 Two
1 3 Three
2 2 Two
2 3 Three
2 4 Four

And to finish, use a stage variable to identify the columns who belong from the same line in the input file (same Surrogate) and separate the record with ",".
Be carrefull, I think you must partition by "Surrogate".

That way you can have as many column as you want.
Have fun.
Matthieu.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

mpouet wrote:Hi,

May be something better for you would be to replace "," by a end of line (I think it is char(013)). You need first to flag each line individualy with Surrogate_Key_Generator stage, then you obtain a file like this :
Surrogate CodeFile
1 1
1 2
1 3
2 2
2 3
2 4

Instead of

CodeFile
1,2,3
2,3,4

Then you can use a LookUp on a single column.

1 1 One
1 2 Two
1 3 Three
2 2 Two
2 3 Three
2 4 Four

And to finish, use a stage variable to identify the columns who belong from the same line in the input file (same Surrogate) and separate the record with ",".
Be carrefull, I think you must partition by "Surrogate".

That way you can have as many column as you want.
Have fun.
Matthieu.
Great idea :idea: Matthieu.
In simple, Use a pivot stage to make in single column and then do a lookup. Later do a vertical pivot to get the original structure back... Hmm... may be worth trying.

Pivot is also a resource consuming stage as like Lookup.

Hi senthil,
Better do a test on both the approach and find the best option. Share to use if possible.
If you are not aware of vertical pivot do a serch on the same key word. :wink:

-Kumar
mpouet
Participant
Posts: 34
Joined: Mon Oct 18, 2004 10:23 am
Location: France

Post by mpouet »

Hi,

The two big good points with my method are :
- Your file can be "unstructured". It means you can have a variable number of fields. For example :
1,2,3,4
4,5
8,10,20,21,12,12,14,21
- You don't have to modify your developpment if some fields have to be added. That's a good point isn't it !

For information, we tried once to make a job with 40 LookUp (each with one line). In fact we needed 150 LookUp, but we cut the job in 4 jobs. The performance slowed down dramaticaly : 100 rows/sec.
As only one line was in each (150) LookUp. I managed it with the job parameters and obtained 19000 rows/sec.
Conclusion : avoid too much LookUp...

Bye
Matthieu
r
Participant
Posts: 13
Joined: Tue May 17, 2005 6:16 am

Post by r »

hi,
i am new to datastage.
i was trying out this solution provided by Matthieu & i'm facing a problem. plz help me out.

to convert the transformed file back to its original form, i used 2 stage variables, StgPrev & stgCurr for storing previous & current values of the Surrogate key & the following derivation to get the output in a single column. i used "," to seperate rows having s=different Surrogate key

If StgPrev = StgCurr then DSLink25.code:"|" : DSLink25.code_desc Else "," :DSLink25.code:"|" : DSLink25.code_desc

& in the output file set the record delimiter as "|".
i am getting the following output
1|One|2|Two|3|Three|,2|Two|3|Three|4|Four|

when i use char(013) i get the following output & wc -l gives 0
1|One|2|Two|3|Three|
2|Two|3|Three|4|Four|
How can I specifiy Unix newline so that i may get the correct value of wc -l & the required output.

Thanks & Regards
r
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Instead of using the CHAR(13) {which is <cr>}, use CHAR(10) {which is <lf>}.
r
Participant
Posts: 13
Joined: Tue May 17, 2005 6:16 am

Post by r »

when i use line feed the output is as follows: 1|One|2|Two|3|Three2|Two|3|Three|4|Four|

when i use carriage return, the output is as follows:
1|One|2|Two|3|Three|
{spaces}2|Two|3|Three|4|Four|

the problem still exists.

Regards
r
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

r wrote:when i use line feed the output is as follows: 1|One|2|Two|3|Three2|Two|3|Three|4|Four|

when i use carriage return, the output is as follows:
1|One|2|Two|3|Three|
{spaces}2|Two|3|Three|4|Four|

the problem still exists.

Regards
r
How are you viewing your file? If you put a line feed in there it might not show correctly. If your "wc -l" doesn't show 2 then something went wrong with inserting CHAR(10). I just answered your technical question - but what is your goal and what is the problem?
r
Participant
Posts: 13
Joined: Tue May 17, 2005 6:16 am

Post by r »

hi,

i have to perform multiple lookup. i have a source file of the form:
key
origin country
trans country
destination country
All the 3 country fields contain country codes

The reference file contains country code & country name.
i perform a lookup using the above method. But after the lookup all rows get shuffled up due to partitioning. i am using hash partitioning on the primary input & entire partitioning on the reference. For eg.

Reference file
US United States
IN India
GB Great Britain
JP Japan

Source file:
1|US|GB|IN
2|JP|IN|GB

Using the method described in this post, after the lookup & after sorting on the surr_key & key i may get something like this:
Surr_key|key|origincountry|trans country|dest country
0|1|US|United States
0|1|IN|India
0|1|GB|Great Britain
1|2|IN|India
1|2|GB|Great Britain
1|2|JP|Japan

Here if i perform vertical pivot it will shuffled up values for origin, trans, destination countries.

The only way i get back the original order is using SEQUENTIAL execution mode for the lookup but using defeats the whole purpose of partitioning & fast execution.

Please suggest a soluiton.

Thnaks n Regards
r
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

What is the partiton key used?
What is the order of the vlaues after the pivoting is performed. Write into a file and check if it is in desired order. I hope Pivoting operation is performed in Sequential mode.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply