name-value pairs

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
suresh_k
Premium Member
Premium Member
Posts: 20
Joined: Thu Aug 09, 2007 9:51 am

name-value pairs

Post by suresh_k »

Hi,

I have to perform a lookup by using the column name as the source of the lookup (not the value of the column). Hence, I need to create a name value pair so that I can use the column name. Below is a brief example of what I am trying to accomplish
I have the following format definition and source data
Key, Col1, Col2, Col3
123, ab, cd, ef

I am trying to create the following target dataset
Key, Name, Value
123, Col1, ab
123, Col2, cd
123, Col3, ef

I currently have a way to create the target dataset by concatenating the column names to the end of each input row and then using the pivot to generate the output dataset. The following is what I mean when I say concatenate the column names...
Key, Col1, Col2, Col3, Col4, Col5, Col6
123, ab, cd, ef, Col1, Col2, Col3

In reality I have about 100+ columns on the input dataset and concatenating the 100+ column names as extra columns to each input row is not desired. Is there a better way of doing this? Has anybody tried this before? Your help is much appreciated.

Thanks.
Suresh
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

<woops - see next post>
Last edited by rleishman on Tue Dec 04, 2007 9:12 pm, edited 1 time in total.
Ross Leishman
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

Here's one possibility:

Take your input (currently:)

Code: Select all

Key, Col1, Col2, Col3 
123, ab, cd, ef
and concatenate all of the non-key columns leaving just 2 columns:

Code: Select all

Key, NonKeyCol 
"123", "ab,cd,ef,"
Choose a delimiter that does not appear in the data. I have used comma (,). Make sure there is a trailing delimiter.

Create a column that is the count of delimiters in the string.

Code: Select all

ColCnt = Count(link.nonkeycol, ',')
Pass this into a LOOKUP stage:
- The lookup key is the ColCnt (calculated above)
- The lookup dataset is as follows:

Code: Select all

Colcnt Instance
------ --------
     1        1
     2        1
     2        2
     3        1
     3        2
     3        3
etc.
- In the Conditions tab of the Lookup stage, select the link from "Multiple rows returned from" box.

This has the effect of creating N duplicates of each row, where N is the number of columns in the row.

Now your output stream looks like:

Code: Select all

Key, NonKeyCol, Instance
"123", "ab,cd,ef,", 1
"123", "ab,cd,ef,", 2
"123", "ab,cd,ef,", 3
Now filter out the correct column from each copy:

Code: Select all

Field(
	Link.NonKeyCol 
,	','
,	Link.Instance
)
You can use a similar technique to get the Column name in there if you like the rest of the idea.
Ross Leishman
suresh_k
Premium Member
Premium Member
Posts: 20
Joined: Thu Aug 09, 2007 9:51 am

Post by suresh_k »

Ross - Thanks for the post. If I were to use your technique, I should create another column that contains the concatenated column names in it instead of having to add one column for each column name (the way I am doing it right now). Let me think about it - because I have to create a lookup dataset that contains 120*120 rows in it (because each row right now has about 120 columns). Thanks again for your time.

Cheers.
Suresh
rleishman wrote:Here's one possibility:

Take your input (currently:)

Code: Select all

Key, Col1, Col2, Col3 
123, ab, cd, ef
and concatenate all of the non-key columns leaving just 2 columns:

Code: Select all

Key, NonKeyCol 
"123", "ab,cd,ef,"
Choose a delimiter that does not appear in the data. I have used comma (,). Make sure there is a trailing delimiter.

Create a column that is the count of delimiters in the string.

Code: Select all

ColCnt = Count(link.nonkeycol, ',')
Pass this into a LOOKUP stage:
- The lookup key is the ColCnt (calculated above)
- The lookup dataset is as follows:

Code: Select all

Colcnt Instance
------ --------
     1        1
     2        1
     2        2
     3        1
     3        2
     3        3
etc.
- In the Conditions tab of the Lookup stage, select the link from "Multiple rows returned from" box.

This has the effect of creating N duplicates of each row, where N is the number of columns in the row.

Now your output stream looks like:

Code: Select all

Key, NonKeyCol, Instance
"123", "ab,cd,ef,", 1
"123", "ab,cd,ef,", 2
"123", "ab,cd,ef,", 3
Now filter out the correct column from each copy:

Code: Select all

Field(
	Link.NonKeyCol 
,	','
,	Link.Instance
)
You can use a similar technique to get the Column name in there if you like the rest of the idea.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

If you are using Oracle 9i or later, you don't need to use a Lookup dataset - you can generate the number list on the fly:

Code: Select all

SELECT CAST(colcnt AS NUMBER(5)) AS colcnt
,    CAST(LEVEL AS NUMBER(5)) AS colinst
FROM (
    SELECT LEVEL AS colcnt
    FROM DUAL
    CONNECT BY LEVEL <= 120
)
CONNECT BY colcnt = PRIOR colcnt + 1
/
If you parameterise the maximum (120 here), it can easily adapt to larger column sets.
Ross Leishman
Post Reply