name-value pairs

suresh_k · Post by **suresh_k** » Tue Dec 04, 2007 3:11 pm

Hi,

I have to perform a lookup by using the column name as the source of the lookup (not the value of the column). Hence, I need to create a name value pair so that I can use the column name. Below is a brief example of what I am trying to accomplish
I have the following format definition and source data
Key, Col1, Col2, Col3
123, ab, cd, ef

I am trying to create the following target dataset
Key, Name, Value
123, Col1, ab
123, Col2, cd
123, Col3, ef

I currently have a way to create the target dataset by concatenating the column names to the end of each input row and then using the pivot to generate the output dataset. The following is what I mean when I say concatenate the column names...
Key, Col1, Col2, Col3, Col4, Col5, Col6
123, ab, cd, ef, Col1, Col2, Col3

In reality I have about 100+ columns on the input dataset and concatenating the 100+ column names as extra columns to each input row is not desired. Is there a better way of doing this? Has anybody tried this before? Your help is much appreciated.

Thanks.
Suresh

rleishman · Post by **rleishman** » Tue Dec 04, 2007 9:03 pm

rleishman · Post by **rleishman** » Tue Dec 04, 2007 9:11 pm

Here's one possibility:

Take your input (currently:)

Code: Select all

Key, Col1, Col2, Col3 
123, ab, cd, ef

and concatenate all of the non-key columns leaving just 2 columns:

Code: Select all

Key, NonKeyCol 
"123", "ab,cd,ef,"

Choose a delimiter that does not appear in the data. I have used comma (,). Make sure there is a trailing delimiter.

Create a column that is the count of delimiters in the string.

Code: Select all

ColCnt = Count(link.nonkeycol, ',')

Pass this into a LOOKUP stage:
- The lookup key is the ColCnt (calculated above)
- The lookup dataset is as follows:

Code: Select all

Colcnt Instance
------ --------
     1        1
     2        1
     2        2
     3        1
     3        2
     3        3
etc.

- In the Conditions tab of the Lookup stage, select the link from "Multiple rows returned from" box.

This has the effect of creating N duplicates of each row, where N is the number of columns in the row.

Now your output stream looks like:

Code: Select all

Key, NonKeyCol, Instance
"123", "ab,cd,ef,", 1
"123", "ab,cd,ef,", 2
"123", "ab,cd,ef,", 3

Now filter out the correct column from each copy:

Code: Select all

Field(
	Link.NonKeyCol 
,	','
,	Link.Instance
)

You can use a similar technique to get the Column name in there if you like the rest of the idea.

suresh_k · Post by **suresh_k** » Wed Dec 05, 2007 2:52 pm

Ross - Thanks for the post. If I were to use your technique, I should create another column that contains the concatenated column names in it instead of having to add one column for each column name (the way I am doing it right now). Let me think about it - because I have to create a lookup dataset that contains 120*120 rows in it (because each row right now has about 120 columns). Thanks again for your time.

Cheers.
Suresh

rleishman wrote:Here's one possibility:

Take your input (currently:)
Code: Select all
Key, Col1, Col2, Col3 
123, ab, cd, ef
and concatenate all of the non-key columns leaving just 2 columns:
Code: Select all
Key, NonKeyCol 
"123", "ab,cd,ef,"
Choose a delimiter that does not appear in the data. I have used comma (,). Make sure there is a trailing delimiter.

Create a column that is the count of delimiters in the string.
Code: Select all
ColCnt = Count(link.nonkeycol, ',')
Pass this into a LOOKUP stage:
- The lookup key is the ColCnt (calculated above)
- The lookup dataset is as follows:
Code: Select all
Colcnt Instance
------ --------
     1        1
     2        1
     2        2
     3        1
     3        2
     3        3
etc.
- In the Conditions tab of the Lookup stage, select the link from "Multiple rows returned from" box.

This has the effect of creating N duplicates of each row, where N is the number of columns in the row.

Now your output stream looks like:
Code: Select all
Key, NonKeyCol, Instance
"123", "ab,cd,ef,", 1
"123", "ab,cd,ef,", 2
"123", "ab,cd,ef,", 3
Now filter out the correct column from each copy:
Code: Select all
Field(
	Link.NonKeyCol 
,	','
,	Link.Instance
)
You can use a similar technique to get the Column name in there if you like the rest of the idea.

rleishman · Post by **rleishman** » Wed Dec 05, 2007 3:39 pm

If you are using Oracle 9i or later, you don't need to use a Lookup dataset - you can generate the number list on the fly:

Code: Select all

SELECT CAST(colcnt AS NUMBER(5)) AS colcnt
,    CAST(LEVEL AS NUMBER(5)) AS colinst
FROM (
    SELECT LEVEL AS colcnt
    FROM DUAL
    CONNECT BY LEVEL <= 120
)
CONNECT BY colcnt = PRIOR colcnt + 1
/

If you parameterise the maximum (120 here), it can easily adapt to larger column sets.