Converting multiple columns to single row

anallamothu · Post by **anallamothu** » Thu Mar 08, 2012 6:25 pm

Hello All,

I am developing a parallel job, in which I need to transform multiple columns to a single row.

My source file looks like this

Col1 Col2 Col3 Col4
1 A NULL NULL
2 NULL B NULL
3 NULL NULL C
4 NULL D NULL
5 E NULL NULL
6 NULL NULL F

I should transform that source file to the following Traget file type

Col1 Col2 Col3 Col4
1 A B C
2 E D F

I tried using vertical pivot stage and also the loopvariable in transformer. But it doesn't succeded.

I hope some of you have idea about this one

--
Thanks,
Andy

Kryt0n · Post by **Kryt0n** » Thu Mar 08, 2012 7:35 pm

Effectively you are just aggregating the data but what decides which rows merge in to one?

If purely based on order, make sure they are in the right order... then have a generated column that is common for every 3 rows, then aggregate (e.g. putting nulls first and keeping the max for each column kind of thing)

To try make it clearer
Col1 GenCol Col2 Col3 Col4
1 1 A NULL NULL
2 1 NULL B NULL
3 1 NULL NULL C
4 2 NULL D NULL
5 2 E NULL NULL
6 2 NULL NULL F

qt_ky · Post by **qt_ky** » Thu Mar 08, 2012 10:41 pm

I agree it's not clear what the rules are. If we assume groups of 3 records at a time, it can probably be done using stage variables alone.

kandyshandy · Post by **kandyshandy** » Fri Mar 09, 2012 3:06 am

such a non-realistic requirement?

anallamothu · Post by **anallamothu** » Fri Mar 09, 2012 10:32 am

sorry for not making it clear in my previous post.I made some modifications to the soruce data in order to make it clear

here the col1 data (i.e. 1,2,3..6) is nothing but an incremental sequence and the col2,col3,col4 data are the one's which I need to combine in a single record.

As qt_ky mentioned yes I need to process in groups of 3 records at a time.

And "NULL" describes there isn't any value present in those corresponding columns.

Col1 Col2 Col3 Col4
1 A NULL NULL
2 NULL B NULL
3 NULL NULL C
4 NULL NULL D
5 NULL NULL E
6 F NULL NULL

And the Tagert file is given below

Col1 Col2 Col3 Col4
1 A B C
2 D E F

here is the tricky part, sometimes the source data can be in this format also

Col1 Col2 Col3 Col4
1 NULL NULL A
2 NULL NULL B
3 NULL NULL C
4 NULL NULL D
5 NULL NULL E
6 NULL NULL F

still I need the output to be in the following format only

Col1 Col2 Col3 Col4
1 A B C
2 D E F

Please disregard my previous post.

--
Thanks,
Andy

anallamothu · Post by **anallamothu** » Fri Mar 09, 2012 10:35 am

qt_ky wrote:I agree it's not clear what the rules are. If we assume groups of 3 records at a time, it can probably be done using stage variables alone.

Hi ,

Can you give me a little knowledge of how it can be done using stage variables?

--
Thanks,
Andy

qt_ky · Post by **qt_ky** » Fri Mar 09, 2012 10:48 pm

The clarification helps. A stage variable-only solution is possible here, but it may be considered as "quick and dirty." Something more flexible may involve doing a pivot as a second step. If you get interested in that, search this forum on 'pivot.' We can discuss stage vars for purpose of learning.

Your Col1 of a sequential row ID is easily generated in a Transformer stage using system variable @INROWNUM. To start out, run your stage sequentially or run the parallel job on a single node config file.

Insert a new stage variable (suppose you name it svRow123) with derivation:

Code: Select all

Mod(@INROWNUM - 1, 3) + 1

Put that stage var into a new output column and view the result. Without any constraint, you should get a repeating pattern to work with in your output rows of 1,2,3,1,2,3,... Run it at this point and check.

Append another stage variable with derivation:

Code: Select all

NullToEmpty(lnk.Col2) : NullToEmpty(lnk.Col3) : NullToEmpty(lnk.Col4)

Drag it into a new output column, run it, and view the result. It should give rows like A,B,C,D,E,F,...

Add an output link constraint to only output every 3rd row:

Code: Select all

svRow123 = 3

That is only a start. Take it one step at a time... To complete it in this way will require more stage variables and more understanding of them. I left out that gap for now. Try it and see what ideas and questions you may get.

anallamothu · Post by **anallamothu** » Wed Mar 14, 2012 3:20 pm

Hi qt_ky,

The issue is resolved

I did followed u r approach by concatinating those 3 columns and later I used the pivot stage(vertical pivot).

--
Thanks,
Andy

ray.wurlod · Post by **ray.wurlod** » Wed Mar 14, 2012 3:46 pm

The second person possessive pronoun in English is spelled "your". It is not "u r".

DSXchange is not a mobile phone; there is no requirement to use SMS-style abbreviation. Strive, instead, for a professional standard of written English. Among other things, this makes things easier for those whose first language is not English.

anallamothu · Post by **anallamothu** » Wed Mar 14, 2012 4:38 pm

ray.wurlod wrote:The second person possessive pronoun in English is spelled "your". It is not "u r".

DSXchange is not a mobile phone; there is no requirement to use SMS-style abbreviation. Strive, instead, for a professional standard of written English. Among other things, this makes things easier for those whose first language is not English.

Thanks for pointing it out.

--
Andy