Page 1 of 1

Merging multiple rows into single row

Posted: Mon Aug 26, 2013 6:44 pm
by gayathrisivakumar
Hi,'

I have a requirement to merge multiple rows into one row. I tried using stage variables but it seems that logic is going wrong somewhere.

My Input Data
ColA ColB ColC
1 A 100
1 B 200
1 C 300

Desired Output
1,A,100,B,200,C,300

How can I achieve this?

I tried comparing Currkey = PrevKey and appending values to stage variables, but some of the records give desired output and others not.

Can anybody please help.

Posted: Mon Aug 26, 2013 7:21 pm
by asorrell
You are using the correct approach. Did you hash partition and sort on your key? (The column that holds the "1" in the example)

If you are, can you show some example failures or your stage variable logic?

Posted: Mon Aug 26, 2013 8:42 pm
by gayathrisivakumar
Thanks for the quick reply.

My job design is like this:

seqfile ->sort ->transformer->remove duplictes->transformer->seqfile

This logic works fine

when

colA colB colC
1 A 10
1 B 20
2 C 30
2 D 40

Output comes as 1,A,10,B,20
2,c,30,D,40

But it fails when
colA colB colC
1 A 10
1 B 20
2 C 30
2 D 40
3 E 50

Output is coming as 1,A,10,B,20
2,C,30,D,40
3,A,C,10,B,20,E,50.

I am doing hash partitioning based on colA.Inside the stage variable if CurrKey = PrevKey then concatenate input columns into the stage variable.

Please let me know where am I going wrong.

Regards,
Gayathri

Posted: Tue Aug 27, 2013 1:05 am
by srinivas.nettalam
Just to make sure nothing wrong in your derivation.Enable KeyChange Column in sort and write your derivation like

Code: Select all

svCurrent = If KeyChange=1 Then ColB:',':ColC Else svPrev:', ':ColB:',':ColC
svPrev=svCurrent

Posted: Tue Aug 27, 2013 7:01 am
by gayathrisivakumar
Yes I am doing the same derivation. The logic goes wrong from the third change in key. Anybody has any insight?

Posted: Tue Aug 27, 2013 7:08 am
by chulett
If you run this on a single node, does it run correctly?

Posted: Tue Aug 27, 2013 7:59 am
by gayathrisivakumar
when I try to run it on single node the results are completely different.

1,A,C,10,B,20,C,30,D,40,E,50.

Posted: Tue Aug 27, 2013 9:23 am
by boxtoby
When I have done this sort of thing in the past I have had success but doing the following:

Sort the data by colA,hash partition by colA and set the key change indicator on.

I use three stage vars as follows:

svLastRowInGroup:
LastRowInGroup(colA)

(LastRowInGroup() function is set to 1 for true, zero for false)

svDataOut
if svLastRowInGroup and keyChange then colB:",":colC else if svLastRowInGroup then svDataOut:",":colB:",":colC else ""

(This logic checks single row records as well)

svData
if keyChange then colB:",":colC else svData:",":colB:",":colC

The transformer has a constraint of svLastRowInGroup

Posted: Tue Aug 27, 2013 9:23 am
by chulett
gayathrisivakumar wrote:when I try to run it on single node the results are completely different.
Then your logic is flawed. You'd need to supply the details of your job design to allow anyone to help without continuing to guess at this point.

What field(s) did you specify as the change keys in the Sort stage? Is the KeyChange column being set correctly? Post your exact stage variables and derivations and the output link constraint you are using.

ps. You don't need "current" and "prev" variables when the Sort stage is setting the KeyChange column.

Posted: Tue Aug 27, 2013 10:51 am
by chulett
Was going to get to the 'LastRowInGroup' part as something needs to push out the last 'group' when taking this approach but wanted to check the version you are running as it was included starting with 8.5.

Posted: Wed Aug 28, 2013 2:04 am
by boxtoby
ChangeKey would be colA

I wouldn't have thought the logic was that difficult to follow.....

Posted: Wed Aug 28, 2013 4:35 pm
by gayathrisivakumar
Thanks all for your help.

Craig I am using Datastage 8.7. I couldn't read your whole post as I am yet to renew my membership.

I rebuilt the whole job again based on your suggestions and now it is working. :)

Since I was busy I dint get time to debug my existing code.