Page 1 of 1

Reverse Pivot

Posted: Thu Jun 07, 2007 9:15 am
by UPS
Newbie question :?: - Can anybody tell me how to accomplish what the PIVOT stage does but in reverse.
Input:
rec1: key1, fld1
rec2: key1, fld1
rec3: key1, fld1
rec4: key2, fld1
rec5: key2, fld1
...
OutPut:
rec1: key1,rec1.fld1,rec2.fld1,rec3.fld1
rec2: key2,rec4.fld1,rec5.fld1,,
...

The process should be able to handle some decision making like what output bucket to populate on certain criteria.

Any help, in detail, would be VERY appreciated

Posted: Thu Jun 07, 2007 12:50 pm
by Larry.Griffith
The databases I work with are deep not wide, so I do this a lot.

Doing this in a parallel manner with a generic routine is difficult. I have done this with a C routine plug in. The trick is to get all the columns for the same record into the same partition. I do this by hash/sort partitioning on the key field (and only the key field) into the plugin.

Some of the things you need to know when doing this are

- keyid
- output order (unless you have a row for every column and you can garantee they are in the correct order)
- format (TEXT, VALUE, NUMBER, DATE) (If you need to use quotes, or need to format the data )
- column value

Good luck[/list]

Posted: Thu Jun 07, 2007 3:22 pm
by ArndW
If your input data is sorted by key and fld1 then you can use the "classical" approach as done in Server jobs. Put in a transform stage and use stage variables to store your interim output columns and last column data data, set the constraint to only output rows when the key-fld1 changes and output the collected list of key and fld1 combinations.

Posted: Fri Jun 08, 2007 10:06 am
by UPS
Hi Arndw
You will know the key changed only when you read the row where the key changed so ideally
rec1: key1, fld1
rec2: key1, fld1
rec3: key1, fld1
rec4: key2, fld1
rec5: key2, fld1

Only when i read in rec4 Key2,fld1 i will know the key has changed so if you want me to write out what i have been holding till then in the stage variables in to the output link it will not work for the last record because you will never know when to write it.

i am trying with a couple of sort stages logic to find the key change the claasical approach. I have an other idea of writing everything out and then using the remove duplicates to retain the last row which will have the most consolidated information.

Thanks
Pavan

Posted: Fri Jun 08, 2007 10:35 am
by DSguru2B
Why dont you think it will work again? Keep apending the values to the stage variable untill a key change is encountered. In that case reset the stage variable and start appending untill the next key change is encountered. Do this till the last record. Then pass the output to remove duplicate stage and retain the last record for each id.

RE: Reverse Pivoting

Posted: Fri Jun 08, 2007 1:22 pm
by UPS
Hi Arndw and DSguru2B,
I understand your concept and I have done something very similar. I am looking for a Parallel solution but I will definitely keep this in mind.
I do not know the other person who posted under UPS. I am the original one who posted as UPS.

Thanks for your help.
John - jalden@ups.com

Posted: Fri Jun 08, 2007 1:33 pm
by DSguru2B
This will work in a parallel job as well. Just sort on the key and hash partition so that all identical keys end up in one partition.

RE: Reverse Pivoting

Posted: Fri Jun 08, 2007 1:59 pm
by UPS
DSguru2B,
What stage would I use then? Would I use a C routine as someone previously suggested?

Regards -
John

Posted: Fri Jun 08, 2007 2:22 pm
by DSguru2B
Naa, I would go for a transformer stage, just like the way you do it in a server job.

RE: Reverse Pivoting

Posted: Fri Jun 08, 2007 3:01 pm
by UPS
DSguru2B,
Thanks - I will try this option and keep you appraised.

Thanks again.
John