Reverse Pivot

UPS · Post by **UPS** » Thu Jun 07, 2007 9:15 am

Newbie question

- Can anybody tell me how to accomplish what the PIVOT stage does but in reverse.
Input:
rec1: key1, fld1
rec2: key1, fld1
rec3: key1, fld1
rec4: key2, fld1
rec5: key2, fld1
...
OutPut:
rec1: key1,rec1.fld1,rec2.fld1,rec3.fld1
rec2: key2,rec4.fld1,rec5.fld1,,
...

The process should be able to handle some decision making like what output bucket to populate on certain criteria.

Any help, in detail, would be VERY appreciated

Larry.Griffith · Post by **Larry.Griffith** » Thu Jun 07, 2007 12:50 pm

The databases I work with are deep not wide, so I do this a lot.

Doing this in a parallel manner with a generic routine is difficult. I have done this with a C routine plug in. The trick is to get all the columns for the same record into the same partition. I do this by hash/sort partitioning on the key field (and only the key field) into the plugin.

Some of the things you need to know when doing this are

- keyid
- output order (unless you have a row for every column and you can garantee they are in the correct order)
- format (TEXT, VALUE, NUMBER, DATE) (If you need to use quotes, or need to format the data )
- column value

Good luck[/list]

ArndW · Post by **ArndW** » Thu Jun 07, 2007 3:22 pm

If your input data is sorted by key and fld1 then you can use the "classical" approach as done in Server jobs. Put in a transform stage and use stage variables to store your interim output columns and last column data data, set the constraint to only output rows when the key-fld1 changes and output the collected list of key and fld1 combinations.

UPS · Post by **UPS** » Fri Jun 08, 2007 10:06 am

Hi Arndw
You will know the key changed only when you read the row where the key changed so ideally
rec1: key1, fld1
rec2: key1, fld1
rec3: key1, fld1
rec4: key2, fld1
rec5: key2, fld1

Only when i read in rec4 Key2,fld1 i will know the key has changed so if you want me to write out what i have been holding till then in the stage variables in to the output link it will not work for the last record because you will never know when to write it.

i am trying with a couple of sort stages logic to find the key change the claasical approach. I have an other idea of writing everything out and then using the remove duplicates to retain the last row which will have the most consolidated information.

Thanks
Pavan

DSguru2B · Post by **DSguru2B** » Fri Jun 08, 2007 10:35 am

Why dont you think it will work again? Keep apending the values to the stage variable untill a key change is encountered. In that case reset the stage variable and start appending untill the next key change is encountered. Do this till the last record. Then pass the output to remove duplicate stage and retain the last record for each id.

UPS · Post by **UPS** » Fri Jun 08, 2007 1:22 pm

Hi Arndw and DSguru2B,
I understand your concept and I have done something very similar. I am looking for a Parallel solution but I will definitely keep this in mind.
I do not know the other person who posted under UPS. I am the original one who posted as UPS.

Thanks for your help.
John - jalden@ups.com

DSguru2B · Post by **DSguru2B** » Fri Jun 08, 2007 1:33 pm

This will work in a parallel job as well. Just sort on the key and hash partition so that all identical keys end up in one partition.

UPS · Post by **UPS** » Fri Jun 08, 2007 1:59 pm

DSguru2B,
What stage would I use then? Would I use a C routine as someone previously suggested?

Regards -
John

DSguru2B · Post by **DSguru2B** » Fri Jun 08, 2007 2:22 pm

Naa, I would go for a transformer stage, just like the way you do it in a server job.

UPS · Post by **UPS** » Fri Jun 08, 2007 3:01 pm

DSguru2B,
Thanks - I will try this option and keep you appraised.

Thanks again.
John

DSXchange

Reverse Pivot

Reverse Pivot

RE: Reverse Pivoting

RE: Reverse Pivoting

RE: Reverse Pivoting