I have used a Pivot Stage in DS8 Parallel job. After performing some calculations I need to reverse pivot the data back into one row. Is there an easy way to do it?
Example:
Pivot output:
ID1 Val1
ID1 Val2
ID1 Val3
ID2 Val1
ID2 Val2
ID3 Val3
Expected Reverse Pivot output:
ID1 Val1 Val2 Val3
ID2 Val1 Val2 Null
ID3 Null Null Val3
Reverse Pivot in Parallel Jobs (DS8) ?
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 15
- Joined: Tue Sep 09, 2003 10:41 pm
Check out the below link
viewtopic.php?t=99149&postdays=0&postor ... ot&start=0
Hoping DS 8 works similar to DS 7.x
viewtopic.php?t=99149&postdays=0&postor ... ot&start=0
Hoping DS 8 works similar to DS 7.x
Narasimha Kade
Finding answers is simple, all you need to do is come up with the correct questions.
Finding answers is simple, all you need to do is come up with the correct questions.
-
- Premium Member
- Posts: 15
- Joined: Tue Sep 09, 2003 10:41 pm
narasimha wrote:Check out the below link
viewtopic.php?t=99149&postdays=0&postor ... ot&start=0
Hoping DS 8 works similar to DS 7.x
I looked thru the link you gave... nothing in there really explains how to do it in parallel jobs. I have done it in Server Jobs using Hash files or Stage Variables.
I tried to sort the rows by the key column and then read the rows and started putting the values in Stage variables. Every row I moved the current value into a previous value stage variable and checked if the current key col is equal to the previous key col. If they are diff then I write all the values in the stage variables. This does not really work in the parallel jobs because I lose the last line in the input. Hope this makes sense.
But I am looking for some ideas to do this in DS8. I cannot use the Datasets as Hash files where I can read and write into it at the same time.
Pivot output:
ID1 Val1
ID1 Val2
ID1 Val3
ID2 Val1
ID2 Val2
ID3 Val3
Use Transformer Stage and Remove Duplicate Stage.
In Transformer Stage, Use Stage Variable for Val1, Val2 & Val3 and Also ID(the key).
When you send the first row to transformer, you will have stage variables having the corresponding values like below
stg_Val1 = Val1, stg_Val2 = null, stg_Val3 = null and stg_ID = ID1
For 2nd Row
stg_Val1 = Val1, stg_Val2 = Val2, stg_Val3 = null and stg_ID = ID1
For 3rd Row
stg_Val1 = Val1, stg_Val2 = Val2, stg_Val3 = Val3 and stg_ID = ID1
For each row, you assign the Stage Variables to the output, so 3 rows as output would have been sent, and the next stage will hold the data like below.
col1 Col2 Col3 Col4
ID1 Val1 Null Null
ID1 Val1 Val2 Null
ID1 Val1 Val2 Val3
When you get the next key(ID), reset the Stage Variables. So for ID2 you would have output like below.
col1 Col2 Col3 Col4
ID2 Val1 Null Null
ID2 Val1 Val2 Null
so for the next key(ID), reset the Stage Variables again. So for ID3 you would have output like below.
col1 Col2 Col3 Col4
ID3 Val1 Null Null
Send all of these output to the Remove Duplicate stage, and you the same key as above and retain the last row for each keys.
So you would get output like below.
ID1 Val1 Val2 Val3
ID2 Val1 Val2 Null
ID3 Val1 Null Null
Hope this helps. This is the way I do, there should be another way of it. Please share if someone finds it.
ID1 Val1
ID1 Val2
ID1 Val3
ID2 Val1
ID2 Val2
ID3 Val3
Use Transformer Stage and Remove Duplicate Stage.
In Transformer Stage, Use Stage Variable for Val1, Val2 & Val3 and Also ID(the key).
When you send the first row to transformer, you will have stage variables having the corresponding values like below
stg_Val1 = Val1, stg_Val2 = null, stg_Val3 = null and stg_ID = ID1
For 2nd Row
stg_Val1 = Val1, stg_Val2 = Val2, stg_Val3 = null and stg_ID = ID1
For 3rd Row
stg_Val1 = Val1, stg_Val2 = Val2, stg_Val3 = Val3 and stg_ID = ID1
For each row, you assign the Stage Variables to the output, so 3 rows as output would have been sent, and the next stage will hold the data like below.
col1 Col2 Col3 Col4
ID1 Val1 Null Null
ID1 Val1 Val2 Null
ID1 Val1 Val2 Val3
When you get the next key(ID), reset the Stage Variables. So for ID2 you would have output like below.
col1 Col2 Col3 Col4
ID2 Val1 Null Null
ID2 Val1 Val2 Null
so for the next key(ID), reset the Stage Variables again. So for ID3 you would have output like below.
col1 Col2 Col3 Col4
ID3 Val1 Null Null
Send all of these output to the Remove Duplicate stage, and you the same key as above and retain the last row for each keys.
So you would get output like below.
ID1 Val1 Val2 Val3
ID2 Val1 Val2 Null
ID3 Val1 Null Null
Hope this helps. This is the way I do, there should be another way of it. Please share if someone finds it.
-
- Participant
- Posts: 24
- Joined: Mon Jul 12, 2004 10:16 am
Re: Reverse Pivot in Parallel Jobs (DS8) ?
Reverse PIVOT in datastage is too slow and best way to handle is
a) Push the dataset or data into a physical table
b) write select statement from the table using CASE statement
c) push the result into target
a) Push the dataset or data into a physical table
b) write select statement from the table using CASE statement
c) push the result into target
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 15
- Joined: Tue Sep 09, 2003 10:41 pm
ray.wurlod wrote:You remember correctly, but there is a caveat that none of the "go forward" plans is guaranteed to happen. Yes, it's on plan for version 8.1.
Thanks all for your suggestions.
I did not have the liberty to use a temporary table, but was able to use the Stage Variables to control the data using a PrevVal and CurrVal stage variable for all the Columns. The logic is unnecessarily complicated, but atleast performance wise it has not been too bad.
I am trying to use the Remove Duplicate method suggested by Ramani, it sounds good.
Hope DS comes up with a stage specifically for this... would definitely kill al lot of unnecessary posts