Page 1 of 1

Data Pivoting/Spliting

Posted: Thu May 07, 2009 3:20 pm
by vish08
Data is coming in the following format:

Column1 Column2
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

As you see each row has different number of values. I need to preserve Column1 and associate values.

Output should be as below:

1 Bill
1 Brain
1 Greagory
2 Sam
2 Bopel
3
4 Toby

Please give me some input how to implement in DataStage.
Thank you for valuable input and help.

Posted: Thu May 07, 2009 4:22 pm
by rameshrr3
Have you tried defining the Names column as type Vector,
and Number column as Key column, Then use Make Subrecord Stage? NOt too sure, but it may work for you.

Regards
Ramesh

Posted: Thu May 07, 2009 4:59 pm
by ray.wurlod
Another approach is to parse the values into separate columns and use a Pivot stage.

Posted: Fri May 08, 2009 10:27 am
by vish08
Thank you for replying Ray. In this case, we are getting variable number of fields.

Would you please provide more information how can this can be accomplished with Pivot.

Posted: Sun May 10, 2009 4:46 pm
by ray.wurlod
Create sufficient columns to handle the largest possible number of values.

Posted: Mon May 11, 2009 5:31 am
by Sainath.Srinivasan
Or join with a row generator and do a split the fields.

Posted: Wed May 13, 2009 3:14 am
by shankar_ramanath
Sainath.Srinivasan wrote:Or join with a row generator and do a split the fields.
Hi Sainath,

I have a similar requirement and I am trying to follow your note. How does a row generator help? What is the column to join? Since Row Generator generates arbitrary (mock) data how could we use it for the task at hand?

Thanks,

Posted: Wed May 13, 2009 3:57 am
by Sainath.Srinivasan
Row generator will produce 2 columns which are
1.) a dummy column (say constant 1)
2.) sequence from 1 to n
(n = max possible list of values in the column you wish to split)

Add the same dummy column to your input row.

Join them, use the sequence column to field() and add necessary constraints.

You will have the required output.

Posted: Wed May 13, 2009 4:00 am
by Sainath.Srinivasan
Also if you want to a thorough solution, go for an unix command or vector or build op.

Posted: Wed May 13, 2009 2:52 pm
by shankar_ramanath
Sainath.Srinivasan wrote:Row generator will produce 2 columns which are
1.) a dummy column (say constant 1)
2.) sequence from 1 to n
(n = max possible list of values in the column you wish to split)

Add the same dummy column to your input row.

Join them, use the sequence column to field() and add necessary constraints.

You will have the required output.
Hi Sainath,

Thanks! I tried to follow the suggestion with a paper and pencil. I am stuck.

For the example in question, here is what I have

Row Generator (for the second column, the limit is set to 4)

Col1 Col2
1 1
1 2
1 3
1 4
1 1
1 2
1 3
1 4

The input (as in the original question) is

colX colY
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

In the Join stage, I was wondering if I should be joining col2 with colX? Sorry I dont get it.

Thanks,

Posted: Thu May 14, 2009 2:40 am
by Sainath.Srinivasan
====
Input:
====
colX colY
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby
Here the max occurance = 3 (for row 1).

Step1:
So generate 3 rows from rowgen.
=====
rowgen:
=====
colDummy colIndex
1 1
1 2
1 3

Step2:
Add dummy column to input
colDummy colX colY
1 1 Bill,Brain,Greagory
1 2 Sam, Bopel
1 3
1 4 Toby
Step3:
Join by colDummy
This will give you (4 x 3 = ) 12 rows.
Step 4:
derive columns

Code: Select all

out.colX  = inp.colX  
newColY = field(inp.colY, ',', colIndex)
Note - here you will have to control for the number of entries in the row.
Step 5:
You have your output.
Step 6:

Code: Select all

Leave the money in usual place.

Posted: Tue Aug 11, 2009 2:09 pm
by shankar_ramanath
I realize that the reply is late.

But the solution worked great.

Since I am not the OP, I am not sure if I can mark it Resolved.

Many thanks to you Sainath, I appreciate it very much.