Data Pivoting/Spliting

vish08 · Post by **vish08** » Thu May 07, 2009 3:20 pm

Data is coming in the following format:

Column1 Column2
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

As you see each row has different number of values. I need to preserve Column1 and associate values.

Output should be as below:

1 Bill
1 Brain
1 Greagory
2 Sam
2 Bopel
3
4 Toby

Please give me some input how to implement in DataStage.
Thank you for valuable input and help.

rameshrr3 · Post by **rameshrr3** » Thu May 07, 2009 4:22 pm

Have you tried defining the Names column as type Vector,
and Number column as Key column, Then use Make Subrecord Stage? NOt too sure, but it may work for you.

Regards
Ramesh

ray.wurlod · Post by **ray.wurlod** » Thu May 07, 2009 4:59 pm

Another approach is to parse the values into separate columns and use a Pivot stage.

vish08 · Post by **vish08** » Fri May 08, 2009 10:27 am

Thank you for replying Ray. In this case, we are getting variable number of fields.

Would you please provide more information how can this can be accomplished with Pivot.

ray.wurlod · Post by **ray.wurlod** » Sun May 10, 2009 4:46 pm

Create sufficient columns to handle the largest possible number of values.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon May 11, 2009 5:31 am

Or join with a row generator and do a split the fields.

shankar_ramanath · Post by **shankar_ramanath** » Wed May 13, 2009 3:14 am

Sainath.Srinivasan wrote:Or join with a row generator and do a split the fields.

Hi Sainath,

I have a similar requirement and I am trying to follow your note. How does a row generator help? What is the column to join? Since Row Generator generates arbitrary (mock) data how could we use it for the task at hand?

Thanks,

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed May 13, 2009 3:57 am

Row generator will produce 2 columns which are
1.) a dummy column (say constant 1)
2.) sequence from 1 to n
(n = max possible list of values in the column you wish to split)

Add the same dummy column to your input row.

Join them, use the sequence column to field() and add necessary constraints.

You will have the required output.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Wed May 13, 2009 4:00 am

Also if you want to a thorough solution, go for an unix command or vector or build op.

shankar_ramanath · Post by **shankar_ramanath** » Wed May 13, 2009 2:52 pm

Sainath.Srinivasan wrote:Row generator will produce 2 columns which are
1.) a dummy column (say constant 1)
2.) sequence from 1 to n
(n = max possible list of values in the column you wish to split)

Add the same dummy column to your input row.

Join them, use the sequence column to field() and add necessary constraints.

You will have the required output.

Hi Sainath,

Thanks! I tried to follow the suggestion with a paper and pencil. I am stuck.

For the example in question, here is what I have

Row Generator (for the second column, the limit is set to 4)

Col1 Col2
1 1
1 2
1 3
1 4
1 1
1 2
1 3
1 4

The input (as in the original question) is

colX colY
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

In the Join stage, I was wondering if I should be joining col2 with colX? Sorry I dont get it.

Thanks,

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu May 14, 2009 2:40 am

====
Input:
====

colX colY
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

Here the max occurance = 3 (for row 1).

Step1:
So generate 3 rows from rowgen.
=====
rowgen:
=====

colDummy colIndex
1 1
1 2
1 3

Step2:
Add dummy column to input

colDummy colX colY
1 1 Bill,Brain,Greagory
1 2 Sam, Bopel
1 3
1 4 Toby

Step3:
Join by colDummy

This will give you (4 x 3 = ) 12 rows.

Step 4:
derive columns

Code: Select all

out.colX  = inp.colX  
newColY = field(inp.colY, ',', colIndex)
Note - here you will have to control for the number of entries in the row.

Step 5:

You have your output.

Step 6:

Code: Select all

Leave the money in usual place.

shankar_ramanath · Post by **shankar_ramanath** » Tue Aug 11, 2009 2:09 pm

I realize that the reply is late.

But the solution worked great.

Since I am not the OP, I am not sure if I can mark it Resolved.

Many thanks to you Sainath, I appreciate it very much.