Data Pivoting/Spliting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vish08
Premium Member
Premium Member
Posts: 16
Joined: Mon May 12, 2008 8:45 am

Data Pivoting/Spliting

Post by vish08 »

Data is coming in the following format:

Column1 Column2
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

As you see each row has different number of values. I need to preserve Column1 and associate values.

Output should be as below:

1 Bill
1 Brain
1 Greagory
2 Sam
2 Bopel
3
4 Toby

Please give me some input how to implement in DataStage.
Thank you for valuable input and help.
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Have you tried defining the Names column as type Vector,
and Number column as Key column, Then use Make Subrecord Stage? NOt too sure, but it may work for you.

Regards
Ramesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Another approach is to parse the values into separate columns and use a Pivot stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vish08
Premium Member
Premium Member
Posts: 16
Joined: Mon May 12, 2008 8:45 am

Post by vish08 »

Thank you for replying Ray. In this case, we are getting variable number of fields.

Would you please provide more information how can this can be accomplished with Pivot.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Create sufficient columns to handle the largest possible number of values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Or join with a row generator and do a split the fields.
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Post by shankar_ramanath »

Sainath.Srinivasan wrote:Or join with a row generator and do a split the fields.
Hi Sainath,

I have a similar requirement and I am trying to follow your note. How does a row generator help? What is the column to join? Since Row Generator generates arbitrary (mock) data how could we use it for the task at hand?

Thanks,
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Row generator will produce 2 columns which are
1.) a dummy column (say constant 1)
2.) sequence from 1 to n
(n = max possible list of values in the column you wish to split)

Add the same dummy column to your input row.

Join them, use the sequence column to field() and add necessary constraints.

You will have the required output.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Also if you want to a thorough solution, go for an unix command or vector or build op.
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Post by shankar_ramanath »

Sainath.Srinivasan wrote:Row generator will produce 2 columns which are
1.) a dummy column (say constant 1)
2.) sequence from 1 to n
(n = max possible list of values in the column you wish to split)

Add the same dummy column to your input row.

Join them, use the sequence column to field() and add necessary constraints.

You will have the required output.
Hi Sainath,

Thanks! I tried to follow the suggestion with a paper and pencil. I am stuck.

For the example in question, here is what I have

Row Generator (for the second column, the limit is set to 4)

Col1 Col2
1 1
1 2
1 3
1 4
1 1
1 2
1 3
1 4

The input (as in the original question) is

colX colY
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby

In the Join stage, I was wondering if I should be joining col2 with colX? Sorry I dont get it.

Thanks,
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

====
Input:
====
colX colY
1 Bill,Brain,Greagory
2 Sam, Bopel
3
4 Toby
Here the max occurance = 3 (for row 1).

Step1:
So generate 3 rows from rowgen.
=====
rowgen:
=====
colDummy colIndex
1 1
1 2
1 3

Step2:
Add dummy column to input
colDummy colX colY
1 1 Bill,Brain,Greagory
1 2 Sam, Bopel
1 3
1 4 Toby
Step3:
Join by colDummy
This will give you (4 x 3 = ) 12 rows.
Step 4:
derive columns

Code: Select all

out.colX  = inp.colX  
newColY = field(inp.colY, ',', colIndex)
Note - here you will have to control for the number of entries in the row.
Step 5:
You have your output.
Step 6:

Code: Select all

Leave the money in usual place.
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Post by shankar_ramanath »

I realize that the reply is late.

But the solution worked great.

Since I am not the OP, I am not sure if I can mark it Resolved.

Many thanks to you Sainath, I appreciate it very much.
Post Reply