Generating records based on column value

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bgs_vb
Premium Member
Premium Member
Posts: 79
Joined: Mon Jan 02, 2006 5:51 am

Generating records based on column value

Post by bgs_vb »

Hi ,

I have a equirement to generate number of duplicate records based on a column value .Example is shown below as :

Input :
id,name,address,col1
1,abc,xyz,5
2,cde,yz,2

Output:
As col1 is having value as 5 ,so in output we require 5 nstances of the same record .Similarly for second record ,we need to generate 2 records

id,name,address,col1
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
2,cde,yz,2
2,cde,yz,2

Please advise how can i achieve this result .
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

This looks like the perfect application of transform stage loops available in Version 8.7 and above of DataStage. Do you have this version?
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Minor correction: Transformer-Loop is available since version 8.5.
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
bgs_vb
Premium Member
Premium Member
Posts: 79
Joined: Mon Jan 02, 2006 5:51 am

Post by bgs_vb »

Hi

I am working on Datastage 8.1 .Also ,is there any other way of doing it rather than using transform lopp variable ? As we are expecting huge data from source and using transform loop variables will slow down the process .
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Generating records based on column value

Post by chulett »

bgs_vb wrote:id,name,address,col1
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
2,cde,yz,2
2,cde,yz,2
So... you need to turn the '5' in the original record to a '10' for that first id? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
BI-RMA
Premium Member
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

bgs_vb wrote: As we are expecting huge data from source and using transform loop variables will slow down the process .
Who told You that using loop variables was relevant in terms of performance? It is not. By the way, I thought using transform loop variables is not an option because You are on version 8.1?
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

sorry, as corrected above, the loops are available as of 8.5.

Since you have 8.1 and loops within a transform stage are not an option you have a simple, but limited option if the maximum number of iterations is known; i.e. if you know you won't have more than 25 in column1 then you could make a transform stage with 25 outputs that have conditions along the lines of "In.Column1 < 2", "In.Column3 < 1", etc. This is a simple and bulletproof approach but won't work if can have values of 100 or higher.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There's a new addition to the Sequential File stage (and to the Big Data stage) in version 9.1 that does exactly this task.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply