Page 1 of 1

Generating records based on column value

Posted: Tue Oct 30, 2012 7:17 am
by bgs_vb
Hi ,

I have a equirement to generate number of duplicate records based on a column value .Example is shown below as :

Input :
id,name,address,col1
1,abc,xyz,5
2,cde,yz,2

Output:
As col1 is having value as 5 ,so in output we require 5 nstances of the same record .Similarly for second record ,we need to generate 2 records

id,name,address,col1
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
2,cde,yz,2
2,cde,yz,2

Please advise how can i achieve this result .

Posted: Tue Oct 30, 2012 7:19 am
by ArndW
This looks like the perfect application of transform stage loops available in Version 8.7 and above of DataStage. Do you have this version?

Posted: Tue Oct 30, 2012 7:33 am
by BI-RMA
Minor correction: Transformer-Loop is available since version 8.5.

Posted: Tue Oct 30, 2012 7:34 am
by bgs_vb
Hi

I am working on Datastage 8.1 .Also ,is there any other way of doing it rather than using transform lopp variable ? As we are expecting huge data from source and using transform loop variables will slow down the process .

Re: Generating records based on column value

Posted: Tue Oct 30, 2012 7:50 am
by chulett
bgs_vb wrote:id,name,address,col1
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
1,abc,xyz,10
2,cde,yz,2
2,cde,yz,2
So... you need to turn the '5' in the original record to a '10' for that first id? :?

Posted: Tue Oct 30, 2012 8:15 am
by BI-RMA
bgs_vb wrote: As we are expecting huge data from source and using transform loop variables will slow down the process .
Who told You that using loop variables was relevant in terms of performance? It is not. By the way, I thought using transform loop variables is not an option because You are on version 8.1?

Posted: Tue Oct 30, 2012 11:44 am
by ArndW
sorry, as corrected above, the loops are available as of 8.5.

Since you have 8.1 and loops within a transform stage are not an option you have a simple, but limited option if the maximum number of iterations is known; i.e. if you know you won't have more than 25 in column1 then you could make a transform stage with 25 outputs that have conditions along the lines of "In.Column1 < 2", "In.Column3 < 1", etc. This is a simple and bulletproof approach but won't work if can have values of 100 or higher.

Posted: Tue Oct 30, 2012 7:02 pm
by ray.wurlod
There's a new addition to the Sequential File stage (and to the Big Data stage) in version 9.1 that does exactly this task.