Page 1 of 1

How to dynamically generate rows

Posted: Tue Dec 25, 2012 1:08 am
by Brian Zhang
Dear All,

I have a requirement asking for dynamically generating rows in a PX job. The number of rows to be generated is recorded in a column (col2) in the input link as below. Please share with me any of your thought on this as I didn't find a built-in stage that can directly handle this logic. Thanks in advance.

Input:
key col1 col2
-----------------------
aa 100 3
bb 200 2

Output:
key col1 col3
-----------------------
aa 100 100
aa 100 101
aa 100 102
bb 200 200
bb 200 201

Re: How to dynamically generate rows

Posted: Tue Dec 25, 2012 2:25 am
by Brian Zhang
What I can think of is to write an Unix shell script w/ a loop segment and wrap it into the PX job. Is there any alternatives? Thanks.

Posted: Tue Dec 25, 2012 7:12 am
by Mike
Your best alternative is to upgrade to a version that supports transformer stage looping.

With version 8.1, you'll need to get creative. What is the maximum value of your col2?

If the maximum is unreasonably high, you're probably on the right track with a custom stage.

A server job would probably be easier than a parallel job at version 8.1.

Mike

Posted: Wed Dec 26, 2012 5:50 am
by ArndW
If your maximum value in Col2 is known and a small amount, then a simple Transform stage with multiple outputs for each iteration value followed by a funnel would be sufficient.

Posted: Wed Dec 26, 2012 11:27 am
by Brian Zhang
Thank you, Mike.

The value in col2 could be between 1 and 108 as I see so far, and it could be larger which is not known in advance. A custom stage should be better but that would be costly for us as we are lack of such c programming skills in DataStage.

If achieving this requirement in a server job, are you referring to the Command stage? in which we can invoke a shell script? Thanks.

Posted: Wed Dec 26, 2012 11:35 am
by Brian Zhang
Thanks, ArndW.

So far I can see the value in col2 can be up to 108, so drawing 108 output links from a Transformer stage to cover all cases might not be wise. As Mike mentioned, implementing it in a Transformer stage that supports looping features would be the best approach for me, which will require a upgrade to 8.5 release or later. Thanks.

Posted: Wed Dec 26, 2012 11:36 am
by Mike
For a server job, I was thinking along the lines of a custom routine to handle the looping requirement.

That would of course require DataStage BASIC programming skills.

DataStage BASIC might be easier for a novice to navigate using the BASIC reference and the large number of examples you can find on DSXchange.

Mike

Posted: Wed Dec 26, 2012 11:42 am
by Brian Zhang
I see. Thanks, Mike. If we developed such a BASIC routine, which would you recommend to implement it?

1) in a server job
2) in a BASIC Transformer stage in a px job

Do you think both are equivalent? Thanks.

Posted: Wed Dec 26, 2012 12:00 pm
by Mike
Definitely go with the server job.

My introduction to DataStage came via the COBOL code generation technology and mainframe jobs, so I'm still in the novice camp when it comes to DataStage BASIC.

Maybe one of the oldtimers that grew up with Universe and PICK can suggest some tricks to make your job easier.

Mike

Posted: Wed Dec 26, 2012 12:18 pm
by Brian Zhang
Thanks again for your prompt reply, Mike.

Have a nice day.

Hong

Posted: Wed Dec 26, 2012 2:07 pm
by rameshrr3
You can use a parallel job with transformer in v 8.1 and row gen and lookup for this as ArndW says - ONLY if the max possible value of input.col2 is known in advance.( Otherwise you need to determine it using an Unix Sort or andther datastage job - and set its max value as a parameter to the main processing job that i will describe)

To your input dataset add a dummy key row and fix a string value '1' to it . In your row generator , create a column that has predefined sequence (integer) as output cycling from 1 to 108 [ set cycle to generate 108 rows for each cycle] , and another column that has fixed string value 1.

In the parallel lookup stage , enable multi row result lookup ,
Lookup Input and RowGen.Output using a lookup stage with Key = <fixed value column> - So for every Input row you will get 108 output rows from the lookup . Add a downstream transformer or filter stage to evaluate if rowgen cycle number is less than or equal to the Input.Column2 ( counter value) - If constraint is satisfied , pass to output link , and create a new column derivation to add the current row gen cycle number to the Input.Column1 data value.

Posted: Wed Dec 26, 2012 2:34 pm
by Mike
Very good rameshrr3. This is one of those creative solutions I was anticipating. Just parameterize the number of records and cycle limit values as suggested in the row generator and enable multiple rows returned from the lookup.

Mike

Posted: Wed Dec 26, 2012 4:45 pm
by rameshrr3
I just found out that while the number of records from the rowgen stage can be parametrized , the cycle limit cannot be - so I guess you are better off knowing the cycle limit (max for now and for all of eternity ) in advance :) .

I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .

Posted: Wed Dec 26, 2012 4:47 pm
by Brian Zhang
Elegant! I believe this is the solution I was looking for in v8.1. Thanks very much, rameshrr3! :D

Posted: Wed Dec 26, 2012 4:56 pm
by Brian Zhang
rameshrr3 wrote:I just found out that while the number of records from the rowgen stage can be parametrized , the cycle limit cannot be - so I guess you are better off knowing the cycle limit (max for now and for all of eternity ) in advance :) .

I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .
Do we need to set a limit? If I want the Row Generator stage to generate 108 rows, and set the value starting from 1 and incremented by 1, I think I will get 108 rows with 1 to 108 populated in the colum, right?