How to dynamically generate rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

How to dynamically generate rows

Post by Brian Zhang »

Dear All,

I have a requirement asking for dynamically generating rows in a PX job. The number of rows to be generated is recorded in a column (col2) in the input link as below. Please share with me any of your thought on this as I didn't find a built-in stage that can directly handle this logic. Thanks in advance.

Input:
key col1 col2
-----------------------
aa 100 3
bb 200 2

Output:
key col1 col3
-----------------------
aa 100 100
aa 100 101
aa 100 102
bb 200 200
bb 200 201
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Re: How to dynamically generate rows

Post by Brian Zhang »

What I can think of is to write an Unix shell script w/ a loop segment and wrap it into the PX job. Is there any alternatives? Thanks.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Your best alternative is to upgrade to a version that supports transformer stage looping.

With version 8.1, you'll need to get creative. What is the maximum value of your col2?

If the maximum is unreasonably high, you're probably on the right track with a custom stage.

A server job would probably be easier than a parallel job at version 8.1.

Mike
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If your maximum value in Col2 is known and a small amount, then a simple Transform stage with multiple outputs for each iteration value followed by a funnel would be sufficient.
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Post by Brian Zhang »

Thank you, Mike.

The value in col2 could be between 1 and 108 as I see so far, and it could be larger which is not known in advance. A custom stage should be better but that would be costly for us as we are lack of such c programming skills in DataStage.

If achieving this requirement in a server job, are you referring to the Command stage? in which we can invoke a shell script? Thanks.
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Post by Brian Zhang »

Thanks, ArndW.

So far I can see the value in col2 can be up to 108, so drawing 108 output links from a Transformer stage to cover all cases might not be wise. As Mike mentioned, implementing it in a Transformer stage that supports looping features would be the best approach for me, which will require a upgrade to 8.5 release or later. Thanks.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

For a server job, I was thinking along the lines of a custom routine to handle the looping requirement.

That would of course require DataStage BASIC programming skills.

DataStage BASIC might be easier for a novice to navigate using the BASIC reference and the large number of examples you can find on DSXchange.

Mike
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Post by Brian Zhang »

I see. Thanks, Mike. If we developed such a BASIC routine, which would you recommend to implement it?

1) in a server job
2) in a BASIC Transformer stage in a px job

Do you think both are equivalent? Thanks.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Definitely go with the server job.

My introduction to DataStage came via the COBOL code generation technology and mainframe jobs, so I'm still in the novice camp when it comes to DataStage BASIC.

Maybe one of the oldtimers that grew up with Universe and PICK can suggest some tricks to make your job easier.

Mike
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Post by Brian Zhang »

Thanks again for your prompt reply, Mike.

Have a nice day.

Hong
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

You can use a parallel job with transformer in v 8.1 and row gen and lookup for this as ArndW says - ONLY if the max possible value of input.col2 is known in advance.( Otherwise you need to determine it using an Unix Sort or andther datastage job - and set its max value as a parameter to the main processing job that i will describe)

To your input dataset add a dummy key row and fix a string value '1' to it . In your row generator , create a column that has predefined sequence (integer) as output cycling from 1 to 108 [ set cycle to generate 108 rows for each cycle] , and another column that has fixed string value 1.

In the parallel lookup stage , enable multi row result lookup ,
Lookup Input and RowGen.Output using a lookup stage with Key = <fixed value column> - So for every Input row you will get 108 output rows from the lookup . Add a downstream transformer or filter stage to evaluate if rowgen cycle number is less than or equal to the Input.Column2 ( counter value) - If constraint is satisfied , pass to output link , and create a new column derivation to add the current row gen cycle number to the Input.Column1 data value.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Very good rameshrr3. This is one of those creative solutions I was anticipating. Just parameterize the number of records and cycle limit values as suggested in the row generator and enable multiple rows returned from the lookup.

Mike
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

I just found out that while the number of records from the rowgen stage can be parametrized , the cycle limit cannot be - so I guess you are better off knowing the cycle limit (max for now and for all of eternity ) in advance :) .

I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Post by Brian Zhang »

Elegant! I believe this is the solution I was looking for in v8.1. Thanks very much, rameshrr3! :D
Brian Zhang
Participant
Posts: 39
Joined: Tue Jun 03, 2008 12:13 am
Location: Melbourne

Post by Brian Zhang »

rameshrr3 wrote:I just found out that while the number of records from the rowgen stage can be parametrized , the cycle limit cannot be - so I guess you are better off knowing the cycle limit (max for now and for all of eternity ) in advance :) .

I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .
Do we need to set a limit? If I want the Row Generator stage to generate 108 rows, and set the value starting from 1 and incremented by 1, I think I will get 108 rows with 1 to 108 populated in the colum, right?
Post Reply