How to dynamically generate rows
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
How to dynamically generate rows
Dear All,
I have a requirement asking for dynamically generating rows in a PX job. The number of rows to be generated is recorded in a column (col2) in the input link as below. Please share with me any of your thought on this as I didn't find a built-in stage that can directly handle this logic. Thanks in advance.
Input:
key col1 col2
-----------------------
aa 100 3
bb 200 2
Output:
key col1 col3
-----------------------
aa 100 100
aa 100 101
aa 100 102
bb 200 200
bb 200 201
I have a requirement asking for dynamically generating rows in a PX job. The number of rows to be generated is recorded in a column (col2) in the input link as below. Please share with me any of your thought on this as I didn't find a built-in stage that can directly handle this logic. Thanks in advance.
Input:
key col1 col2
-----------------------
aa 100 3
bb 200 2
Output:
key col1 col3
-----------------------
aa 100 100
aa 100 101
aa 100 102
bb 200 200
bb 200 201
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
Re: How to dynamically generate rows
What I can think of is to write an Unix shell script w/ a loop segment and wrap it into the PX job. Is there any alternatives? Thanks.
Your best alternative is to upgrade to a version that supports transformer stage looping.
With version 8.1, you'll need to get creative. What is the maximum value of your col2?
If the maximum is unreasonably high, you're probably on the right track with a custom stage.
A server job would probably be easier than a parallel job at version 8.1.
Mike
With version 8.1, you'll need to get creative. What is the maximum value of your col2?
If the maximum is unreasonably high, you're probably on the right track with a custom stage.
A server job would probably be easier than a parallel job at version 8.1.
Mike
If your maximum value in Col2 is known and a small amount, then a simple Transform stage with multiple outputs for each iteration value followed by a funnel would be sufficient.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
Thank you, Mike.
The value in col2 could be between 1 and 108 as I see so far, and it could be larger which is not known in advance. A custom stage should be better but that would be costly for us as we are lack of such c programming skills in DataStage.
If achieving this requirement in a server job, are you referring to the Command stage? in which we can invoke a shell script? Thanks.
The value in col2 could be between 1 and 108 as I see so far, and it could be larger which is not known in advance. A custom stage should be better but that would be costly for us as we are lack of such c programming skills in DataStage.
If achieving this requirement in a server job, are you referring to the Command stage? in which we can invoke a shell script? Thanks.
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
Thanks, ArndW.
So far I can see the value in col2 can be up to 108, so drawing 108 output links from a Transformer stage to cover all cases might not be wise. As Mike mentioned, implementing it in a Transformer stage that supports looping features would be the best approach for me, which will require a upgrade to 8.5 release or later. Thanks.
So far I can see the value in col2 can be up to 108, so drawing 108 output links from a Transformer stage to cover all cases might not be wise. As Mike mentioned, implementing it in a Transformer stage that supports looping features would be the best approach for me, which will require a upgrade to 8.5 release or later. Thanks.
For a server job, I was thinking along the lines of a custom routine to handle the looping requirement.
That would of course require DataStage BASIC programming skills.
DataStage BASIC might be easier for a novice to navigate using the BASIC reference and the large number of examples you can find on DSXchange.
Mike
That would of course require DataStage BASIC programming skills.
DataStage BASIC might be easier for a novice to navigate using the BASIC reference and the large number of examples you can find on DSXchange.
Mike
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
Definitely go with the server job.
My introduction to DataStage came via the COBOL code generation technology and mainframe jobs, so I'm still in the novice camp when it comes to DataStage BASIC.
Maybe one of the oldtimers that grew up with Universe and PICK can suggest some tricks to make your job easier.
Mike
My introduction to DataStage came via the COBOL code generation technology and mainframe jobs, so I'm still in the novice camp when it comes to DataStage BASIC.
Maybe one of the oldtimers that grew up with Universe and PICK can suggest some tricks to make your job easier.
Mike
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
You can use a parallel job with transformer in v 8.1 and row gen and lookup for this as ArndW says - ONLY if the max possible value of input.col2 is known in advance.( Otherwise you need to determine it using an Unix Sort or andther datastage job - and set its max value as a parameter to the main processing job that i will describe)
To your input dataset add a dummy key row and fix a string value '1' to it . In your row generator , create a column that has predefined sequence (integer) as output cycling from 1 to 108 [ set cycle to generate 108 rows for each cycle] , and another column that has fixed string value 1.
In the parallel lookup stage , enable multi row result lookup ,
Lookup Input and RowGen.Output using a lookup stage with Key = <fixed value column> - So for every Input row you will get 108 output rows from the lookup . Add a downstream transformer or filter stage to evaluate if rowgen cycle number is less than or equal to the Input.Column2 ( counter value) - If constraint is satisfied , pass to output link , and create a new column derivation to add the current row gen cycle number to the Input.Column1 data value.
To your input dataset add a dummy key row and fix a string value '1' to it . In your row generator , create a column that has predefined sequence (integer) as output cycling from 1 to 108 [ set cycle to generate 108 rows for each cycle] , and another column that has fixed string value 1.
In the parallel lookup stage , enable multi row result lookup ,
Lookup Input and RowGen.Output using a lookup stage with Key = <fixed value column> - So for every Input row you will get 108 output rows from the lookup . Add a downstream transformer or filter stage to evaluate if rowgen cycle number is less than or equal to the Input.Column2 ( counter value) - If constraint is satisfied , pass to output link , and create a new column derivation to add the current row gen cycle number to the Input.Column1 data value.
I just found out that while the number of records from the rowgen stage can be parametrized , the cycle limit cannot be - so I guess you are better off knowing the cycle limit (max for now and for all of eternity ) in advance .
I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .
I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
-
- Participant
- Posts: 39
- Joined: Tue Jun 03, 2008 12:13 am
- Location: Melbourne
Do we need to set a limit? If I want the Row Generator stage to generate 108 rows, and set the value starting from 1 and incremented by 1, I think I will get 108 rows with 1 to 108 populated in the colum, right?rameshrr3 wrote:I just found out that while the number of records from the rowgen stage can be parametrized , the cycle limit cannot be - so I guess you are better off knowing the cycle limit (max for now and for all of eternity ) in advance .
I had used this logic earlier to multiply num rows and get a counter value to do a variable horizontal pivot using the field() function .