Generating multiple rows based on the data of 2 input column

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Generating multiple rows based on the data of 2 input column

Post by I_Server_Whale »

Hi all,

I have a input file with two columns (MIN_ZIP_CODE and MAX_ZIP_CODE).

For example:

Code: Select all


MIN_ZIP_CODE                                      MAX_ZIP_CODE

75101                                                 75104
89430                                                 89435
00010                                                 00015
I need to generate the output as follows:

Code: Select all


MIN_ZIP_CODE                     MAX_ZIP_CODE                     ZIP_CODE

75101                                 75104                       75101
75101                                 75104                       75102
75101                                 75104                       75103
75101                                 75104                       75104
89430                                 89435                       89430
89430                                 89435                       89431
89430                                 89435                       89432
89430                                 89435                       89433
89430                                 89435                       89434
89430                                 89435                       89435
00010                                 00015                       00010
00010                                 00015                       00011
00010                                 00015                       00012
00010                                 00015                       00013
00010                                 00015                       00014
00010                                 00015                       00015
In other words,

Code: Select all

Zip_Code, is obtained by expanding the zip-code range starting from the MIN_ZIP_CODE, incrementing it by 1 until the MAX_ZIP_CODE value is reached.
Any idea of how this can be achieved? My brain's half dead. Any ideas/suggestions/pointers are greatly appreciated.

Thanks,
Whale.
Last edited by I_Server_Whale on Mon Jan 14, 2008 9:37 pm, edited 2 times in total.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Dont you have RangeLookup in Version 8??
And more over, how would following record has only 4 entry?

Code: Select all

MIN_ZIP_CODE                                      MAX_ZIP_CODE 

75101                                                 75114 
Just an typo??
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

4 not 14? The reason is already in original post: "brain's half dead".

External filter stage that runs a shell script to iterate through the values is one way, if you're comfortable with scripting.

Build stage to iterate through the values is another way, if you're comfortable with C++ coding.

There's nothing out of the box for generating multiple ranges of rows.

With only one range you could, of course, use a Row Generator stage.

There is an easy solution using a pair of jobs, a server job calling UtilityRunJob and passing the limits and a parallel job invoked from it that uses a Row Generator stage.

Just some ideas.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

kumar_s wrote:Dont you have RangeLookup in Version 8??
And more over, how would following record has only 4 entry?

Code: Select all

MIN_ZIP_CODE                                      MAX_ZIP_CODE 

75101                                                 75114 
Just an typo??
Oops!! Sorry, Folks. Had a long day. Yes. It should have been 04 and not 14. I don't think range lookup will be of help.

I have corrected the previous post.

Thank you, Ray!. I will try your suggestions. What about using stage variables to that effect? Read in the two columns into two stage variables and generate records based on the logic and send them to the output link.

Just wondering. But thanks again for your valuable ideas. Will try them and get back to you.

Best Regards,
Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you can tell me how to generate an arbitrary number of rows out of a Transformer stage for each row in, I'd be happy to know it.

There is a kluge where you use - typically - a routine to generate a long string with a line terminator between each generated "output row", and output that to a sequential file so that the line terminators turn into, umm..., line terminators, but that's just a kluge and you would still need a separate job to read the sequential file (since "blocking" - passive stage with input and output links - is expressly forbidden in parallel jobs.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

This is what been refered.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ATBolton
Participant
Posts: 5
Joined: Mon Oct 01, 2007 10:25 am
Location: Houston, TX

Post by ATBolton »

I had a similar issue few weeks ago. Here is how I got around the problem.

1. Take the source file in to a transform. In the transform add a field call dummy key, place a constant value in it.

2. Use the row generator with a counter say 1 to 100. The output of the stage will have the counter plus a dummy key field with the same value as in step one.

3. Take the output of both 1 and 2 and join them together in a join stage. Each source file record will now have 100 records.

4. Send the output of the join stage into another transform to removed all the extra fields (dummy key, counter) and possible remove any records you don't want in the final output.
ATBolton
Participant
Posts: 5
Joined: Mon Oct 01, 2007 10:25 am
Location: Houston, TX

Post by ATBolton »

I had a similar issue few weeks ago. Here is how I got around the problem.

1. Take the source file in to a transform. In the transform add a field call dummy key, place a constant value in it.

2. Use the row generator with a counter say 1 to 100. The output of the stage will have the counter plus a dummy key field with the same value as in step one.

3. Take the output of both 1 and 2 and join them together in a join stage. Each source file record will now have 100 records.

4. Send the output of the join stage into another transform to removed all the extra fields (dummy key, counter) and possible remove any records you don't want in the final output.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That technique would work, but it's wasteful, and would require a priori knowledge of the maximum number of rows that might need to be generated from any one range (in your case 100 was sufficient - or was it ?!! How would you know (in all cases)?).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

ray.wurlod wrote:That technique would work, but it's wasteful, and would require a priori knowledge of the maximum number of rows that might need to be generated from any one range (in your case 100 was sufficient - or was it ?!! How would you know (in all cases)?).
Exactly! ATBolton's solution wouldn't work as Ray pointed out. Kumar's solution points to Server solution which I'm trying to avoid. I'm on my way to building a build-op (one of Ray's suggestion).

Will let you know how it goes!

Thanks for all the help rendered! Appreciate it!
Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
santosh
Premium Member
Premium Member
Posts: 28
Joined: Tue Jan 11, 2005 1:09 am
Location: London

Post by santosh »

Not an excellent way of doing it but would like to suggest an option below. else i thibnk you have to go for custom stage.

1> Create a file in O/S (UNIX) to be used as lookup with layout as below.

count, records
1,1
2,1
2,2
3,1
3,2
3,3
and so on

2> Use this file as multiple lookup to generate your output. i.e.
a> In your original file , generate a new column call record_count = max_zip_code - min_zip_code
b> Do multiple lookup of record_count(source) with count(step 1 lookupfile)
regards,
Santosh
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

santosh wrote:Not an excellent way of doing it but would like to suggest an option below.
Santosh,

Can you explain why a "Build-Op" is not an excellent way of doing it compared to your round about method?

Thanks,
Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I read that as Santosh admitting that Santosh's way is not an excellent way, merely offered as an alternative.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply