Generating multiple rows based on the data of 2 input column

I_Server_Whale · Post by **I_Server_Whale** » Mon Jan 14, 2008 7:04 pm

Hi all,

I have a input file with two columns (MIN_ZIP_CODE and MAX_ZIP_CODE).

For example:


MIN_ZIP_CODE                                      MAX_ZIP_CODE

75101                                                 75104
89430                                                 89435
00010                                                 00015

I need to generate the output as follows:

Code: Select all


MIN_ZIP_CODE                     MAX_ZIP_CODE                     ZIP_CODE

75101                                 75104                       75101
75101                                 75104                       75102
75101                                 75104                       75103
75101                                 75104                       75104
89430                                 89435                       89430
89430                                 89435                       89431
89430                                 89435                       89432
89430                                 89435                       89433
89430                                 89435                       89434
89430                                 89435                       89435
00010                                 00015                       00010
00010                                 00015                       00011
00010                                 00015                       00012
00010                                 00015                       00013
00010                                 00015                       00014
00010                                 00015                       00015

In other words,

Code: Select all

Zip_Code, is obtained by expanding the zip-code range starting from the MIN_ZIP_CODE, incrementing it by 1 until the MAX_ZIP_CODE value is reached.

Any idea of how this can be achieved? My brain's half dead. Any ideas/suggestions/pointers are greatly appreciated.

Thanks,
Whale.

kumar_s · Post by **kumar_s** » Mon Jan 14, 2008 7:12 pm

Dont you have RangeLookup in Version 8??
And more over, how would following record has only 4 entry?

Code: Select all

MIN_ZIP_CODE                                      MAX_ZIP_CODE 

75101                                                 75114

Just an typo??

ray.wurlod · Post by **ray.wurlod** » Mon Jan 14, 2008 8:54 pm

4 not 14? The reason is already in original post: "brain's half dead".

External filter stage that runs a shell script to iterate through the values is one way, if you're comfortable with scripting.

Build stage to iterate through the values is another way, if you're comfortable with C++ coding.

There's nothing out of the box for generating multiple ranges of rows.

With only one range you could, of course, use a Row Generator stage.

There is an easy solution using a pair of jobs, a server job calling UtilityRunJob and passing the limits and a parallel job invoked from it that uses a Row Generator stage.

Just some ideas.

I_Server_Whale · Post by **I_Server_Whale** » Mon Jan 14, 2008 9:35 pm

kumar_s wrote:Dont you have RangeLookup in Version 8??
And more over, how would following record has only 4 entry?
Code: Select all
MIN_ZIP_CODE                                      MAX_ZIP_CODE 

75101                                                 75114 
Just an typo??

Oops!! Sorry, Folks. Had a long day. Yes. It should have been 04 and not 14. I don't think range lookup will be of help.

I have corrected the previous post.

Thank you, Ray!. I will try your suggestions. What about using stage variables to that effect? Read in the two columns into two stage variables and generate records based on the logic and send them to the output link.

Just wondering. But thanks again for your valuable ideas. Will try them and get back to you.

Best Regards,
Whale.

ray.wurlod · Post by **ray.wurlod** » Mon Jan 14, 2008 11:02 pm

If you can tell me how to generate an arbitrary number of rows out of a Transformer stage for each row in, I'd be happy to know it.

There is a kluge where you use - typically - a routine to generate a long string with a line terminator between each generated "output row", and output that to a sequential file so that the line terminators turn into, umm..., line terminators, but that's just a kluge and you would still need a separate job to read the sequential file (since "blocking" - passive stage with input and output links - is expressly forbidden in parallel jobs.

kumar_s · Post by **kumar_s** » Tue Jan 15, 2008 12:30 pm

This is what been refered.

ATBolton · Post by **ATBolton** » Tue Jan 15, 2008 1:57 pm

I had a similar issue few weeks ago. Here is how I got around the problem.

1. Take the source file in to a transform. In the transform add a field call dummy key, place a constant value in it.

2. Use the row generator with a counter say 1 to 100. The output of the stage will have the counter plus a dummy key field with the same value as in step one.

3. Take the output of both 1 and 2 and join them together in a join stage. Each source file record will now have 100 records.

4. Send the output of the join stage into another transform to removed all the extra fields (dummy key, counter) and possible remove any records you don't want in the final output.

ATBolton · Post by **ATBolton** » Tue Jan 15, 2008 1:59 pm

I had a similar issue few weeks ago. Here is how I got around the problem.

1. Take the source file in to a transform. In the transform add a field call dummy key, place a constant value in it.

2. Use the row generator with a counter say 1 to 100. The output of the stage will have the counter plus a dummy key field with the same value as in step one.

3. Take the output of both 1 and 2 and join them together in a join stage. Each source file record will now have 100 records.

4. Send the output of the join stage into another transform to removed all the extra fields (dummy key, counter) and possible remove any records you don't want in the final output.

ray.wurlod · Post by **ray.wurlod** » Tue Jan 15, 2008 3:04 pm

That technique would work, but it's wasteful, and would require a priori knowledge of the maximum number of rows that might need to be generated from any one range (in your case 100 was sufficient - or was it ?!! How would you know (in all cases)?).

I_Server_Whale · Post by **I_Server_Whale** » Tue Jan 15, 2008 10:53 pm

ray.wurlod wrote:That technique would work, but it's wasteful, and would require a priori knowledge of the maximum number of rows that might need to be generated from any one range (in your case 100 was sufficient - or was it ?!! How would you know (in all cases)?).

Exactly! ATBolton's solution wouldn't work as Ray pointed out. Kumar's solution points to Server solution which I'm trying to avoid. I'm on my way to building a build-op (one of Ray's suggestion).

Will let you know how it goes!

Thanks for all the help rendered! Appreciate it!
Whale.

santosh · Post by **santosh** » Wed Jan 16, 2008 6:40 am

Not an excellent way of doing it but would like to suggest an option below. else i thibnk you have to go for custom stage.

1> Create a file in O/S (UNIX) to be used as lookup with layout as below.

count, records
1,1
2,1
2,2
3,1
3,2
3,3
and so on

2> Use this file as multiple lookup to generate your output. i.e.
a> In your original file , generate a new column call record_count = max_zip_code - min_zip_code
b> Do multiple lookup of record_count(source) with count(step 1 lookupfile)

I_Server_Whale · Post by **I_Server_Whale** » Wed Jan 16, 2008 8:39 pm

santosh wrote:Not an excellent way of doing it but would like to suggest an option below.

Santosh,

Can you explain why a "Build-Op" is not an excellent way of doing it compared to your round about method?

Thanks,
Whale.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 17, 2008 5:28 am

I read that as Santosh admitting that Santosh's way is not an excellent way, merely offered as an alternative.