Sequential processing of Files

pravin1581 · Post by **pravin1581** » Wed Apr 11, 2007 10:40 pm

ray.wurlod wrote:Yes.

You can put the second link into a second job.

Execute the two jobs consecutively from a job sequence.

I think we are back where we started: processing the 2 files means 2 separate jobs. Is this the final verdict on this issue, or can I expect some other alternative?

ray.wurlod · Post by **ray.wurlod** » Thu Apr 12, 2007 2:26 am

It's a free country - you can expect whatever you like.

But you won't get any alternatives from here, at least not from the experienced developers.

pravin1581 · Post by **pravin1581** » Thu Apr 12, 2007 10:13 pm

ray.wurlod wrote:It's a free country - you can expect whatever you like.

But you won't get any alternatives from here, at least not from the experienced developers.

Maybe I was a bit impolite in my reply; I apologise for that. Can we think of something along the lines of a Data Set or File Set instead of sequential files?

ray.wurlod · Post by **ray.wurlod** » Fri Apr 13, 2007 2:10 am

No. Your requirement (original post) was SEQUENTIALLY.

File Sets and Data Sets are read in parallel.

pravin1581 · Post by **pravin1581** » Wed Apr 18, 2007 10:57 pm

ray.wurlod wrote:No. Your requirement (original post) was SEQUENTIALLY.

File Sets and Data Sets are read in parallel.

Can I modify my design using a Copy stage as follows? File(1) will go into the Aggregator stage and then into the Copy stage. From the Copy stage we can derive 2 outputs: the 1st output is File(2), and the 2nd output goes into a second Aggregator for the next level of aggregation. Though it may be a bit cumbersome, it may reduce the number of jobs.

Please let me know whether this is feasible or not. I tried to draw this process and paste it, but the formatting was wrong after pasting because I do not know how to use the Code button.

ray.wurlod · Post by **ray.wurlod** » Wed Apr 18, 2007 11:24 pm

1. Click the Code button.
2. Draw your "ASCII art" picture.
3. Click the Code button again, to close the Code tags.
4. Click Preview.
5. Edit your "ASCII art" so that it looks better, then click Preview.
6. Repeat step 5 until the "ASCII art" looks like you want it to.
7. Click Submit.

Code:

   ----->  Aggregator 1  ----->  Copy  ----->  Aggregator 2  ----> ????
                                 |
                                 +------>  ????

Now think about partitioning and sorting requirements for the Aggregator stages.
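
To make the partitioning point concrete, here is a minimal Python sketch (not DataStage code). It assumes a made-up input of `(region, amount)` rows and two partitions; `hash_partition`, `round_robin_partition` and `run` are hypothetical helpers written only for this illustration. It shows that when rows with the same grouping key land in different partitions, each partition produces its own partial total for that key.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, n_parts):
    """Send every row with the same key to the same partition."""
    parts = [[] for _ in range(n_parts)]
    for key, amount in rows:
        # crc32 is used only because it is stable between runs.
        parts[zlib.crc32(key.encode()) % n_parts].append((key, amount))
    return parts


def round_robin_partition(rows, n_parts):
    """Deal rows out in turn, ignoring the key."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def aggregate(partition):
    """Sum the amounts per key within a single partition."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return sorted(totals.items())


def run(partitioner, rows, n_parts=2):
    """Partition, aggregate each partition on its own, then collect."""
    out = []
    for part in partitioner(rows, n_parts):
        out.extend(aggregate(part))
    return out


# Hypothetical sample data.
ROWS = [("east", 10), ("west", 5), ("east", 7),
        ("west", 1), ("north", 3), ("east", 2)]

hashed = run(hash_partition, ROWS)
round_robin = run(round_robin_partition, ROWS)

print(sorted(hashed))   # one total per key
print(round_robin)      # "east" shows up once per partition
```

With key-based partitioning each region appears once in the collected output. With round robin, "east" is split across both partitions, so the output carries two partial sums for it.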

pravin1581 · Post by **pravin1581** » Thu Apr 19, 2007 1:25 am

ray.wurlod wrote:1. Click the Code button.
2. Draw your "ASCII art" picture.
3. Click the Code button again, to close the Code tags.
4. Click Preview.
5. Edit your "ASCII art" so that it looks better, then click Preview.
6. Repeat step 5 until the "ASCII art" looks like you want it to.
7. Click Submit.
Code:
   ----->  Aggregator 1  ----->  Copy  ----->  Aggregator 2  ----> ????
                                 |
                                 +------>  ????
Now think about partitioning and sorting requirements for the Aggregator stages.

Thanks for the instructions.

I intend to modify my design by incorporating a Copy stage in the job.

Code:


                          
                            File (2)
                              |
                              |        
  File(1)--->Aggregator---->Copy Stage---->Aggregator---->File(3)

Though this is cumbersome, it may at least reduce the number of jobs. Please let me know whether this is feasible or not.

ray.wurlod · Post by **ray.wurlod** » Thu Apr 19, 2007 5:03 am

Anything that works is feasible.

It's not actually a bad design. But think about inserting some Sort stages ahead of each Aggregator stage - the second one can specify "don't sort (already sorted)" and thereby introduce some efficiency to the design.
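
The idea of sorting once and letting the second aggregation rely on that order can be sketched in Python (an illustration only, not DataStage). The `(region, store, amount)` layout and the `aggregate_sorted` helper are assumptions made up for the example. `itertools.groupby` behaves like a sort-based aggregation in that it only works correctly when rows with the same key are adjacent.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key_fn, value_fn):
    """Sum values per key; assumes rows with equal keys are already adjacent."""
    for key, group in groupby(rows, key=key_fn):
        yield key, sum(value_fn(r) for r in group)


# Hypothetical input: (region, store, amount), in arbitrary order.
ROWS = [("west", "s2", 4), ("east", "s1", 10), ("east", "s2", 1),
        ("west", "s2", 6), ("east", "s1", 5)]

# The only real sort: on (region, store), ahead of the first aggregation.
sorted_rows = sorted(ROWS, key=itemgetter(0, 1))

# Level 1: totals per (region, store). Output is still ordered by region, store.
level1 = list(aggregate_sorted(sorted_rows, itemgetter(0, 1), itemgetter(2)))

# Level 2: totals per region. No second sort is needed because level1 is
# already grouped by region (region is the leading sort key).
level2 = list(aggregate_sorted(level1, lambda r: r[0][0], itemgetter(1)))

print(level1)
print(level2)
```

The second aggregation gets away without re-sorting only because its grouping key is the leading part of the first sort key. If the second level grouped on `store` instead, the data would have to be sorted again.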

Cr.Cezon · Post by **Cr.Cezon** » Thu Apr 19, 2007 8:48 am

Hi,
A possible solution could be:
File-----> Transformer-----> Transformer------> File

Writing an intermediate file makes no sense in a parallel job.

regards,
Cristina

pravin1581 · Post by **pravin1581** » Thu Apr 19, 2007 8:52 am

Cr.Cezon wrote:Hi,
A possible solution could be:
File-----> Transformer-----> Transformer------> File

Writing an intermediate file makes no sense in a parallel job.

regards,
Cristina

The intermediate file is our output for one level of aggregation.

Cr.Cezon · Post by **Cr.Cezon** » Thu Apr 19, 2007 9:55 am

Then it could be something like:
File-----> Transformer-----> Transformer------> File
               |-----> Aggregator


Cr.Cezon · Post by **Cr.Cezon** » Thu Apr 19, 2007 10:00 am

File-----> Transformer1-----> Transformer2------> File
               |-----> Aggregator

Transformer1 has two output links: one to Transformer2 and another to the Aggregator.



ray.wurlod · Post by **ray.wurlod** » Thu Apr 19, 2007 3:06 pm

Intermediate files (as I mentioned in my initial response to this thread) are not permitted in parallel jobs. They are "blocking operations" and interfere with pipeline parallelism. If you need two streams, insert a Copy stage.
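
As a rough Python analogy for "insert a Copy stage" (not DataStage code), the first-level aggregate can be split into two identical streams in memory instead of being written out and read back. `sum_by` and the sample rows are hypothetical, and `itertools.tee` stands in for the Copy stage.

```python
from itertools import groupby, tee


def sum_by(rows, key_len):
    """Sum the last column per leading key_len columns.

    Assumes rows arrive already grouped on those leading columns.
    """
    for key, group in groupby(rows, key=lambda r: r[:key_len]):
        yield key + (sum(r[-1] for r in group),)


# Hypothetical input, already sorted on (region, store).
SOURCE = [("east", "s1", 10), ("east", "s1", 5),
          ("east", "s2", 1), ("west", "s2", 10)]

level1 = sum_by(iter(SOURCE), 2)      # first aggregation, produced lazily

# "Copy": two identical streams from one upstream, no intermediate file.
to_file2, to_agg2 = tee(level1)

file2 = list(to_file2)                # first output: level-1 totals
file3 = list(sum_by(to_agg2, 1))      # second output: level-2 totals

print(file2)
print(file3)
```

In this single-threaded sketch `tee` buffers the rows for the second consumer while the first one runs. In a real parallel job the two branches would be consumed concurrently, which is the point of not landing the data in between.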

pravin1581 · Post by **pravin1581** » Thu Apr 19, 2007 10:40 pm

ray.wurlod wrote:Intermediate files (as I mentioned in my initial response to this thread) are not permitted in parallel jobs. They are "blocking operations" and interfere with pipeline parallelism. If you need two streams, insert a Copy stage.

That is what I am trying to implement.

pravin1581 · Post by **pravin1581** » Sat Apr 21, 2007 1:37 am

ray.wurlod wrote:Anything that works is feasible.

It's not actually a bad design. But think about inserting some Sort stages ahead of each Aggregator stage - the second one can specify "don't sort (already sorted)" and thereby introduce some efficiency to the design.

As per your suggestion I have incorporated a Sort stage to sort my data sequentially, but the Aggregator stage after it is rearranging the sorted data and my output is coming out unsorted.
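
One possible explanation, which is an assumption and not something confirmed in this thread, is that the data is repartitioned between the sequential sort and the parallel aggregation, and the partitions are then collected one after another. The Python sketch below illustrates only that general effect. The partitioning rule (`ord(key[0]) % n_parts`), the helper names and the sample rows are all hypothetical.

```python
import heapq
from itertools import groupby


def partition_by_key(rows, n_parts):
    """Toy stand-in for hash partitioning: same key, same partition."""
    parts = [[] for _ in range(n_parts)]
    for key, amount in rows:
        parts[ord(key[0]) % n_parts].append((key, amount))
    return parts


def aggregate_sorted(partition):
    """Sum per key; relies on each partition still being sorted by key."""
    return [(key, sum(a for _, a in group))
            for key, group in groupby(partition, key=lambda r: r[0])]


# Hypothetical input, globally sorted by key.
ROWS = [("a", 1), ("a", 2), ("b", 5), ("c", 1), ("c", 1), ("d", 7)]

results = [aggregate_sorted(p) for p in partition_by_key(ROWS, 2)]

# Collecting partition after partition loses the global order.
concatenated = [row for part in results for row in part]

# Merging the already-sorted partitions on the key keeps it.
merged = list(heapq.merge(*results))

print(concatenated)
print(merged)
```

Each partition is still sorted on its own, so a key-ordered merge at collection time is enough to restore the global order in this sketch. Whether that is what happens in the job described above depends on its actual partitioning and collection settings.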