Sequential processing of Files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

ray.wurlod wrote:Yes.

You can put the second link into a second job.

Execute the two jobs consecutively from a job sequence.
I think we are back where we started, i.e. processing the 2 files means 2 separate jobs. Is this the final verdict on this issue, or can I expect some other alternative?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's a free country - you can expect whatever you like.

But you won't get any alternatives from here, at least not from the experienced developers.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

ray.wurlod wrote:It's a free country - you can expect whatever you like.

But you won't get any alternatives from here, at least not from the experienced developers.
Maybe I was a bit impolite in my reply; I apologise for that. Can we think of something along the lines of a Data Set or File Set instead of sequential files?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No. Your requirement (original post) was SEQUENTIALLY.

File Sets and Data Sets are read in parallel.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

ray.wurlod wrote:No. Your requirement (original post) was SEQUENTIALLY.

File Sets and Data Sets are read in parallel.
Can I modify my design using a Copy stage as follows? File(1) goes into the Aggregator stage and then into the Copy stage. From the Copy stage we derive two outputs: the first is File(2), and the second goes into another Aggregator for the next level of aggregation. It may be a bit cumbersome, but it may reduce the number of jobs.

Please let me know whether this is feasible. I tried to draw this process and paste it, but the formatting was lost after pasting, as I do not know how to use the Code tags.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. Click the Code button.
2. Draw your "ASCII art" picture.
3. Click the Code button again, to close the Code tags.
4. Click Preview.
5. Edit your "ASCII art" so that it looks better, then click Preview.
6. Repeat step 5 until the "ASCII art" looks like you want it to.
7. Click Submit.

Code:

   ----->  Aggregator 1  ----->  Copy  ----->  Aggregator 2  ----> ????
                                 |
                                 +------>  ????
Now think about partitioning and sorting requirements for the Aggregator stages.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

ray.wurlod wrote:1. Click the Code button.
2. Draw your "ASCII art" picture.
3. Click the Code button again, to close the Code tags.
4. Click Preview.
5. Edit your "ASCII art" so that it looks better, then click Preview.
6. Repeat step 5 until the "ASCII art" looks like you want it to.
7. Click Submit.

Code:

   ----->  Aggregator 1  ----->  Copy  ----->  Aggregator 2  ----> ????
                                 |
                                 +------>  ????
Now think about partitioning and sorting requirements for the Aggregator stages.
Thanks for the instructions.

I intend to modify my design by incorporating Copy stage in the job.

Code:

                            File (2)
                              |
                              |        
  File(1)--->Aggregator---->Copy Stage---->Aggregator---->File(3)

Though this is cumbersome, it may at least reduce the number of jobs. Please let me know whether this is feasible.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Anything that works is feasible.

It's not actually a bad design. But think about inserting some Sort stages ahead of each Aggregator stage - the second one can specify "don't sort (already sorted)" and thereby introduce some efficiency to the design.
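
The idea of sorting once and letting the second aggregation reuse that order can be sketched outside DataStage. The following is only an illustration in Python, not DataStage configuration; the column layout (region, store, amount) and the helper `aggregate_sorted` are hypothetical. It assumes the second-level key is a leading prefix of the first-level key, which is what makes a second sort unnecessary.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows: (region, store, amount).
rows = [
    ("east", "s2", 5),
    ("west", "s1", 7),
    ("east", "s1", 10),
    ("east", "s2", 1),
    ("west", "s1", 3),
]

def aggregate_sorted(records, key_len):
    """Sum the last field per key, assuming records arrive sorted on that key.

    The key is the first key_len fields of each record.
    """
    for key, group in groupby(records, key=lambda r: r[:key_len]):
        yield key + (sum(r[-1] for r in group),)

# Sort once on (region, store), then aggregate at that level.
sorted_rows = sorted(rows, key=itemgetter(0, 1))
level1 = list(aggregate_sorted(sorted_rows, 2))

# level1 is still ordered by region, so the second aggregation
# can group on region without sorting again.
level2 = list(aggregate_sorted(level1, 1))
```

If the second-level key were not a prefix of the first (for example, grouping by store alone), the existing order would not help and a real sort would be needed before the second aggregation.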
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Hi,
a possible solution could be:
File-----> Transformer-----> Transformer------> File

Writing an intermediate file makes no sense in parallel.

regards,
Cristina
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

Cr.Cezon wrote: Hi,
a possible solution could be:
File-----> Transformer-----> Transformer------> File

Writing an intermediate file makes no sense in parallel.

regards,
Cristina
The intermediate file is our output for one level of aggregation.
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Then it could be something like:
File-----> Transformer-----> Transformer------> File
                |-----> Aggregator



Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

File-----> Transformer1-----> Transformer2------> File
                |-----> Aggregator

Transformer1 has two output links:
one to Transformer2 and another to the Aggregator.

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Intermediate files (as I mentioned in my initial response to this thread) are not permitted in parallel jobs. They are "blocking operations" and interfere with pipeline parallelism. If you need two streams, insert a Copy stage.
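
As a rough illustration of splitting one stream into two without landing it on disk and reading it back, here is a small Python sketch. It is not DataStage code; `copy_stage`, `file2`, and the sample records are hypothetical stand-ins for the Copy stage, the File(2) target, and the first-level aggregates discussed earlier in the thread.

```python
from collections import defaultdict

def copy_stage(records, side_output):
    """Yield every record downstream and also append it to side_output."""
    for record in records:
        side_output.append(record)  # first output link, standing in for File(2)
        yield record                # second output link, feeding the next aggregation

# Hypothetical first-level aggregates: (region, store, total).
level1 = [("east", "s1", 10), ("east", "s2", 6), ("west", "s1", 10)]

file2 = []                          # stands in for the File(2) target
region_totals = defaultdict(int)

# Each record is written to the side output and aggregated in the same pass;
# nothing is written out and re-read between the two levels.
for region, _store, total in copy_stage(iter(level1), file2):
    region_totals[region] += total

file3 = sorted(region_totals.items())
```

The point of the sketch is that both outputs are produced from a single pass over the stream, so the second aggregation never waits for a file to be completed first.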
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

ray.wurlod wrote:Intermediate files (as I mentioned in my initial response to this thread) are not permitted in parallel jobs. They are "blocking operations" and interfere with pipeline parallelism. If you need two streams, insert a Copy stage.

That is what I am trying to implement.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

ray.wurlod wrote:Anything that works is feasible.

It's not actually a bad design. But think about inserting some Sort stages ahead of each Aggregator stage - the second one can specify "don't sort (already sorted)" and thereby introduce some efficiency to the design.
As per your suggestion, I have incorporated a Sort stage to sort my data sequentially, but the Aggregator stage after it is rearranging the sorted data and my output is getting unsorted.
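
One possible explanation, offered as an assumption rather than a diagnosis of this particular job: when data is spread across partitions, each partition can be in order on its own while the combined output is not, depending on how the partitions are brought back together. The Python sketch below illustrates only that general effect; the `partition` helper and the two-way split are hypothetical, and it does not model the Aggregator stage itself.

```python
import heapq

# Hypothetical rows, already sorted on the key (first field).
sorted_rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("f", 6)]

def partition(rows, n):
    """Spread rows over n partitions using a simple deterministic key function.

    This stands in for key-based partitioning; order inside each partition
    is preserved.
    """
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[ord(row[0]) % n].append(row)
    return parts

parts = partition(sorted_rows, 2)

# Reading the partitions one after another loses the global order,
# even though each partition is still sorted internally.
concatenated = [row for part in parts for row in part]

# Merging the partitions on the key restores the global order.
merged = list(heapq.merge(*parts))
```

If this is what is happening, the things to check would be the aggregation method in use and how the partitions are collected before the final sequential file is written.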