Can I get one stream to wait for another stream to finish?
Posted: Wed Sep 08, 2004 3:51 am
I'm trying to:
1. Read my file
2. Send one field to be processed down one stream
3. Send all the fields to be processed down another stream
4. Wait for the row to have been processed down both streams, then continue.
Is this possible? It's knowing that both streams have finished processing the row that I'm finding tricky.
I'd originally been thinking of something like the Sequencer in a job sequence, which waits for two jobs to finish before calling the next job - is there an equivalent within a job?
The reason I'm doing this is that I would send the natural key down one stream, which would check whether a surrogate key already exists and generate one if required. I would want my other stream to wait until it knew the surrogate key had been found or generated before it tried to look it up.
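The find-or-generate step described above can be sketched in pseudocode terms (Python here, purely for illustration - `key_map` and `counter` are hypothetical stand-ins for whatever lookup table and key source the job actually uses, not DataStage objects):

```python
# Minimal sketch of "check if a surrogate key exists, generate one if required".
# In the real job this would be a hashed-file lookup plus a key-source routine.

def find_or_generate(natural_key, key_map, counter):
    """Return the surrogate key for natural_key, creating one if needed."""
    if natural_key not in key_map:
        counter[0] += 1                    # next available surrogate key
        key_map[natural_key] = counter[0]  # remember it for later rows
    return key_map[natural_key]

# Two distinct natural keys get distinct surrogates; a repeat gets the same one.
key_map, counter = {}, [0]
first = find_or_generate("CUST-001", key_map, counter)
second = find_or_generate("CUST-002", key_map, counter)
repeat = find_or_generate("CUST-001", key_map, counter)
```

The ordering problem in the question is exactly that the lookup stream must not run for a row until this step has committed the key for that row.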
What is the best way of doing this?
I had thought of these alternatives, either
1 - putting the key generation in a separate job. This would mean I'd have to process my whole file doing the key generation before I started processing any records in the key-lookup part.
or
2 - just have one stream, going first to the key generation and then to the key lookup. The problem with this is that the key generation is in a shared container, so I can't have all my fields defined on its links - just the natural key and surrogate key. I was thinking of creating another field on the shared container for 'mergedcolumns' and putting my whole record into one field that could be passed in and out of the shared container (with a Merge and a Row Splitter stage on either side of the call to the shared container).
This would have the advantage that I wouldn't have to do the extra lookup after the surrogate key has been created - I could have all the fields I need coming out of the shared container.
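The 'mergedcolumns' idea in option 2 amounts to packing the whole record into one field before the shared container and unpacking it afterwards. A rough sketch (not DataStage syntax; the delimiter and field names are assumptions) of what the Merge and Row Splitter stages would be doing:

```python
# Sketch of the merge/split workaround: collapse all columns into one
# delimited field so the shared container only needs one extra column.

DELIM = "|"  # assumed delimiter, must not occur in the data itself

def merge_columns(row):
    """Collapse a row (dict of column -> value) into one string field."""
    return DELIM.join(f"{k}={v}" for k, v in row.items())

def split_columns(merged):
    """Rebuild the row from the single merged field."""
    return dict(pair.split("=", 1) for pair in merged.split(DELIM))

row = {"cust_id": "CUST-001", "name": "Smith", "amount": "42.50"}
merged = merge_columns(row)      # goes through the shared container untouched
restored = split_columns(merged)  # recovered on the other side
```

The round trip only works if the delimiter never appears in the data, which is the main fragility of this approach.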
Which is the best way to go about this?
Thanks for putting the effort into reading through this long question!