Hi,
Apologies if this has been raised and answered before, I have tried searching but lost for a good query line...
I have a job that has two input files, one writes to a hash file and the other uses this hash file as a reference, something like the below:
Code: Select all
Input ----> TX ----> HF
                      |
                      V
Input --------------> TX ----> DB
Now, my understanding is that the two input streams will be kicked off at the same time. Hopefully that is correct; please correct me if I'm wrong.
On that assumption, will the second stream wait for the hash file to be loaded before using it as a reference? Are there any settings you can apply to the hash file to ensure the second stream waits?
My view is that the second stream will race against the first, and it is effectively pot luck as to whether the reference data is there in time.
Can someone please confirm or enlighten me?
Thanks
Ryan
Hash file output to input
Code: Select all
Input1 ----> T1 ----> HF
|
V
Input2 ----------------> T2 ------> Target
Therefore the lower Transformer stage (T2) cannot process the first row from Input2 until Input1 is completely processed (and the hashed file fully populated).
Therefore, in turn, your assumption is not correct, and the reference will be there in time. Yay!
(You can use the Code tags to get the format right -- use Preview until it is right, then Submit. See above.)
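The two-phase behaviour described above can be illustrated with a rough Python analogy (this is not DataStage code, just a sketch of the ordering): the "hashed file" is fully populated from the first stream before the second stream performs a single lookup against it.

```python
# Rough analogy (not DataStage) of the behaviour described above:
# the hashed file's write side runs to completion before its lookup
# side opens, so the reference data is always there in time.

def run_job(input1, input2):
    # Phase 1: T1 writes every Input1 row into the hashed file.
    # The lookup (output) side stays closed until this loop finishes.
    hashed_file = {}
    for key, value in input1:
        hashed_file[key] = value

    # Phase 2: only now does T2 start reading Input2 and doing lookups.
    results = []
    for key, data in input2:
        ref = hashed_file.get(key)  # reference lookup; None if no match
        results.append((key, data, ref))
    return results

print(run_job([("a", 1), ("b", 2)], [("a", "x"), ("c", "y")]))
# → [('a', 'x', 1), ('c', 'y', None)]
```

The point of the sketch is simply that phase 2 cannot start until phase 1 has ended, which is exactly why the "pot luck" concern does not arise.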
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote:A passive stage (in this case the Hashed File stage) cannot open its outputs until its inputs are closed.
Right, it being a bright new day and with my tendency to make simple situations complex, I have a further query...
With the example above, would the hash file (or at least the DS Engine) be clever enough to know it has both an input and an output, and therefore refuse to open the output until the input has given it the relevant instructions?
Or does it only open its input (or output) when requested to do so (and, as such, could it receive an open request on the output first)?
If this is in the manual, please feel free to send me away to find it, just never seen it addressed and would like to ensure I have an understanding of the process.
Thanks
Kryt0n wrote:With the example above, would the hash file (or at least the DS Engine) be clever enough to know it has both an input and an output, and therefore refuse to open the output until the input has given it the relevant instructions?
Basically, yes. It is 'clever enough' to understand the dependencies between the different segments of your job and knows it needs to complete the hashed file build (the 'Input') before it can make the 'Output' available as a lookup. So the writes would complete, the stage would close the hashed file, and then it would turn around and open it for reading, caching it into memory if requested.
BTW, you can see all of this happening in the job's log. The stage start and finish operations are logged, so you can see the order in which things happen, along with the row counts for each 'finished' stage, right there.
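That write/close/reopen sequence can be sketched roughly like this (a hypothetical Python analogy, not actual DataStage internals; the log messages are invented for illustration, not real job-log text):

```python
# Hypothetical sketch of the sequence described above: writes finish,
# the stage closes the hashed file, then reopens it read-only,
# optionally caching it in memory. Events are recorded in order,
# much as start/finish operations appear in a job log.

log = []

def build_hashed_file(rows):
    log.append("HF.Input: opened for writing")
    hf = {k: v for k, v in rows}          # all writes complete here
    log.append("HF.Input: finished, file closed")
    return hf

def open_for_lookup(hf, cache=True):
    log.append("HF.Output: opened for reading")
    # With caching requested, the whole file is pulled into memory once
    # (modelled here as copying the dict).
    return dict(hf) if cache else hf

hf = build_hashed_file([("k1", "v1"), ("k2", "v2")])
lookup = open_for_lookup(hf, cache=True)
print(log)
# → ['HF.Input: opened for writing', 'HF.Input: finished, file closed',
#    'HF.Output: opened for reading']
```

Note that "opened for reading" can only ever appear in the log after "finished, file closed", which mirrors the ordering you would see in the actual job log.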
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers