Hi,
Apologies if this has been raised and answered before, I have tried searching but lost for a good query line...
I have a job that has two input files, one writes to a hash file and the other uses this hash file as a reference, something like the below:
Code: Select all
Input ----> TX ----> HF
                      |
                      V
Input --------------> TX ----> DB
Now, my understanding is that the two input streams will be kicked off at the same time. Hopefully that is correct; please correct me if I'm wrong.
On that assumption, will the second stream wait for the hash file to be loaded before using it as a reference? Are there any settings you can apply to the hash file to ensure the second stream waits?
My view is that the second stream will race against the first, and it is effectively pot luck as to whether the reference data is there in time.
Can someone please confirm or enlighten me?
Thanks
Ryan
Hash file output to input
Code: Select all
Input1 ----> T1 ----> HF
|
V
Input2 ----------------> T2 ------> Target
Therefore the lower Transformer stage (T2) cannot process the first row from Input2 until Input1 is completely processed (and the hashed file fully populated).
Therefore, in turn, your assumption is not correct, and the reference will be there in time. Yay!
(You can use the Code tags to get the format right -- use Preview until it is right, then Submit. See above.)
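The two-phase behaviour described above can be illustrated with a rough Python analogy (this is not DataStage code, just a sketch of the ordering): the "hashed file" is fully populated from the first stream before the second stream performs a single lookup against it.

```python
# Rough analogy (not DataStage) of the behaviour described above:
# the hashed file's write side runs to completion before its lookup
# side opens, so the reference data is always there in time.

def run_job(input1, input2):
    # Phase 1: T1 writes every Input1 row into the hashed file.
    # The lookup (output) side stays closed until this loop finishes.
    hashed_file = {}
    for key, value in input1:
        hashed_file[key] = value

    # Phase 2: only now does T2 start reading Input2 and doing lookups.
    results = []
    for key, data in input2:
        ref = hashed_file.get(key)  # reference lookup; None if no match
        results.append((key, data, ref))
    return results

print(run_job([("a", 1), ("b", 2)], [("a", "x"), ("c", "y")]))
# → [('a', 'x', 1), ('c', 'y', None)]
```

The point of the sketch is simply that phase 2 cannot start until phase 1 has ended, which is exactly why the "pot luck" concern does not arise.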
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote:A passive stage (in this case the Hashed File stage) cannot open its outputs until its inputs are closed.
Right, it being a bright new day and with my tendency to make simple situations complex, I have a further query...
With the example above, would the hash file (or at least the DS Engine) be clever enough to know it has both an input and an output, and therefore refuse to open the output until the input has given it the relevant instructions?
Or does it only open its input (or output) when requested to do so (and, as such, could it receive an open request on the output first)?
If this is in the manual, please feel free to send me away to find it, just never seen it addressed and would like to ensure I have an understanding of the process.
Thanks
Kryt0n wrote:With the example above, would the hash file (or at least the DS Engine) be clever enough to know it has both an input and an output, and therefore refuse to open the output until the input has given it the relevant instructions?
Basically, yes. It is 'clever enough' to understand the dependencies between the different segments of your job and knows it needs to complete the hashed file build (the 'Input') before it can make the 'Output' available as a lookup. So the writes would complete, the stage would close the hashed file, and then it would turn around and open it for reading, caching it into memory if requested.
BTW, you can see all of this happening in the job's log. The stage start and finish operations are logged, so you can see the order in which things happen, along with the row counts for each 'finished' stage, right there.
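That write/close/reopen sequence can be sketched roughly like this (a hypothetical Python analogy, not actual DataStage internals; the log messages are invented for illustration, not real job-log text):

```python
# Hypothetical sketch of the sequence described above: writes finish,
# the stage closes the hashed file, then reopens it read-only,
# optionally caching it in memory. Events are recorded in order,
# much as start/finish operations appear in a job log.

log = []

def build_hashed_file(rows):
    log.append("HF.Input: opened for writing")
    hf = {k: v for k, v in rows}          # all writes complete here
    log.append("HF.Input: finished, file closed")
    return hf

def open_for_lookup(hf, cache=True):
    log.append("HF.Output: opened for reading")
    # With caching requested, the whole file is pulled into memory once
    # (modelled here as copying the dict).
    return dict(hf) if cache else hf

hf = build_hashed_file([("k1", "v1"), ("k2", "v2")])
lookup = open_for_lookup(hf, cache=True)
print(log)
# → ['HF.Input: opened for writing', 'HF.Input: finished, file closed',
#    'HF.Output: opened for reading']
```

Note that "opened for reading" can only ever appear in the log after "finished, file closed", which mirrors the ordering you would see in the actual job log.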
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers