Do not pre-load to memory. You want to be able to write to the hashed file as the job runs. Likewise, do not enable write cache. You want to ensure that a record written is immediately available when the next row is processed.
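To make the point concrete, here is a minimal Python sketch (a stand-in for the hashed file behaviour, not actual DataStage code; the field names are made up). The dictionary plays the role of the hashed file, and the pre-loaded snapshot shows why a cached lookup never sees rows written during the run:

```python
# Minimal sketch: a dict stands in for the hashed file.
hashed_file = {}

# A pre-loaded / cached lookup works off a snapshot taken at job start.
preloaded_snapshot = dict(hashed_file)

def write_row(key, row):
    """Row written by the job (no write cache, so it lands immediately)."""
    hashed_file[key] = row

def lookup_direct(key):
    """Uncached lookup: sees rows written earlier in the same run."""
    return hashed_file.get(key)

def lookup_cached(key):
    """Cached/pre-loaded lookup: frozen at job start, misses new writes."""
    return preloaded_snapshot.get(key)

write_row("CUST1", {"surr_key": 101, "eff_date": "2010-01-01"})
print(lookup_direct("CUST1"))   # found -- visible when the next row is processed
print(lookup_cached("CUST1"))   # None -- the snapshot never sees it
```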
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote:Make sure that caching is disabled in both stages, and that the writing Hashed File stage is set to "lock for update".
That is a good suggestion. I also think the problem may lie with the data, or with how it is being fetched. He seems to have multiple rows with the same hashing key, which get overwritten in the hashed file.
OP - Why can't you do a direct lookup against the database?
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
They could, as long as they are willing to commit every record as it is written to the target database and take the performance hit on the lookup as well. Much better to stick with the hashed file lookup and set it up properly; it really is simple, especially after you've done it once.
-craig
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote:They could, as long as they are willing to commit every record as it is written to the target database and take the performance hit on the lookup as well. Much better to stick with the hashed file lookup and set it up properly; it really is simple, especially after you've done it once.
True. Hashed file lookups are the way to go.
However, the OP's job design shows that he is capturing the updates in a Sequential file. That means he is not too worried about the updates happening immediately, hence my suggestion to go directly to the database source.
Another factor behind my suggestion is the problem he is having: he is overwriting rows that share the same key as he places them in a hashed file. As far as I know, there is no way to override this behaviour in a hashed file.
Vivek Gadwal
Experience is what you get when you didn't get what you wanted
It's hard to tell exactly what is going on with the OP's job, as what is posted doesn't make much sense, at least to me. However, in general, it is quite straightforward to do Type 2 SCDs in a Server job, and the proper way to do that is what Ray posted - the details first get populated into a reference hashed file, not just the keys but all appropriate elements. You either don't cache the lookup or, if you do, you make sure you have it 'Locked for updates'. If you get a hashed file hit and determine (in the Transformer via stage variables) that nothing has changed, you do nothing. Hashed file miss? Insert. Hashed file hit and a change? Same Insert plus an Update to expire the previous version. In either case you write the entire set of new data back to the reference hashed file so it is available for the next input row.
Yes, there is no way to stop a hashed file write from doing a destructive overwrite for a given set of key fields; that's just how they work. However, in this case we want it to do exactly that, so the hashed file always holds the most current active record in the target.
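For what it's worth, the row-by-row logic described above can be sketched in Python (a rough stand-in for the Transformer stage variables and the hashed file, not actual DataStage code; the field names and dates are illustrative only):

```python
# Rough sketch of the per-row Type 2 SCD logic described above.
# The dict stands in for the reference hashed file.
hashed_file = {}        # natural key -> current active record
inserts, updates = [], []

def process_row(row, today="2024-01-01"):
    key = row["cust_id"]
    current = hashed_file.get(key)          # uncached lookup, sees earlier writes

    if current is not None and current["name"] == row["name"]:
        return                              # hit, nothing changed: do nothing

    new_rec = {"cust_id": key, "name": row["name"],
               "eff_date": today, "end_date": None}

    if current is None:
        inserts.append(new_rec)             # miss: insert only
    else:
        updates.append({**current, "end_date": today})  # hit + change: expire old row
        inserts.append(new_rec)             # ...and insert the new version

    # Write the full new record back; the same key overwrites the old entry
    # (destructive overwrite), which is exactly what we want here: the hashed
    # file always holds the most current active record.
    hashed_file[key] = new_rec

process_row({"cust_id": "C1", "name": "Alice"})
process_row({"cust_id": "C1", "name": "Alice B."})   # change is seen on the very next row
print(inserts, updates, hashed_file)
```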
-craig
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote:
Yes, there is no way to stop a hashed file write from doing a destructive overwrite for a given set of key fields; that's just how they work. However, in this case we want it to do exactly that, so the hashed file always holds the most current active record in the target.
@OP - Craig has given you a very detailed explanation of how to do this within the Server job. It doesn't get any better than this. You could accomplish the whole thing within a single job. It is not very clear why you are capturing only the updates in a Sequential File stage when you say you are implementing SCD2 within this job. If it is something else you are trying to implement, please post more details so that we can help accordingly.
Vivek Gadwal
Experience is what you get when you didn't get what you wanted