I have a job that has a stream link into Transformer1, which does a lookup against a hash file stage (HFstg1). If a record is not found, I pass the data out of Transformer1 into Transformer2 to do some jiggery-pokery and insert the record into HFstg2. It's a bit more complicated than that (at least that's what I tell my boss :-) ).
All works well if the initial stream link has distinct key values for the hash file. However, if it has (say) two records with the same key (e.g. custno), it seems not to find either of them in HFstg1 (presuming the key is not there initially), passes both down to Transformer2, and adds them both to HFstg2. This causes a destructive update, as the 'jiggery-pokery' is assigning surrogate keys.
So the steps are (sketched in code after the list):
1) Record one - custno = 12345
2) Not found in HFstg1, so send to Transformer2 and add surrogate key 1
3) Insert 12345,1 into HFstg2
4) Record two - custno = 12345
5) Not found in HFstg1, so send to Transformer2 and add surrogate key 2
6) Update HFstg2, replacing 12345,1 with 12345,2
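To see why step 6 destroys step 3's row, here is a minimal sketch in Python (illustration only, not DataStage code; `hash_file`, `lookup_view` and `transformer2` are made-up stand-ins). It assumes the lookup is working against a stale view of the hashed file, whether through caching or row pipelining, which is what the behaviour above suggests:

```python
# Sketch of the destructive update. A hashed file keyed on custno
# behaves like a dict: writing key 12345 twice leaves one row.
# The lookup here runs against a snapshot taken at job start,
# mimicking a reference lookup that cannot see rows written
# further down the same stream. Names are made up, not DataStage APIs.

hash_file = {}                    # stands in for the hashed file on disk
lookup_view = dict(hash_file)     # what Transformer1's lookup actually sees

next_surr_key = 0

def transformer2(custno):
    """Stands in for the 'jiggery-pokery': assign a fresh surrogate key."""
    global next_surr_key
    next_surr_key += 1
    return next_surr_key

for custno in ["12345", "12345"]:        # two records, same natural key
    if custno not in lookup_view:        # both miss: the view is stale
        hash_file[custno] = transformer2(custno)   # keyed write = overwrite

print(hash_file)    # {'12345': 2} -- surrogate key 1 has been destroyed
```

The keyed write is the destructive part: with custno as the hash file key, the second insert of 12345 can only ever replace the first.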
Obvious things first:
a) although I am using different HF stages, they all point to the same hash file
b) I have 'Disabled, Lock for updates' set on HFstg1
c) I haven't got write stage caching on
d) I am an amateur (though talented - not)
My understanding is that 'Disabled, Lock for updates' causes a failed lookup to wait for the keyed record to be inserted, which is indeed done downstream.
I have got round this for now by placing an aggregator stage between the two transformers to 'uniqueify' the natural keys before they hit the second transformer (a sketch of the equivalent logic is below), but this seems a bit silly.
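For what it's worth, the workaround boils down to this (again a Python illustration, not DataStage code; `uniqueify` and the record layout are made up): keep only the first occurrence of each natural key, so each custno is keyed exactly once.

```python
# Sketch of the aggregator workaround: drop duplicate natural keys
# before the surrogate-key step, so each custno is keyed exactly once.
# Illustration only; uniqueify() stands in for the aggregator stage.

def uniqueify(records):
    """Yield only the first record seen for each natural key."""
    seen = set()
    for rec in records:
        if rec["custno"] not in seen:
            seen.add(rec["custno"])
            yield rec

records = [{"custno": "12345"}, {"custno": "12345"}]
hash_file = {}
next_surr_key = 0

for rec in uniqueify(records):          # the second 12345 never gets here
    next_surr_key += 1
    hash_file[rec["custno"]] = next_surr_key

print(hash_file)   # {'12345': 1} -- one row, one surrogate key
```

Note the duplicate row is simply dropped here; if the second record still needs the surrogate key on its way through, it has to pick it up from the hash file afterwards.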
I would be grateful for any pointers, solutions, or derisory remarks, as this is quite a crucial piece of logic for our application.
Thanks in advance
fridge