I have a job that has a stream link into Transformer1, which does a lookup against a hash file stage (HFstg1). If a record is not found, I pass the data out of Transformer1 into Transformer2 to do some jiggery-pokery and insert the record into HFstg2. It's a bit more complicated than that (at least that's what I tell my boss :-) ).
All works well if the initial stream link has distinct key values for the hash file. However, if it has (say) two records with the same key (e.g. custno), it seems not to find either of them in HFstg1 (presuming the key is not there initially), passes both down to Transformer2, and adds them both to HFstg2. This causes a destructive update, as the 'jiggery-pokery' is assigning surrogate keys.
So the steps are (sketched in code after the list):
1) Record one - custno = 12345
2) Not found in HFstg1, so send to Transformer2 and add surrogate key 1
3) Insert 12345,1 into HFstg2
4) Record two - custno = 12345
5) Not found in HFstg1, so send to Transformer2 and add surrogate key 2
6) Update HFstg2, replacing 12345,1 with 12345,2
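To see why step 6 destroys step 3's row, here is a minimal sketch in Python (illustration only, not DataStage code; `hash_file`, `lookup_view` and `transformer2` are made-up stand-ins). It assumes the lookup is working against a stale view of the hashed file, whether through caching or row pipelining, which is what the behaviour above suggests:

```python
# Sketch of the destructive update. A hashed file keyed on custno
# behaves like a dict: writing key 12345 twice leaves one row.
# The lookup here runs against a snapshot taken at job start,
# mimicking a reference lookup that cannot see rows written
# further down the same stream. Names are made up, not DataStage APIs.

hash_file = {}                    # stands in for the hashed file on disk
lookup_view = dict(hash_file)     # what Transformer1's lookup actually sees

next_surr_key = 0

def transformer2(custno):
    """Stands in for the 'jiggery-pokery': assign a fresh surrogate key."""
    global next_surr_key
    next_surr_key += 1
    return next_surr_key

for custno in ["12345", "12345"]:        # two records, same natural key
    if custno not in lookup_view:        # both miss: the view is stale
        hash_file[custno] = transformer2(custno)   # keyed write = overwrite

print(hash_file)    # {'12345': 2} -- surrogate key 1 has been destroyed
```

The keyed write is the destructive part: with custno as the hash file key, the second insert of 12345 can only ever replace the first.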
Obvious things first:
a) although I am using different HF stages, they all point to the same hash file
b) I have 'Disabled, Lock for updates' set on HFstg1
c) I haven't got write stage caching on
d) I am an amateur (though talented - not)
My understanding is that 'Disabled, Lock for updates' causes a failed lookup to wait for the keyed record to be inserted, which is indeed done downstream.
I have got round this for now by placing an aggregator stage between the two transformers to 'uniqueify' the natural keys before they hit the second transformer (a sketch of the equivalent logic is below), but this seems a bit silly.
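For what it's worth, the workaround boils down to this (again a Python illustration, not DataStage code; `uniqueify` and the record layout are made up): keep only the first occurrence of each natural key, so each custno is keyed exactly once.

```python
# Sketch of the aggregator workaround: drop duplicate natural keys
# before the surrogate-key step, so each custno is keyed exactly once.
# Illustration only; uniqueify() stands in for the aggregator stage.

def uniqueify(records):
    """Yield only the first record seen for each natural key."""
    seen = set()
    for rec in records:
        if rec["custno"] not in seen:
            seen.add(rec["custno"])
            yield rec

records = [{"custno": "12345"}, {"custno": "12345"}]
hash_file = {}
next_surr_key = 0

for rec in uniqueify(records):          # the second 12345 never gets here
    next_surr_key += 1
    hash_file[rec["custno"]] = next_surr_key

print(hash_file)   # {'12345': 1} -- one row, one surrogate key
```

Note the duplicate row is simply dropped here; if the second record still needs the surrogate key on its way through, it has to pick it up from the hash file afterwards.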
I would be grateful for any pointers, solutions, or derisory remarks, as this is quite a crucial piece of logic for our application.
Thanks in advance
fridge