Lookup to Hashfile and Writing to same Hash File
I have a Transformer stage that uses a hashed file for lookup purposes. The contents of the hashed file are written by the Transformer itself.
Every time I have new contents, I write them to the hashed file; later rows do a lookup to see whether the contents already exist, so duplicates are not let through.
To summarise, the Transformer uses a lookup that the Transformer itself creates.
The problem is that for the first row the Transformer wants to do a lookup, but since the hashed file has not yet been created, the access to the non-existent hashed file fails.
- Is there a way to create an empty hashed file up front so the lookup check can access it?
- Or can I avoid the lookup for the first row?
Any suggestions will be greatly appreciated. Thank you.
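(For readers outside DataStage: the failure mode can be sketched with Python's `dbm` module standing in for the hashed file — this is only an analogy, not the DataStage mechanism itself. A key/value store that has never been created cannot be opened for reading, exactly like the first-row lookup; pre-creating it empty makes the read safe.)

```python
import dbm, os, tempfile

# A store that has never been created cannot be opened read-only; this
# mirrors the first-row lookup failing against the missing hashed file.
path = os.path.join(tempfile.mkdtemp(), "seen.db")
try:
    dbm.open(path, 'r')              # read-only open, but store does not exist
except dbm.error as exc:
    print("lookup store missing:", exc)

# Creating the store empty up front ('c' = create if absent) fixes it.
dbm.open(path, 'c').close()
with dbm.open(path, 'r') as seen:    # now the read-only open succeeds
    print("keys at start:", list(seen.keys()))
```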
The hashed file must be opened for writing first in order to create it before it is opened for reading in the job. Accomplish this by linking a Transformer stage to the hashed lookup. Yes, just a Transformer.
Create a bogus stage variable so the job will compile. Match the metadata to the hashed file and set all column derivations to anything (@NULL, for example; they won't be used). Set the constraint on the link to @FALSE. Set the actions on the input link to the hashed file to whatever you need: clear, delete/create, etc.
When the job starts, the link will run and create / clear the hashed file but process no rows. Then the 'main' portion of your job will run.
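The same shape — a write-open that creates/clears the store but sends zero rows, followed by the main lookup-and-write loop — can be sketched in Python with `dbm` as a stand-in (the `precreate`/`dedupe` names are mine, purely for illustration):

```python
import dbm

def precreate(path):
    """Analogue of the dummy Transformer link with the @FALSE constraint:
    open the store for writing with create/clear semantics, pass no rows
    through, and close. The store now exists and is empty."""
    dbm.open(path, 'n').close()       # 'n' = always create a new, empty store

def dedupe(rows, path):
    precreate(path)                   # runs first, like the initialisation link
    out = []
    with dbm.open(path, 'w') as seen: # open the now-existing store for update
        for key, value in rows:
            if key not in seen:       # lookup in the store we maintain
                seen[key] = value     # write the key so later duplicates match
                out.append((key, value))
    return out
```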
Easy Peasy.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Temporarily change the name of the "read from" hashed file to one that exists (for example VOCLIB). Validate the job, which will create the hashed file in the Hashed File stage that has the input link. Then change the name of the "read from" hashed file back to what it needs to be (the one you have just created).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
That's all well and good... and clever. However, that only solves the initial creation issue. Ongoing, I'm sure the hashed file needs to be cleared in one fashion or another each run, and hanging the transformer off the reference hashed file solves both issues.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I'm afraid I may not have understood the issue correctly.
If the job is validated, the hashed files will be created in their respective paths. Since the Transformer looks up the same hashed file that it creates, it shouldn't cause any issue at run time.
I don't have server access, so I could not test this either.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Totally unnecessary to use the routine like that.
While Validating can be used to create hashed files amongst other things, I can't remember the last time I actually validated anything now that we no longer have to do that to precreate them. Yes, boys and girls, once upon a time that was The Way to create hashed files.
The point of hanging the transformer off the reference hashed file is two-fold:
1) It allows a write operation to happen first in the job, so that the hashed file is created the first time the job runs and one doesn't have to remember to validate it in every new environment. And as noted, nothing is actually written during this phase; the hashed file is merely opened and then closed.
2) More importantly, it allows the job to clear / reset the hashed file at the proper point in the job for each run. You can't simply set the 'clear' option on the target hashed stage, as it is 'too late' then: a lookup has already been done against a non-empty reference hashed file.
Hope that helps explain the why of this.
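The timing point in (2) is easy to demonstrate outside DataStage. In this sketch (again using Python's `dbm` as a stand-in, with a hypothetical `run_job` helper), skipping the up-front clear leaves stale keys from a previous run visible to the lookups, so fresh rows get wrongly dropped as "duplicates":

```python
import dbm

def run_job(rows, path, clear_first):
    # clear_first mimics the dummy-link approach: reset the store before
    # any lookup. When False, stale keys from a previous run are still
    # visible, so a clear that happens later comes 'too late'.
    if clear_first:
        dbm.open(path, 'n').close()   # 'n' = recreate as a new, empty store
    out = []
    with dbm.open(path, 'c') as seen:
        for key in rows:
            if key not in seen:
                seen[key] = '1'
                out.append(key)
    return out
```

Running this twice against the same store: a second run without the clear drops rows that merely appeared in the previous run, while clearing first lets them through.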
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers