
Lookup to Hashed File and Writing to the Same Hashed File

Posted: Thu Jun 01, 2006 11:13 am
by dylsing
I have a Transformer stage which uses a hashed file for lookups. The contents of the hashed file are written by the transformer itself.

Every time I get new content, I write it to the hashed file; later rows do a lookup against it to see whether that content already exists, so I won't let duplicates through.

To summarise, the transformer does a lookup against a hashed file that the transformer itself creates.

The problem is that for the first row the transformer wants to do a lookup, but since it hasn't created the hashed file yet, the access to the non-existent hashed file fails.
- Is there a way I can create an empty hashed file up front so that the lookup check can run against it?
- Or can I avoid the lookup for the first row?


Any suggestions will be greatly appreciated. Thank you.

Posted: Thu Jun 01, 2006 11:27 am
by chulett
The hashed file must be opened for writing first in order to create it before it is opened for reading in the job. Accomplish this by linking a Transformer stage to the hashed lookup. Yes, just a Transformer.

Create a bogus stage variable so the job will compile. Match the metadata to the hashed file and set all column derivations to anything (@NULL, for example; they won't be used). Set the constraint on the link to @FALSE. Set the actions on the Input Link of the hashed file stage to whatever you need: clear, delete / create, etc.

When the job starts, the link will run and create / clear the hashed file but process no rows. Then the 'main' portion of your job will run.
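
For completeness, the same "exists and is empty" guarantee can also be had from a before-job subroutine if you would rather script that first step. This is only a rough sketch in DataStage BASIC - the routine name and the hashed file name MyLookupHash are made up, so substitute your own:

Code:
      * Rough sketch: a before-job subroutine that makes sure the lookup
      * hashed file exists in the account and is empty before the job runs.
      Subroutine CreateClearHash(InputArg, ErrorCode)
      ErrorCode = 0
      HashName = "MyLookupHash"

      Open HashName To F.HASH Then
         * Already there from a previous run, so just empty it.
         ClearFile F.HASH
         Close F.HASH
      End Else
         * First run: create it as a type 30 (dynamic) hashed file.
         Execute "CREATE.FILE " : HashName : " 30" Capturing Output
      End

      Return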

Easy Peasy. :wink:

Posted: Thu Jun 01, 2006 4:37 pm
by ray.wurlod
Temporarily change the name of the "read from" hashed file to one that exists (for example VOCLIB). Validate the job, which will create the hashed file in the Hashed File stage that has the input link. Then change the name of the "read from" hashed file back to what it needs to be (the one you have just created).

Posted: Thu Jun 01, 2006 4:44 pm
by chulett
That's all well and good... and clever. However, that only solves the initial creation issue. Ongoing, I'm sure the hashed file needs to be cleared in one fashion or another each run, and hanging the transformer off the reference hashed file solves both issues.

Posted: Thu Jun 01, 2006 6:57 pm
by dylsing
I don't understand the @FALSE constraint. Will it mean the link never processes any rows, so no values are written into the hashed file, and yet the hashed file will still be created?

Posted: Thu Jun 01, 2006 7:08 pm
by chulett
Exactly. :D

Posted: Thu Jun 01, 2006 7:14 pm
by dylsing
Fascinating, the ways to get round these issues. Thank you. :D

Posted: Thu Jun 01, 2006 10:47 pm
by kumar_s
I'm afraid I may not have understood the issue correctly :oops:
If the job is validated, the hashed files will be created in their respective paths. Since the transformer is looking up the same hashed file that it creates, I hope it shouldn't give any issue at run time.
I don't have server access, so I could not test it either. :?

Posted: Fri Jun 02, 2006 12:26 am
by ray.wurlod
"Create" is clever enough not to create if it already exists.

Posted: Fri Jun 02, 2006 1:06 am
by kumar_s
So validating the job once and then running it should go smoothly, am I right?

Posted: Fri Jun 02, 2006 5:08 am
by sb_akarmarkar
Use the routine UtilityHashLookup, which returns "file or table not found" if there is no file..... with this routine we can check whether the file is present or not...
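
If you prefer, a plain existence check can also be written as a small custom routine. This is only an illustrative sketch (the function name is made up, and it is not the source of the SDK routine):

Code:
      * Hypothetical transform function: returns 1 if the hashed file
      * can be opened in the account, otherwise 0.
      Function HashFileExists(HashFileName)
      Ans = 0
      Open HashFileName To F.HASH Then
         Ans = 1
         Close F.HASH
      End
      Return(Ans)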

Thanks,
Anupam

Posted: Fri Jun 02, 2006 5:59 am
by chulett
Totally unnecessary to use the routine like that. :?

While Validating can be used to create hashed files amongst other things, I can't remember the last time I actually validated anything now that we no longer have to do that to precreate them. :lol: Yes, boys and girls, once upon a time that was The Way to create hashed files.

The point of hanging the transformer off the reference hashed file is two-fold:

1) Allow a write operation to happen first in the job, so that it will create the hashed file the first time the job runs and so one doesn't have to remember to validate it in every new environment. And as noted, nothing is actually written during this phase; the hashed file is merely opened and then closed.

2) More importantly, it allows the job to clear / reset the hashed file at the proper point in the job for each run. You can't simply set the 'clear' option on the target hashed file stage, as it is 'too late' by then - one lookup has already been done against a non-empty reference hashed file.

Hope that helps explain the why of this.