file naming on input/output link of a hash file

Stef · Post by **Stef** » Mon Aug 16, 2004 11:26 am

I wonder why there is a filename field for each input/output link on a hash file (or any passive stage, per se). Shouldn't it be the same filename when I write to it as when I read it back within the same job?

Please enlighten me.

chulett · Post by **chulett** » Mon Aug 16, 2004 11:33 am

There can be times when you write to one hash file and read from another, all in the same job and using the same hash file stage. So, the fact that they allow that to happen, as an option, is a good thing in my mind.

However, it would be nice to have a mechanism whereby you could set up one side and "propagate" the information to the other, seeing as that is what is done the vast majority of the time. Right now the process is prone to problems, as certain parts can easily be forgotten.

Stef · Post by **Stef** » Mon Aug 16, 2004 11:41 am

Thanks Craig,

Then wouldn't it be good practice to use a different stage each time a different file is used? The file name could then appear only on the stage general tab (DAR for Ascential).

Are there any good reasons to reuse a hashed file stage for different files, other than graphical ones? In my rookie opinion, doing so leads to misunderstanding: to me each stage is a different file (or table) or file set.

Thanks

chulett · Post by **chulett** » Mon Aug 16, 2004 12:27 pm

That's one thing that new developers need to understand - each file does not require its own stage. Sometimes it can help make the job easier to understand when you do that, but you'll find that when all of the sources have something in common, they can be combined into a single stage.

For OCI, all links would need to be to the same instance. For sequential files, the same directory. For hash files, the same account or pathed directory. But since the actual table/hash/file is stored as part of the metadata of the link itself, they can all live together quite nicely.

For stages like the OCI stage that need to log in to a database, you can seriously cut down on resources consumed by using a minimal number of OCI stages. Why have nine source stages, let's say, that require nine separate database connections when a single connection can be used?
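To make the last two points concrete, here is a rough Python sketch of the idea. It is only a mental model, not DataStage code or any real API: the `Link` and `Stage` classes, the `connections_needed` helper, and all stage, link, and file names are made up for illustration. It assumes, as described above, that the file or table name lives on each link, that the stage carries only what the links share, and that a database stage costs one connection per stage rather than per link.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """One input or output link; it carries its own file or table name."""
    name: str
    direction: str   # "input" (write) or "output" (read)
    file_name: str


@dataclass
class Stage:
    """A passive stage; it holds only what every link has in common."""
    name: str
    shared_location: str   # account, directory, or database instance
    links: List[Link] = field(default_factory=list)

    def add_link(self, name: str, direction: str, file_name: str) -> "Stage":
        self.links.append(Link(name, direction, file_name))
        return self

    def file_names(self) -> List[str]:
        # Distinct files touched through this single stage.
        return sorted({link.file_name for link in self.links})


def connections_needed(stages: List[Stage]) -> int:
    # Assumption taken from the post: one connection per stage.
    return len(stages)


# Hypothetical example: three lookups sharing one hash file stage.
lookups = Stage("HashLookups", shared_location="/data/hash")
lookups.add_link("lkp_customer", "output", "CUSTOMER_HASH")
lookups.add_link("lkp_product", "output", "PRODUCT_HASH")
lookups.add_link("lkp_region", "output", "REGION_HASH")

print(lookups.file_names())
print(connections_needed([lookups]))
```

In this model, nine tables read through nine separate stages would count as nine connections, while the same nine links hanging off one stage would count as one.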

For hash or sequential stages, this is less of an issue. As you said, most of the time a "shared" hash file stage will be for the same hash file; just don't assume that it must be. If there's a chance for confusion when doing something like that, an Annotation is your best friend.

Take a specific example, relating to your last question. A transformer needs to do three hash lookups: do you set up three hash stages or one? Both would work, but I prefer to use one, to show that they are all part of the same logical unit of work being performed.