Hash files directory

him121
Premium Member
Posts: 55
Joined: Sat Aug 07, 2004 1:50 am


Post by him121 »

Hi all,

I have a question about hashed file management.

I am working on a project where we process billions of rows through hashed files. I have 8 hashed files which I read through routines in lots of jobs.

Among these 8 hashed files, sizes vary from one row to millions of rows.

I would like a suggestion: if I put these particular 8 hashed files into the Ascential software application directory, will that affect performance drastically or not?

I want to read these 8 hashed files with a SELECT command, so if I don't keep them in the default Ascential directory I have to open them by path and read them that way, which will degrade my performance.

So I have decided to keep the 8 files in the Ascential directory. Is that a good decision? Is there any other, more optimized way?

Waiting for a reply.

Thanks,
him
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It doesn't matter where your hashed files are.

If you are streaming data from them, and you have control over such things, it helps if the data from each can be delivered on a separate I/O channel from the others. That is, separate disk, separate controller, as much as possible. With SAN technology you basically have to hope that distribution over channels is optimised.

Similarly, the disk where you put your hashed files should ideally not be in use for anything else at the time you are accessing them. The Ascential software application directory (by which I assume you mean the DSEngine directory) is potentially being used to deliver message texts and executables; it is also where the NLS database and the SQL catalog for DataStage reside. None of these should be high impact while you are running jobs.

(Disk) Space - the Final Frontier :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
him121
Premium Member
Posts: 55
Joined: Sat Aug 07, 2004 1:50 am

Post by him121 »

Thanks Ray, for the nice explanation.

himanshu
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A couple of additional thoughts...

Keeping hashed files in your project, especially one processing billions of records, increases your chances of filling up the disk where DataStage is installed. This is a Very Bad Thing and something to be avoided. On a Windows server, it would be best to keep them on a drive separate from the Server install, not just in a directory outside the project.

Search the forum for the keywords "SETFILE OVERWRITING" for the syntax to create VOC records for your pathed hash files. This will allow you to treat them as if they were in your project directory (i.e. in an 'account') without them actually having to live there. There are quite a number of posts here on the subject, so the details around how it all works shouldn't be too hard to find. :wink:
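
For example, from the project's TCL command shell (the Administrator client command line), something like the following creates the VOC pointer; the path and file name here are made up for illustration:

    SETFILE /data/hashed/HF_CUSTOMER HF_CUSTOMER OVERWRITING

Once the pointer exists, jobs, routines and SQL (SELECT ... FROM HF_CUSTOMER) in that project can reference the file by name as though it lived in the account.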
-craig

"You can never have too many knives" -- Logan Nine Fingers
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

One more consideration: reading hashed files from routines means you cannot take advantage of preloading a hashed file into memory. For hashed files that will fit into memory, where only a single row is needed by the routine, I move the read back into the job and just pass the columns as arguments to the routine. This increases speed and readability.
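
To illustrate the point, here is a rough sketch (file, routine and argument names invented) of the kind of routine read that bypasses the Hashed File stage's preload-to-memory option, because the routine opens and reads the file itself on every call:

    * Sketch only: HF_LOOKUP and the argument KeyVal are hypothetical.
    Common /HFLookup/ FileOpened, F.LOOKUP
    If NOT(FileOpened) Then
       Open "HF_LOOKUP" To F.LOOKUP Else Call DSLogFatal("Cannot open HF_LOOKUP", "HFLookupRoutine")
       FileOpened = @True
    End
    Read Rec From F.LOOKUP, KeyVal Then
       Ans = Rec<1>   ;* return the first field of the record
    End Else
       Ans = ""
    End

Moving that read into a Hashed File stage with "Preload file to memory" enabled satisfies the lookup from memory, and the routine then just receives the looked-up column as an argument.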

In the case where more than one row is needed from the hash file, consider denormalizing the hash file at build time. This puts your design back into the category of a single read.
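
As a sketch of that build step (again with invented names), each detail row can be appended as a multivalue on a single record keyed by the lookup key, so the consuming job needs only one read per key:

    * Sketch only: called once per source row; Key and Value are hypothetical.
    Common /Denorm/ DenormOpened, F.DENORM
    If NOT(DenormOpened) Then
       Open "HF_DENORM" To F.DENORM Else Call DSLogFatal("Cannot open HF_DENORM", "BuildDenorm")
       DenormOpened = @True
    End
    Readu Rec From F.DENORM, Key Else Rec = ""
    Rec<1,-1> = Value   ;* append this row's value as another multivalue in field 1
    Write Rec To F.DENORM, Key

The reader then does a single Read on Key and walks the multivalues (for example with DCOUNT(Rec<1>, @VM)).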