Hash files directory

him121
Premium Member
Posts: 55
Joined: Sat Aug 07, 2004 1:50 am


Post by him121 »

Hi all,

I have a question about hashed file management.

I am working on a project where we process billions of rows through hashed files. I have 8 hashed files which I read through routines in lots of jobs.

Among these 8 hashed files, sizes vary from one row to millions of rows.

I would like a suggestion: if I put these particular 8 hashed files into the Ascential software application directory, will that affect performance drastically or not?

I want to read these 8 hashed files with a SELECT command, so if I don't keep them in the default Ascential directory I have to open them by path and read them that way, which will degrade my performance.

So I have decided to keep the 8 files in the Ascential directory. Is that a good decision? Is there any other, more optimized way?

Waiting for a reply.

Thanks,
him
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It doesn't matter where your hashed files are.

If you are streaming data from them, and you have control over such things, it helps if the data from each can be delivered on a separate I/O channel from the others. That is, separate disk, separate controller, as much as possible. With SAN technology you basically have to hope that distribution over channels is optimised.

Similarly, the disk where you put your hashed files should ideally not be in use for anything else at the time you are accessing them. The Ascential software application directory (by which I assume you mean the DSEngine directory) is potentially being used to deliver message texts and executables; it is also where the NLS database and the SQL catalog for DataStage reside. None of these should be high impact while you are running jobs.

(Disk) Space - the Final Frontier :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
him121
Premium Member
Posts: 55
Joined: Sat Aug 07, 2004 1:50 am

Post by him121 »

Thanks Ray, for the nice explanation.

himanshu
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A couple of additional thoughts...

Keeping hashed files in your project, especially one processing billions of records, increases your chances of filling up the disk where DataStage is installed. This is a Very Bad Thing and something to be avoided. On a Windows server, it would be best to keep them on a drive separate from the Server install, not just in a directory outside the project.

Search the forum for the keywords "SETFILE OVERWRITING" for the syntax to create VOC records for your pathed hash files. This will allow you to treat them as if they were in your project directory (i.e. in an 'account') without them actually having to live there. There are quite a number of posts here on the subject, so the details around how it all works shouldn't be too hard to find. :wink:
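
For example, from the project's TCL command shell (the Administrator client command line), something like the following creates the VOC pointer; the path and file name here are made up for illustration:

    SETFILE /data/hashed/HF_CUSTOMER HF_CUSTOMER OVERWRITING

Once the pointer exists, jobs, routines and SQL (SELECT ... FROM HF_CUSTOMER) in that project can reference the file by name as though it lived in the account.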
-craig

"You can never have too many knives" -- Logan Nine Fingers
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

One more consideration: reading hashed files from routines means you cannot take advantage of preloading a hashed file into memory. For hashed files that will fit into memory, where only a single row is needed by the routine, I move the read back into the job and just pass the columns as arguments to the routine. This increases speed and readability.
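
To illustrate the point, here is a rough sketch (file, routine and argument names invented) of the kind of routine read that bypasses the Hashed File stage's preload-to-memory option, because the routine opens and reads the file itself on every call:

    * Sketch only: HF_LOOKUP and the argument KeyVal are hypothetical.
    Common /HFLookup/ FileOpened, F.LOOKUP
    If NOT(FileOpened) Then
       Open "HF_LOOKUP" To F.LOOKUP Else Call DSLogFatal("Cannot open HF_LOOKUP", "HFLookupRoutine")
       FileOpened = @True
    End
    Read Rec From F.LOOKUP, KeyVal Then
       Ans = Rec<1>   ;* return the first field of the record
    End Else
       Ans = ""
    End

Moving that read into a Hashed File stage with "Preload file to memory" enabled satisfies the lookup from memory, and the routine then just receives the looked-up column as an argument.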

In the case where more than one row is needed from the hash file, consider denormalizing the hash file at build time. This puts your design back into the category of a single read.
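
As a sketch of that build step (again with invented names), each detail row can be appended as a multivalue on a single record keyed by the lookup key, so the consuming job needs only one read per key:

    * Sketch only: called once per source row; Key and Value are hypothetical.
    Common /Denorm/ DenormOpened, F.DENORM
    If NOT(DenormOpened) Then
       Open "HF_DENORM" To F.DENORM Else Call DSLogFatal("Cannot open HF_DENORM", "BuildDenorm")
       DenormOpened = @True
    End
    Readu Rec From F.DENORM, Key Else Rec = ""
    Rec<1,-1> = Value   ;* append this row's value as another multivalue in field 1
    Write Rec To F.DENORM, Key

The reader then does a single Read on Key and walks the multivalues (for example with DCOUNT(Rec<1>, @VM)).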