Large hashed file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

First, if you know that DATA.30 and OVER.30 grow into the 6-8 GB range every time the job runs, you should create the file manually using mkdbfile with a minimum modulus of something like 2000000. The files may then never need to grow; the thing to watch is the timestamp on the files. If they're changing, that means data is being placed somewhere within the DATA file itself, without needing to spill into the OVER file.
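To put a rough number on the pre-sizing, here is a back-of-the-envelope sketch (Python, purely illustrative, not DataStage code). It assumes the default 2 KB group size of a Type 30 dynamic file (GROUP.SIZE 1) and a target fill of about 80%; both of those figures are assumptions, only the 6-8 GB size comes from the post.

```python
# Rough MINIMUM.MODULUS estimate for a Type 30 (dynamic) hashed file.
# Assumptions (not from the post): 2048-byte groups (GROUP.SIZE 1) and a
# target fill of ~80% so records stay in DATA.30 instead of OVER.30.

def suggest_minimum_modulus(expected_bytes, group_bytes=2048, target_fill=0.8):
    """Return a group count large enough to hold expected_bytes of record data."""
    return int(expected_bytes / (group_bytes * target_fill)) + 1

# 6-8 GB of data per the original post; size for the upper end.
print(suggest_minimum_modulus(8 * 1024**3))   # about 5.2 million groups
```

A minimum modulus only pre-allocates groups; the dynamic file can still split and grow beyond it, so a figure like the 2000000 suggested above is a sensible starting allocation rather than a hard ceiling.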

Second, we don't know whether the problem is Oracle or DataStage. If you're holding an open SELECT for hours, you could be running into issues on the Oracle side with rollback and keeping that result set open. Seriously consider spooling to a sequential file, or better yet, using multiple job instances with a ranging query so that several spooling processes dump out the data in parallel. Then concatenate the multiple output files together and load the result into the hashed file.
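As an illustration of the ranging idea only (the table name, key column, and instance count below are hypothetical, not from the post), the WHERE clauses for the multiple job instances might be generated along these lines:

```python
# Sketch: split one large extract into N ranged queries, one per job instance.
# SRC_TABLE and ORDER_ID are placeholder names; the MOD-based split assumes a
# reasonably evenly distributed numeric key.

N_INSTANCES = 8

def ranged_queries(n=N_INSTANCES):
    """One WHERE clause per instance, partitioning rows by MOD on the key."""
    return [
        f"SELECT * FROM SRC_TABLE WHERE MOD(ORDER_ID, {n}) = {i}"
        for i in range(n)
    ]

for q in ranged_queries():
    print(q)
```

Each instance spools its rows to its own sequential file; afterwards the files are concatenated (for example with cat) and the single combined file is loaded into the hashed file.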
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Pedro,

setting the uvconfig value for 64-bit files to 1 means that any and all hashed files created thereafter will be 64-bit. This means that all of the hundreds of small hashed files that make up your projects and jobs will use the bigger file pointers, which are slower. It is generally recommended not to set this flag.

If you only have this one large file, you should think about making it a partitioned file. If you can use a simple algorithm on your key to split this file into 10 or perhaps more "groups", you can do a few things that will speed up your hashed file load and perhaps increase your lookup speed as well (a MOD-style split is sketched after this list):

(a) you can run your hashed file load in parallel with 10 processes each writing directly to the appropriate partfile and

(b) you can avoid having one big physical dynamic file and have 10 small, manageable and more efficient dynamic files all grouped together as one virtual dynamic file.

(c) If your job process allows it, you might structure your lookups to use only the partfiles, and perhaps even be able to load those individual files into memory for faster throughput.
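As a small illustration of the "simple algorithm on your key" idea (the names and the part count here are hypothetical), a MOD-based split that maps a numeric key to one of 10 part files might look like this:

```python
# Illustrative partitioning algorithm for a partitioned (distributed) hashed
# file: map a numeric key to one of N_PARTS part numbers. The same expression
# would drive both the parallel load jobs and any partfile-only lookups.

N_PARTS = 10

def part_number(key: int) -> int:
    """Part numbers are 1-based; keys with the same remainder share a part file."""
    return (key % N_PARTS) + 1

# Ten parallel load streams: stream i handles only rows where
# part_number(key) == i, writing straight to its own part file
# (hypothetical names like CUSTHASH_P1 .. CUSTHASH_P10).
for k in (101, 205, 310, 417):
    print(k, "->", part_number(k))
```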
pramod_au
Participant
Posts: 30
Joined: Thu Feb 06, 2003 8:30 am
Location: London,UK

Post by pramod_au »

Hi

How about trying a distributed hashed file using data segmentation? That could be another workaround.

Cheers
Pramod