same hash file usage in multiple batch jobs

DSRajesh · Post by **DSRajesh** » Wed Feb 16, 2011 9:48 am

Hi All,

Are there any limitations if I use the same hash file in more than one job at the same time?

The size of the hash file would be more than 2 GB.

Can anyone please help me understand whether there are any limitations, and possible solutions for them?

chulett · Post by **chulett** » Wed Feb 16, 2011 9:55 am

There is no "limit" per se on looking up against the same hashed file from multiple jobs simultaneously; your normal resource constraints would apply just as if they were X different hashed files. There is a way to enable (I don't recall the exact name) system-level caching of hashed files such that one cached copy can be leveraged by multiple jobs... but setting that up is not for the faint of heart, IMHO.

DSRajesh · Post by **DSRajesh** » Wed Feb 16, 2011 9:59 am

chulett,

I would like to understand whether there are any issues if I use the same hash file, holding a large volume of data, in a larger number of jobs executing at the same time. If the number of hash files increases and the volume of data increases, what issues could occur?

chulett · Post by **chulett** » Wed Feb 16, 2011 12:55 pm

As noted, normal resource constraints, i.e. memory, etc. DataStage won't care; your server will at some point.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 16, 2011 2:52 pm

It's a hashed file, not a hash file.

greggknight · Post by **greggknight** » Sun Feb 20, 2011 9:21 am

A hashed file of that size is quite large. Just using it as a lookup in one job would require that you tune the building of that hashed file for the best performance. Of course, this depends on the structure of the record: the number of columns and their size.

If you have x number of jobs reading that same slow file, you will definitely have some performance issues.

So I would start with the one job and make sure it performs at its best before I have multiple jobs reading the same hashed file.
Depending on your system, how much memory you have will influence it as well. If you have enough memory to load the hashed file into memory, then it will be shared.

chulett · Post by **chulett** » Sun Feb 20, 2011 9:48 am

greggknight wrote: If you have enough memory to load the hashed file into memory, then it will be shared.

Not true, I'm afraid, if you simply mean the 'Preload file in memory' option in the stage. As I noted earlier, there are steps to leverage system caching; they are described in the Hash Stage Disk Caching technical bulletin PDF that may (or may not) still ship with the product.
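
To put rough numbers on that distinction, here is a minimal Python sketch. It assumes that each job which preloads the file holds its own private copy, while a shared system-level cache holds a single copy for all jobs. The function name and the figures are purely illustrative, and the estimate ignores per-process overhead and any configured cache size limits.

```python
def preload_memory_gb(file_size_gb: float, concurrent_jobs: int,
                      shared_cache: bool = False) -> float:
    """Rough memory needed to hold a hashed file for all concurrent jobs.

    Assumption (not taken from product documentation): without shared
    caching every job keeps a private in-memory copy; with shared
    caching one copy serves all jobs.
    """
    if file_size_gb < 0 or concurrent_jobs < 0:
        raise ValueError("size and job count must be non-negative")
    if concurrent_jobs == 0:
        return 0.0
    copies = 1 if shared_cache else concurrent_jobs
    return file_size_gb * copies


if __name__ == "__main__":
    size_gb = 2.0  # illustrative size, matching the "more than 2 GB" in the question
    for jobs in (1, 4, 8):
        private = preload_memory_gb(size_gb, jobs)
        shared = preload_memory_gb(size_gb, jobs, shared_cache=True)
        print(f"{jobs} job(s): private copies ~{private:.0f} GB, shared cache ~{shared:.0f} GB")
```

The point is only that private copies scale with the number of jobs, which is where the "your server will care" part comes in.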

greggknight · Post by **greggknight** » Sun Feb 20, 2011 10:20 am

Well, I believe the original question was about the basic use of hashed files, so I wasn't going into any details of tuning them or utilizing cache.
If the question had been how to tune a hashed file and the jobs using it, I might have elaborated a little more, along these lines.
In order to pre-load a file into cache we have to ensure that the same process used by the DataStage Transformer also pre-loads the files into cache. This can be achieved by utilizing the following procedures:
Use the ExecTCL Before Stage routine and enter the following:
COUNT FileName
FileName should be the name of the hashed file used for reference lookups. Remember to specify the correct case.
If more than one file needs to be referenced, a paragraph must be created at DataStage TCL level using the editor and the name of the paragraph entered as the ExecTCL Before Stage routine. This can be accomplished by invoking a Telnet session to the DataStage Server. From the > prompt enter the following:
ED VOC ParagraphName (Substitute a descriptive name for ParagraphName)
The Editor will output status info indicating that this is a New Record. If this is not the case, type Q to exit the editor and select a different name for your paragraph.
Type I to enter input mode
Type PA to specify that this entry is a paragraph
Type COUNT FileName1 (substitute the actual filename for FileName1)
Type COUNT FileName2 (substitute the actual filename for FileName2)
Enter as many files as are required, then press the Enter key on a blank line to return to command mode, then type FILE to file the paragraph.

ray.wurlod · Post by **ray.wurlod** » Sun Feb 20, 2011 3:56 pm

Note also that PA entries in VOC can be created by saving two or more commands from the Administrator client.

ray.wurlod · Post by **ray.wurlod** » Mon Feb 21, 2011 1:22 am

Incidentally, using the COUNT verb probably doesn't load anything into memory, much less the DataStage cache. COUNT is typically resolved from the count of records stored in the hashed file header (unless a current write is open on the hashed file, which can be determined from the group lock table).

Loading into public shared cache is the topic of an entirely separate DataStage manual - this is not a feature of UniVerse, only of DataStage.

chulett · Post by **chulett** » Mon Feb 21, 2011 9:11 am

... and that entirely separate DataStage manual is the Technical Bulletin I noted above.