same hash file usage in multiple batch jobs

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

Hi All,

Are there any limitations if I use a hash file in more than one job at the same time?

The size of the hash file would be more than 2 GB.

Can anyone please help me understand whether there are any limitations, and what the possible solutions are?
RD
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There is no "limit" per se on looking up the same hashed file from multiple jobs simultaneously; your normal resource constraints would apply just as if they were X different hashed files. There is a way to enable (I don't recall the exact name) system-level caching of hashed files such that one cached copy can be leveraged by multiple jobs... but setting that up is not for the faint of heart, IMHO.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSRajesh
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

chulett,

I would like to understand whether there are any issues if I use the same hash file, holding a large volume of data, in a larger number of jobs executing at the same time. If the number of hash files and the volume of data both increase, what issues are likely to occur?
RD
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, normal resource constraints, i.e. memory, etc. DataStage won't care; your server will at some point.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's a hashed file, not a hash file.
:x
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
greggknight
Premium Member
Posts: 120
Joined: Thu Oct 28, 2004 4:24 pm

Post by greggknight »

A hashed file of that size is quite large. Just using it as a lookup in one job would require that you tune the building of that hashed file for the best performance. Of course this depends on the structure of the record, the number of columns, and their size.

If you have x jobs reading that same slow file, you will definitely have some performance issues.

So I would start with the one job and make sure it performs at its best before I have multiple jobs reading the same hashed file.
How much memory your system has will influence it as well. If you have enough memory to load the hashed file into memory, then it will be shared.
"Don't let the bull between you and the fence"

Thanks
Gregg J Knight

"Never Never Never Quit"
Winston Churchill
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

greggknight wrote: If you have enough memory to load the hashed file into memory, then it will be shared.
Not true, I'm afraid, if you simply mean the 'Preload file in memory' option in the stage. As I noted earlier, the steps to leverage system caching are described in the Hash Stage Disk Caching technical bulletin PDF that may (or may not) still ship with the product.
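To make the resource point concrete, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification, that each job using the stage-level 'Preload file in memory' option holds its own full copy of the hashed file, while a shared system-level cache holds a single copy. The function name and the figures are hypothetical, and real usage also depends on overhead and on any cache size limits configured on the server.

```python
def estimated_cache_gb(file_size_gb, concurrent_jobs, shared_cache=False):
    """Rough memory estimate for caching one hashed file across several jobs.

    Assumption (not from the product documentation): a non-shared preload
    costs one full copy of the file per job; a shared cache costs one copy.
    """
    if file_size_gb < 0:
        raise ValueError("file_size_gb must not be negative")
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    copies = 1 if shared_cache else concurrent_jobs
    return file_size_gb * copies


if __name__ == "__main__":
    # Hypothetical figures: a 2 GB hashed file read by 4 jobs at once.
    print("per-job preload:", estimated_cache_gb(2.0, 4), "GB")
    print("shared cache:   ", estimated_cache_gb(2.0, 4, shared_cache=True), "GB")
```

Under these assumptions the per-job figure grows linearly with the number of jobs, which is why the shared cache matters for a file of this size.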
greggknight
Premium Member
Posts: 120
Joined: Thu Oct 28, 2004 4:24 pm

Post by greggknight »

Well, I believe the original question was about the basic use of hashed files, so I wasn't going into any details of tuning them or utilizing cache.
If the question had been how to tune a hashed file and the jobs using it, I might have elaborated a little more, along these lines.
In order to pre-load a file into cache, we have to ensure that the same process used by the DataStage Transformer also pre-loads the files into cache. This can be achieved with the following procedure:
Use the ExecTCL Before Stage routine and enter the following:
COUNT FileName
FileName should be the name of the hashed file used for reference lookups. Remember to specify the correct case.
If more than one file needs to be referenced, a paragraph must be created at the DataStage TCL level using the editor, and the name of the paragraph entered as the ExecTCL Before Stage routine. This can be accomplished by opening a Telnet session to the DataStage server. From the > prompt, enter the following:
ED VOC ParagraphName (substitute a descriptive name for ParagraphName)
The editor will output status info indicating that this is a new record. If it does not, type Q to exit the editor and select a different name for your paragraph.
Type I to enter input mode.
Type PA to specify that this entry is a paragraph.
Type COUNT FileName1 (substitute the actual filename for FileName1).
Type COUNT FileName2 (substitute the actual filename for FileName2).
Enter as many files as are required, then press the Enter key on a blank line to return to command mode, then type FILE to file the paragraph.
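As a rough illustration of the record those editor steps produce, here is a minimal Python sketch. It only builds the text of the paragraph (a PA line followed by one COUNT line per file). The helper name build_count_paragraph and the example file names are hypothetical, and nothing here talks to DataStage. Whether COUNT actually loads anything into cache is questioned later in the thread.

```python
def build_count_paragraph(file_names):
    """Return the lines of a VOC paragraph: 'PA' first, then one COUNT per file.

    Hypothetical helper for illustration only; it produces the text you
    would type in the editor, nothing more.
    """
    if not file_names:
        raise ValueError("at least one hashed file name is required")
    lines = ["PA"]  # first line marks the VOC entry as a paragraph
    for name in file_names:
        # Hashed file names are case-sensitive, so keep them exactly as given.
        lines.append(f"COUNT {name}")
    return lines


if __name__ == "__main__":
    print("\n".join(build_count_paragraph(["FileName1", "FileName2"])))
```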
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Note also that PA entries in VOC can be created by saving two or more commands from the Administrator client.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Incidentally, using the COUNT verb probably doesn't load anything into memory, much less the DataStage cache. COUNT is typically resolved from the count of records stored in the hashed file header (unless a write is currently open on the hashed file, which can be determined from the group lock table).

Loading into public shared cache is the topic of an entirely separate DataStage manual - this is not a feature of UniVerse, only of DataStage.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... and that entirely separate DataStage manual is the Technical Bulletin I noted above. :wink: