Creating and using Hash Files in Routines

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I do!

Seriously, though, you create hashed files using a CREATE.FILE command, a CREATE TABLE statement, or the mkdbfile command, probably executed by calling DSExecute. Use the Hashed File Calculator utility (on the DataStage client CD) to get the syntax of CREATE.FILE; the SQL statement is fairly standard.
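For illustration, here is a minimal sketch of issuing CREATE.FILE from a routine via DSExecute (the file name and routine name are assumed for the example, not from this thread; use the calculator to size real files):

   * Sketch only: create a dynamic hashed file in the project's DATA area.
   * 'MyHashedFile' and 'MyRoutine' are illustrative names.
   Command = 'CREATE.FILE DATA MyHashedFile DYNAMIC'
   Call DSExecute('UV', Command, Output, SystemReturnCode)
   If SystemReturnCode <> 0 Then
      Call DSLogWarn('CREATE.FILE failed: ':Output, 'MyRoutine')
   End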

Once it's been created, your routine must OPEN the hashed file. Rows are inserted (or overwritten) via WRITE statements and can be accessed by READ statements. The file can be closed with a CLOSE statement, though this is not necessary; in fact, the file variable to which the file is opened should be declared in a named COMMON area, so that the hashed file is not re-opened on every invocation of your routine.
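By way of example only (the file variable, key, and value names here are assumed), a read-modify-write against an open hashed file looks like this:

   COMMON /MyHash/ F.Hash                       ;* named COMMON: the open survives between calls

   Read Rec From F.Hash, TheKey Else Rec = ''   ;* row may not exist yet
   Rec<1> = NewValue                            ;* update field 1 of the dynamic array
   Write Rec To F.Hash, TheKey                  ;* inserts, or overwrites an existing row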

All of these statements are in the BASIC manual, which you will find among your DataStage documentation. You may also find it useful to research the statements SELECT and READNEXT.
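For instance (a sketch with assumed names), SELECT and READNEXT let a routine walk every row in a hashed file:

   Select F.Hash To 9                     ;* build select list 9 of all record IDs
   Loop
      ReadNext TheKey From 9 Else Exit    ;* take the next ID; leave the loop when exhausted
      Read Rec From F.Hash, TheKey Then
         * ... process Rec here ...
      End
   Repeat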
msigal
Participant
Posts: 31
Joined: Tue Nov 26, 2002 3:19 pm
Location: Denver Metro

Post by msigal »

We use a few routines that read and write to hash files, and I'm concerned that we aren't doing what Ray is suggesting: declaring them COMMON. Here's a chunk of code from one of the routines. Would it be better to declare the hash file up front in the job control code, or in the routine, which could be called multiple times? It would be great if we could make a performance improvement to these routines. Ray, as always, your knowledge is greatly appreciated.

Myles

* If it does not exist, then create the hash file with NAME = HashLog_

Open '',HashFileName To F.Hash Else
   ** Call DSLogWarn(HashFileName:' does NOT exist, creating hash file', ProgramName)

   Execute 'CREATE.FILE DATA ':HashFileName:' DYNAMIC WRITE.CACHE' Capturing Rslt

   Open '',HashFileName To F.Hash Else Call DSLogFatal('Cannot open file ':HashFileName, ProgramName)

   Call DSLogInfo(HashFileName:' opened *** ':Rslt, ProgramName)
End
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

$INCLUDE UNIVERSE.INCLUDE FILEINFO.H

COMMON /HF/ F.Hash
* Variables in COMMON are automatically initialized to zero.
* We use FileInfo() to determine whether file is open.

If Not(FileInfo(F.Hash, FINFO$IS.FILEVAR)) Then

   * If it does not exist, then create the hash file with NAME = HashLog_

   Open '',HashFileName To F.Hash Else
      ** Call DSLogWarn(HashFileName:' does NOT exist, creating hash file', ProgramName)

      Execute 'CREATE.FILE DATA ':HashFileName:' DYNAMIC WRITE.CACHE' Capturing Rslt

      Open '',HashFileName To F.Hash Else Call DSLogFatal('Cannot open file ':HashFileName, ProgramName)

      Call DSLogInfo(HashFileName:' opened *** ':Rslt, ProgramName)
   End

End

-----------------------

I would, however, be a little more rigorous about error checking if I were you. There are many reasons OPEN can fail; file does not exist is merely one of these. Read about the STATUS() function in the BASIC manual for more information.
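For example, a slightly more defensive open might capture STATUS() and log it before deciding what to do next (a sketch only; see the manual for the meanings of the individual codes):

   Open '',HashFileName To F.Hash Else
      OpenFailureCode = Status()          ;* why the open failed, per the BASIC manual
      Call DSLogWarn('OPEN of ':HashFileName:' failed, STATUS() = ':OpenFailureCode, ProgramName)
      * Decide here whether to CREATE.FILE or abort, based on the code.
   End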
msigal
Participant
Posts: 31
Joined: Tue Nov 26, 2002 3:19 pm
Location: Denver Metro

Post by msigal »

Will I run into any contention issues if I have multiple jobs running where F.Hash is declared COMMON in each? As you can see, the creation of the hashed file is unique to the job, so the hashed file name is unique.

What kind of performance improvement do you think I could expect? Some of these routines are called a few times for every record passing through the stage. At other times, the routine is only called to report data quality issues, but there may be 30 to 100 potential routine calls.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No contention issues. Each instance runs as a separate process.
COMMON is intra-process; it is not an inter-process mechanism.

Hard to say what performance improvement you'll see; it depends on many things, not least what you mean by "performance". It WILL help, but on 100 or fewer calls, even assuming 1 second per open, that's still less than 2 minutes saved overall, and the job is probably doing other things during this time.
Anjan Roy
Participant
Posts: 46
Joined: Mon Apr 12, 2004 9:51 am
Location: USA

Post by Anjan Roy »

I have a similar issue. I am reading the list of tables from a file and opening table-specific files. The problem is that FileInfo() does not always find the file as open.

Here are the pieces

Code:

      Common /GetSurrogateKey/ Initialized, SurrogateDefnFile, FileArray(100), FilesOpened(100)
Opening the files..

Code:

   For I = 1 To MaxCount
      CurrentFileName = "SURR.CNTL.":FileArray(I)
      If TRIM(FileArray(I)) = TableName Then
         Open CurrentFileName To SurrogateCntlFile Else
            Message = "Unable to Open the Surrogate Key Control File : ":CurrentFileName
            Call DSLogFatal(Message, RoutineName)
            Ans = -1
            RETURN(Ans)
         End
         Call DSLogInfo("Opened File..":CurrentFileName, RoutineName)
      End Else
         Open CurrentFileName To FilesOpened(I) Else
            Message = "Unable to Open the Surrogate Key Control File : ":CurrentFileName
            Call DSLogFatal(Message, RoutineName)
            Ans = -1
            RETURN(Ans)
         End
         Call DSLogInfo("Opened File..":CurrentFileName, RoutineName)
      End
   Next I

Code:

   If FileInfo(SurrogateCntlFile, FINFO$IS.FILEVAR) Then
      Message = "File is already open."
   End Else
      Message = "File Is Not Open."

      Open "SURR.CNTL.":TableName To SurrogateCntlFile Else
         Message = "Unable to Open the Surrogate Key Control File : ":CurrentFileName
         Call DSLogFatal(Message, RoutineName)
         Ans = -1
         RETURN(Ans)
      End
   End
I am always getting the message "File Is Not Open". I want to put the file variable in COMMON, but the problem would be that if multiple jobs are calling this routine, it will return the wrong result.

Any help...?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Each instance runs as a separate process.
COMMON is intra-process; it is not an inter-process mechanism.

Therefore, even though one process opens the file into a variable named in a COMMON area of memory, that area of memory is private to its own process. Another job will create its own private copy of the variables declared in COMMON memory, and will therefore find the file not to be open (unless, of course, that job has opened it itself).

Let me make the following assertion.
There is NO MECHANISM in DataStage server jobs for sharing handles to open files between processes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Small correction.

It is possible to have hashed files shared between jobs. Enable this using the Enable hashed file cache sharing check box in job properties. This assumes version 7.1, and that you have enabled shared cache for DataStage hashed files.

However, the assertion in my previous post on this thread stands: access to hashed files in the shared cache is not exposed through file handles in DataStage BASIC. Shared-cache hashed files are accessed only through the Hashed File stage type.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.