How to speed up multiple lookups in a Databasic routine

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

CharlesNagy
Premium Member
Posts: 66
Joined: Mon Feb 21, 2005 10:40 am
Location: Paris

How to speed up multiple lookups in a Databasic routine

Post by CharlesNagy »

I am trying to speed up a routine called from a transformer, which, based on the arguments fed to it, reads from various lookup hashed files to return the values required for each row.

Each time the routine is called it may read from several hashed files in order to gather the information.

COMMON blocks are used to ensure that a hashed file opened previously is not opened again, and a lookup is skipped when the required value has already been retrieved.
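
Roughly, each function follows this open-once pattern (the file, COMMON, and routine names here are made up for illustration - the real code differs):

    * Open-once pattern: the named COMMON persists across calls,
    * so the hashed file is opened on the first call only.
    COMMON /LookupCache/ FileIsOpen, F.LOOKUP

    If UnAssigned(FileIsOpen) Then FileIsOpen = @False
    If Not(FileIsOpen) Then
       Open 'CUSTOMER_LOOKUP' To F.LOOKUP Then
          FileIsOpen = @True
       End Else
          Call DSLogWarn('Cannot open CUSTOMER_LOOKUP', 'MyLookupRoutine')
       End
    End

    Ans = ''
    If FileIsOpen Then
       Read Rec From F.LOOKUP, Arg1 Then Ans = Rec<1>
    End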

The routine is still painfully slow, and I suspect the sheer amount of disk access is to blame. Is there a way to allocate a larger disk cache from within a DataBasic routine? I have tried playing around with the job's Performance tab, but it hasn't helped much.

Not much point in posting the routine as it calls up to 20 different functions, which would be unwieldy to post. Each function opens one to several hashed files (if they are not already open) and reads from them.

I thought about amalgamating them all into one routine, but I'm not sure it would help, since I suspect that it is the sheer amount of reads that is slowing it down.

Any advice gratefully accepted
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

It probably is the reads. I've sometimes included my own cache in programs of this type by creating a Key and a Data dynamic array for each file. The keys and data are appended as new values to these strings, and I check the Key array (using LOCATE) before attempting an actual read. In many cases this gives a marked improvement in throughput, but it does depend on your data - if the same record is never read twice, this method can produce a marked slowdown instead.
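
A rough sketch of what I mean (the COMMON names are illustrative, and I'm assuming the file variable has already been opened via an open-once pattern like the one above):

    * Per-file cache: keys and data are held in parallel dynamic
    * arrays in COMMON and checked before any disk read.
    COMMON /MyCache/ F.LOOKUP, CacheKeys, CacheData

    Locate Arg1 In CacheKeys<1> Setting Pos Then
       Ans = CacheData<Pos>             ;* cache hit - no disk access
    End Else
       Read Rec From F.LOOKUP, Arg1 Then Ans = Rec<1> Else Ans = ''
       CacheKeys<-1> = Arg1             ;* remember the key and result
       CacheData<-1> = Ans              ;* (even misses) for next time
    End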
CharlesNagy
Premium Member
Posts: 66
Joined: Mon Feb 21, 2005 10:40 am
Location: Paris

Post by CharlesNagy »

ArndW wrote:It probably is the reads. I've sometimes included my own cache in programs of this type by creating a Key and a Data dynamic array for each file. The keys and data are appended as new values to these strings, and I check the Key array (using LOCATE) before attempting an actual read. In many cases this gives a marked improvement in throughput, but it does depend on your data - if the same record is never read twice, this method can produce a marked slowdown instead.
Thanks for the idea. I created a dynamic array for one file, holding the values that had already been read in, but as the lookup file has 50,000 records the array grows quite large, and LOCATE scans it sequentially - so the job starts off quite fast but slows down as it progresses.

I don't think there is an easy way around it without analysing and redesigning the process...
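
One thing I may still experiment with - an untested variation on the same idea, not something anyone has confirmed here - is keeping the key list sorted, so that LOCATE with a BY clause can stop scanning as soon as it passes the insertion point (which it also returns on a miss):

    * Variation: CacheKeys is kept in ascending order, so LOCATE
    * can stop early; on a miss, Pos is the insertion point.
    Locate Arg1 In CacheKeys<1> By 'AL' Setting Pos Then
       Ans = CacheData<Pos>
    End Else
       Read Rec From F.LOOKUP, Arg1 Then Ans = Rec<1> Else Ans = ''
       Ins Arg1 Before CacheKeys<Pos>   ;* keep both arrays aligned
       Ins Ans Before CacheData<Pos>
    End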
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Of course the DataStage hashed file caching works quite well - but I'm sure you know that and have reasons for not being able to use this.

Depending upon how important this is to you, you can enable memory file caching for DataStage at the system level and use that; your hashed file reads would then never really go to disk. I have used this successfully before, and the performance improvements are very high.

There is a document regarding memory-cached hashed files; I can't recall the name right now, but you can find it in the documentation directory.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

dsdskche.pdf
-craig

"You can never have too many knives" -- Logan Nine Fingers
CharlesNagy
Premium Member
Posts: 66
Joined: Mon Feb 21, 2005 10:40 am
Location: Paris

Post by CharlesNagy »

ArndW wrote:Of course the DataStage hashed file caching works quite well - but I'm sure you know that and have reasons for not being able to use this.

Depending upon how important this is to you, you can enable memory file caching for DataStage at the system level and use that; your hashed file reads would then never really go to disk. I have used this successfully before, and the performance improvements are very high.

There is a document regarding memory-cached hashed files; I can't recall the name right now, but you can find it in the documentation directory.
Thanks, I am looking at the document 'Hash Stage Disk Caching' now, if that is the one you meant. I will check & see if this is a possibility....
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

That's the document. I can't recall the exact numbers, but the maximum amount of memory usable for the cache is initially small and can be expanded. There are a couple of "gotchas" with caching, and I've managed to get the DS instance so confused that not a single read/write operation worked until we rebooted - but that was due to me playing around with files I shouldn't have. (I think I was trying to get the Director to respond faster and shoved all of the DS_... files into the cache while 30 developers were trying to do their jobs in the background. I made no friends that day...)
CharlesNagy
Premium Member
Posts: 66
Joined: Mon Feb 21, 2005 10:40 am
Location: Paris

Post by CharlesNagy »

ArndW wrote:That's the document. I can't recall the exact numbers, but the maximum amount of memory usable for the cache is initially small and can be expanded. There are a couple of "gotchas" with caching, and I've managed to get the DS instance so confused that not a single read/write operation worked until we rebooted - but that was due to me playing around with files I shouldn't have. (I think I was trying to get the Director to respond faster and shoved all of the DS_... files into the cache while 30 developers were trying to do their jobs in the background. I made no friends that day...)
Thanks guys, much appreciated....
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please note that the cache referred to in this document is not accessible from DataStage routines.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

D'oh! :cry:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Public disk caching is a DataStage engine-level setting; it can be enabled per hashed file and is independent of the method used to read from or write to the file. So it works from BASIC code just as it does in jobs.
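
In other words, nothing changes on the BASIC side; ordinary code like this (file and routine names invented for the example) is served from the public cache once caching has been enabled for that file:

    * Ordinary BASIC access - identical whether or not the hashed
    * file has public disk caching enabled at the engine level.
    Open 'LOOKUP_HASH' To F.LOOKUP Else
       Call DSLogFatal('Cannot open LOOKUP_HASH', 'MyRoutine')   ;* aborts
    End
    Read Rec From F.LOOKUP, Arg1 Then Ans = Rec<1> Else Ans = ''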

I just took a quick look and it seems that the "dsdskche.pdf" document is no longer delivered at version 8, but the UV commands are still present, so the facility can still be used.

Oh, I just searched the docs and found that the disk caching information is now in Chapter 25 of the Server Job Developer Guide.
CharlesNagy
Premium Member
Posts: 66
Joined: Mon Feb 21, 2005 10:40 am
Location: Paris

Post by CharlesNagy »

ArndW wrote:Public disk caching is a DataStage engine-level setting; it can be enabled per hashed file and is independent of the method used to read from or write to the file. So it works from BASIC code just as it does in jobs.

I just took a quick look and it seems that the "dsdskche.pdf" document is no longer delivered at version 8, but the UV commands are still present, so the facility can still be used.

Oh, I just searched the docs and found that the disk caching information is now in Chapter 25 of the Server Job Developer Guide.
Thanks guys, appreciate all the input, I am now marking this resolved...