Hashed Lookups and Data Availability

Post questions here relating to DataStage Server Edition, covering such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

mcolen
Premium Member
Posts: 31
Joined: Wed Aug 11, 2004 8:59 am
Location: Florida

Hashed Lookups and Data Availability

Post by mcolen »

If I do a lookup to a hashed file, and my output order is such that I write to the same hashed file before using the data I looked up, is the original data from the lookup still available to me? Or do I have to either save it to a stage variable or change the re-write of the hashed file so that it occurs after I have used the data?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You never know when the data is physically written to the hashed file if you're using read/write caching or row buffering (in-process or inter-process/IPC). If you need to be able to reference the original row over and over, even though you're overwriting it in an output link, you need to NOT overwrite the row.

If you use READ caching to page a copy of the hashed file into memory, you may see that the original row is in the read cache and the overwritten row is never "seen" on subsequent references. This is only because the hashed file has paged entirely into memory. If the file is large, you could see some updated rows because of the paging mechanisms swapping newer data in.

Consider using two copies of your reference hashed file, one for referencing and the other for updating.
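
To give the flavour of keeping the two copies in step, here is a minimal sketch of an after-job subroutine that refreshes the lookup copy from the update copy once the run has finished. The file names REF_CUST and UPD_CUST, the routine name, and the COPY command wrapped in DSExecute are all assumptions for illustration only, not part of any shipped job.

    * Hypothetical after-job subroutine: refresh the lookup copy of the hashed
    * file from the update copy so the next run starts from the latest data.
    * REF_CUST and UPD_CUST are invented VOC entries for the two hashed files.
    Subroutine SyncReferenceCopy(InputArg, ErrorCode)
       ErrorCode = 0
       Command = "COPY FROM UPD_CUST TO REF_CUST ALL OVERWRITING"
       Call DSExecute("UV", Command, Output, SystemReturnCode)
       If SystemReturnCode <> 0 Then
          Call DSLogWarn("COPY failed: " : Output, "SyncReferenceCopy")
          ErrorCode = 1
       End
    Return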
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Or, if you use the one hashed file, disable read cache and write cache, and enable lock for update when reading.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Doesn't the lock for update require you to actually write the row in order to release that record lock? Could you conceivably overflow the lock table and have your job come to a virtual stop if you fail to overwrite the row every time? I don't know of any way to release a lock from the read if you elect not to write the row. In addition, if you do a reference lookup on a row that doesn't exist, couldn't you also fill the lock table with useless locks without ever writing out that row?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes. Yes. Been there, done that. :oops: *

I typically don't bother locking for update (I don't really see the point), so I just leave it 'un-cached'. I'm willing to learn the error of my ways if there's a good reason for the option...

ps. *Entire engine ground to a halt, not just my pitiful little job.
Last edited by chulett on Thu Jan 18, 2007 5:49 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Well, you could use a RELEASE hashedfilename,recordID to release the record lock. Or just fire off RELEASE and let everything go. :shock: I don't think there's a good way to manage that whole locking business.
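
Purely by way of illustration, a fragment along those lines might look like this. The file and key names are made up, and it assumes you're reading the hashed file directly from BASIC rather than through a stage:

    * Illustrative only: take an update lock with READU, then give it back
    * with RELEASE when we decide not to rewrite the record.
    * HASH.CUSTOMER, CustFile and CustId are invented names.
    OPEN "HASH.CUSTOMER" TO CustFile ELSE
       Call DSLogFatal("Cannot open HASH.CUSTOMER", "ReleaseExample")
    END
    READU CustRec FROM CustFile, CustId LOCKED
       Call DSLogWarn("Record " : CustId : " is locked by another process", "ReleaseExample")
    END THEN
       RELEASE CustFile, CustId   ;* found the record but not rewriting it, so free the lock
    END ELSE
       RELEASE CustFile, CustId   ;* not found: READU still locked the id, so free it
    END
    * RELEASE with no arguments would instead drop every lock this program holds.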
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Lock for update only sets a lock if the lookup fails (NOTFOUND is true). Yes, it does expect you to update the hashed file - isn't that the topic of this thread? When the job finishes, any non-released locks will be released. (I suspect, but have not proven, that the lock is released when processing of that row is completed.)
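
As a rough sketch of that behaviour in BASIC terms (not what the stage actually executes; HashFile, Key, Row and NewRow are invented names): a plain READU locks the key whether or not the record exists, so the found case has to release the lock itself, while the not-found case holds it until the new record is written.

    READU Row FROM HashFile, Key THEN
       * Record exists: use Row as the reference data, then hand back the
       * lock READU placed on Key because we are not rewriting this record.
       RELEASE HashFile, Key
    END ELSE
       * Record missing (the NOTFOUND case): the update lock on Key stays in
       * place until the new record is written, which releases it.
       NewRow = ""
       NewRow<1> = "default value"
       WRITE NewRow ON HashFile, Key
    END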
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

In release 6 and earlier, the locks accumulated for the duration of the job run. As for the topic, the question was whether a reference lookup sees updates made to the hashed file during the run, or whether it "remembers" the initial image of the hashed file at startup.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:(I suspect, but have not proven, that the lock is released when processing of that row is completed.)
Not in my experience. I was troubleshooting another person's job and ran an unusually high number of records through it. It wreaked all kinds of havoc: every other DataStage process on the server either errored out or could not be started because of an overflow of the lock table. It was not even resolved when the job cratered - I had to manually clear things up via DS.TOOLS, from what I recall. Not fun. And this was a 7.x job... I don't recall for certain, but I'm fairly sure it was 7.5.1A.
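
For anyone who ends up in the same spot, the lock table can be inspected and cleaned up from the engine command shell (DS.TOOLS or uvsh). These are standard UniVerse commands, shown only as a starting point; user.number is a placeholder you need to look up first from the LIST.READU output:

    LIST.READU EVERY              ;* list all record, file and group locks and who holds them
    UNLOCK USER user.number ALL   ;* clear everything held by one user or phantom (use with care)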

The issue was the improper use of the 'Enabled, Locked for Update' option. The hashed file was being used in one transformer, but the updating of the 'missing' records in the hashed file wasn't happening until the next transformer.

I could see locking for update if the hashed file were being shared across multiple processes, but in a singleton job run - what's the point? What does it buy you? I stick with reference hashed files with caching Disabled, update them as I go, and don't have an issue.
-craig

"You can never have too many knives" -- Logan Nine Fingers