What do I need to know about sharing hash files between jobs?

HSBCdev

Post by HSBCdev »

What should I know about the implications of sharing hash files between jobs?

I plan to have many jobs running simultaneously, all doing lookups against and writes to the same hash file.

Thanks
Sainath.Srinivasan

Post by Sainath.Srinivasan »

If the 'preloaded to memory' option is enabled for a hash file in any job, that job takes a copy of the hash file's contents as they were when the job started, so later changes to the original file are not visible to it and do not affect its processing.

Alternatively, if you enable the 'lock records' option, other jobs accessing the file may hit warnings or failures.

It is best practice to create the hash file(s) in a preceding job and treat them as read-only in the main job, unless the design dictates otherwise.

Also check the 'clear before load' option. If it is ticked, a job may clear a hash file that other jobs still rely on, giving erroneous results.
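
To make the stale-copy point concrete, here is a minimal Python sketch. It is only an analogy: a plain dict stands in for the hash file, and the class names `PreloadedLookup` and `LiveLookup` are made up for illustration, not DataStage APIs.

```python
# Illustration only: a plain dict stands in for a hash file on disk.
# PreloadedLookup and LiveLookup are hypothetical names, not DataStage APIs.

hashed_file = {"CUST001": "Alice"}  # the shared "file"


class PreloadedLookup:
    """Mimics a lookup that copies the file into memory at job start."""

    def __init__(self, source):
        self._snapshot = dict(source)  # copy taken once, never refreshed

    def get(self, key):
        return self._snapshot.get(key)


class LiveLookup:
    """Mimics a lookup that goes back to the file on every reference."""

    def __init__(self, source):
        self._source = source

    def get(self, key):
        return self._source.get(key)


preloaded = PreloadedLookup(hashed_file)
live = LiveLookup(hashed_file)

# Another "job" writes a new record after both lookups have started.
hashed_file["CUST002"] = "Bob"

print(preloaded.get("CUST002"))  # None: the snapshot predates the write
print(live.get("CUST002"))       # Bob: the live read sees the new record
```

The preloaded lookup keeps answering from its start-of-job copy, which is why a job using it never sees rows that other jobs add later.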
HSBCdev

Post by HSBCdev »

Thanks.
peternolan9

Post by peternolan9 »

HSBCdev wrote: What should I know about the implications of sharing hash files between jobs?

I plan to have many jobs running simultaneously all doing lookups and writes to the same hash file.

Thanks
Hi HSBCdev,
I would be interested to see how you get on.

I believed it was possible to share one hash file across many different jobs, but my DS guy tells me it is only possible to share hash files across multiple instances of the SAME job. That is, he reads the job property 'Enable hashed file cache sharing' as applying only to multiple instances of one job.

I am still having difficulty believing that, and I would like to think my DS guy has misread the manual.

I would be very keen to see a pointer to the documentation on how to share hash files between multiple different jobs on Solaris 9 and AIX 5.1.
Best Regards
Peter Nolan
www.peternolan.com
chucksmith

Post by chucksmith »

Refer to the DataStage Disk Caching Guide in your online documentation.
cernigls

Post by cernigls »

chucksmith wrote: Refer to the DataStage Disk Caching Guide in your online documentation.
Does anyone know if this documentation is available online anywhere? I don't have the documents with me.

Thanks!!
peternolan9

Post by peternolan9 »

chucksmith wrote: Refer to the DataStage Disk Caching Guide in your online documentation.
Hi Chuck,
this is the document my DS guy quotes as saying that a hash file cannot be shared across different jobs, only across different instances of the same job.

I'm really hoping he has misread it. I'm hoping to hear from someone how to make sure two different jobs doing different things with the same hash files can share them.
Best Regards
Peter Nolan
www.peternolan.com
ArndW

Post by ArndW »

Peter,

Hash files are to DataStage as tables are to a SQL instance. They were designed to allow many users to access them concurrently, given certain rules and constraints. As mentioned earlier in the thread, if you load a file to memory and it subsequently changes, you get stale data. If you have jobs that write while others read, you might get missed reads; changing the way DS writes and caches will reduce but not eliminate potential errors. Basically nothing will go "poof" when you use files across jobs; you just need to know what potential conflicts might arise. The same applies to a job that reads AND updates the same file: it will work as long as you take into account the effects of writes and reads across processes.
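
A rough Python sketch of the "missed read" case, assuming a writer that buffers records before they reach the file. The dict and the `BufferedWriter` class are hypothetical stand-ins for illustration, not how DataStage is implemented.

```python
# Illustration only: a dict stands in for the shared hash file, and
# BufferedWriter is a hypothetical model of a job with write caching.


class BufferedWriter:
    """Holds writes in memory and flushes them to the file in batches."""

    def __init__(self, target, flush_every):
        self._target = target
        self._pending = {}
        self._flush_every = flush_every

    def write(self, key, value):
        self._pending[key] = value
        if len(self._pending) >= self._flush_every:
            self.flush()

    def flush(self):
        self._target.update(self._pending)
        self._pending.clear()


shared_file = {}
writer = BufferedWriter(shared_file, flush_every=3)

writer.write("K1", "v1")
missed = shared_file.get("K1")  # None: the record is still in the buffer

writer.flush()
found = shared_file.get("K1")   # "v1": visible only after the flush

print(missed, found)
```

Lowering `flush_every` shrinks the window in which a reader can miss the record, but with separate processes there is still a gap between one job's write and another job's read, which is the "reduce but not eliminate" point.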
chucksmith

Post by chucksmith »

Have your DS guy reread the section on System Caching.

Guidelines for Choosing a Type of Caching
Use the following as a guideline as you select the type of caching to use:

To                                                      Use
Share between reference and output files in a           Link private caching
single data stream
Share among multiple data streams or within a           Link public caching
container running with the Parallel Extender
Share among multiple jobs running sequentially or       System caching
in parallel using the same reference file and/or
output file
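
The guideline table boils down to a lookup from sharing scope to caching type. A small Python sketch of that decision follows; the `SharingScope` enum and `choose_caching` function are names invented here for illustration, and only the three caching types come from the guide.

```python
from enum import Enum


class SharingScope(Enum):
    """Hypothetical labels for the three 'To' rows of the guideline table."""

    SINGLE_STREAM = "reference and output files in a single data stream"
    MULTIPLE_STREAMS = "multiple data streams, or a container under Parallel Extender"
    MULTIPLE_JOBS = "multiple jobs, sequential or parallel, on the same file"


# Mapping taken from the 'Use' column of the guideline table.
CACHING_FOR_SCOPE = {
    SharingScope.SINGLE_STREAM: "Link private caching",
    SharingScope.MULTIPLE_STREAMS: "Link public caching",
    SharingScope.MULTIPLE_JOBS: "System caching",
}


def choose_caching(scope):
    """Return the caching type the guideline suggests for a sharing scope."""
    return CACHING_FOR_SCOPE[scope]


print(choose_caching(SharingScope.MULTIPLE_JOBS))  # System caching
```

For the case discussed in this thread (different jobs sharing one hash file), the table points at system caching.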
jmessiha

Post by jmessiha »

Where do you go to enable these different types of caching?
chucksmith

Post by chucksmith »

Enabling this caching mechanism involves changing values in the uvconfig file in your server engine directory, then regenerating the engine. Refer to the Disk Caching Guide and the Administrator's Guide.