What do I need to know about sharing hash files between jobs?
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 141
- Joined: Tue Mar 16, 2004 8:22 am
- Location: HSBC - UK and India
- Contact:
What do I need to know about sharing hash files between jobs?
What should I know about the implications of sharing hash files between jobs?
I plan to have many jobs running simultaneously all doing lookups and writes to the same hash file.
Thanks
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
If a job has the hash file's 'preload to memory' option enabled, it takes a copy of the file's content as it was when the job started, so later changes to the underlying file will not be reflected in that job's processing.
Alternatively, if you enable the 'lock records' option, other jobs accessing the file may see warnings or failures.
It is best practice to create hash files in a pre-job where needed and use them read-only in the main job, unless the design dictates otherwise.
Also check the 'clear before load' option: if it is ticked, a job may clear the hash file mid-run and produce erroneous results.
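The 'preload to memory' behaviour described above can be sketched in Python. This is purely an illustrative model (a dict standing in for a hash file); DataStage's actual caching lives inside the server engine:

```python
import copy

# Illustrative model of a shared hash file: a key -> row mapping.
hashed_file = {"CUST1": {"name": "Alice"}}

# Job A starts with 'preload to memory' enabled: it snapshots the file.
job_a_cache = copy.deepcopy(hashed_file)

# Meanwhile, Job B writes a new row to the shared hash file.
hashed_file["CUST2"] = {"name": "Bob"}

# Job A's lookups only see the snapshot taken at start-up...
print("CUST2" in job_a_cache)   # False - the stale cache misses the new row

# ...while a job reading the file directly sees the update.
print("CUST2" in hashed_file)   # True
```

This is why the advice above is to build the hash file in a pre-job and treat it as read-only afterwards: the snapshot is then guaranteed to be complete.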
-
- Participant
- Posts: 214
- Joined: Mon Feb 23, 2004 2:10 am
- Location: Dublin, Ireland
- Contact:
Re: What do I need to know about sharing hash files between jobs?
Hi HSBCdev,

HSBCdev wrote: What should I know about the implications of sharing hash files between jobs?
I plan to have many jobs running simultaneously all doing lookups and writes to the same hash file.
Thanks
I would be interested to see how you go...
I believed it was possible to share one hash file across many different jobs, but my DS guy tells me it is only possible to share hash files across multiple instances of the SAME job - that is, the job property 'Enable hashed file cache sharing' applies to multiple instances of a single job.
I am still having difficulty believing that one, and I would like to think my DS guy has mis-read the manual.
I would be very keen to see a pointer to the documentation on how to share hash files between multiple different jobs on Solaris 9 and AIX 5.1.
-
- Premium Member
- Posts: 385
- Joined: Wed Jun 16, 2004 12:43 pm
- Location: Virginia, USA
- Contact:
Refer to the DataStage Disk Caching Guide in your online documentation.
Chuck Smith
www.anotheritco.com
-
- Participant
- Posts: 214
- Joined: Mon Feb 23, 2004 2:10 am
- Location: Dublin, Ireland
- Contact:
Hi Chuck,

chucksmith wrote: Refer to the DataStage Disk Caching Guide in your online documentation.

This is the document my DS guy quotes as saying that a hash file cannot be shared across different jobs, only across different instances of the same job.
I'm really hoping he has misread it. I'm hoping to hear from someone how to make sure two different jobs doing different things with the same hash files can share them.
Peter,
Hash files are to DataStage as tables are to a SQL instance: they were designed to allow many users to access them concurrently, given certain rules and constraints. As mentioned earlier in the thread, if you load a file to memory and it subsequently changes, you get stale data. If some jobs write while others read, you might get missed reads; changing the way DataStage writes and caches will reduce, but not eliminate, potential errors. Basically, nothing will go "poof" when you use files across jobs - you just need to know what conflicts might arise. The same applies to a job that both reads AND updates the same file: it will work as long as you take into consideration the effects of writes and reads across processes.
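The "missed reads" Ken describes are purely a matter of timing: a lookup that runs before another job's write commits will miss the row, and the same lookup afterwards will hit. A minimal Python sketch of that timeline (again, a dict standing in for the shared hash file):

```python
# Illustrative timeline of two "jobs" sharing one hash file (a dict here).
hashed_file = {}

def lookup(key):
    """Reference lookup against the shared file: row dict, or None on a miss."""
    return hashed_file.get(key)

# Job 1 looks up ORDER42 before Job 2 has written it: a missed read.
before = lookup("ORDER42")                      # None

# Job 2 now writes the row to the shared file.
hashed_file["ORDER42"] = {"status": "shipped"}

# Job 1 looks up the same key after the write: now it hits.
after = lookup("ORDER42")                       # {'status': 'shipped'}
```

Nothing fails outright in either case, which matches the point above: sharing works, but the result of any single lookup depends on where it falls relative to concurrent writes.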
-
- Premium Member
- Posts: 385
- Joined: Wed Jun 16, 2004 12:43 pm
- Location: Virginia, USA
- Contact:
Have your DS guy reread the section on System Caching.
Code: Select all
Guidelines for Choosing a Type of Caching
Use the following as a guideline as you select the type of caching to use:

To                                                  Use
--                                                  ---
Share between reference and output files in a       Link private caching
single data stream

Share among multiple data streams or within a       Link public caching
container running with the Parallel Extender

Share among multiple jobs running sequentially      System caching
or in parallel using the same reference file
and/or output file
Chuck Smith
www.anotheritco.com
Where do you go to enable these different types of caching?
chucksmith wrote: Have your DS guy reread the section on System Caching.
-
- Premium Member
- Posts: 385
- Joined: Wed Jun 16, 2004 12:43 pm
- Location: Virginia, USA
- Contact:
Enabling this caching mechanism involves changing values in the uvconfig file in your server engine directory, then regenerating the engine. Refer to the Disk Caching Guide and the Administrator's Guide.
Chuck Smith
www.anotheritco.com