Hi
We are developing jobs that do lookups against the same tables.
We enabled 'Pre-load file to memory' since our lookup data is small.
Now, suppose I enable 'Hashed file cache sharing' in job properties,
and one job loads the file to memory while the other job is scheduled either simultaneously with or sequentially after the first job.
1. Does the hashed file cache sharing option enable the two jobs to access that memory? I know it does for multiple instances of the same job.
I believe no OS will let two jobs access the same memory at a time.
If the second job runs after the first, I assume the OS would release the memory and load the hashed file into memory again.
If the second job runs in parallel with the first, I assume the OS would make a second copy of the hashed file in memory too.
Thank you
hashed file cache sharing
Bryan,
There are config parameters in the uvconfig file which you can change to define what level of file sharing is required. By default it is link private, i.e. each link loads its own copy of the file into memory.
It could also be set to:
1) Link public sharing, i.e. multiple streams share a single copy.
2) System, i.e. the file is always loaded in memory.
Until you change the uvconfig parameters and regenerate the engine, UniVerse does not share the memory across multiple streams.
Further, UniVerse is the execution environment where your jobs run, so these memory configurations are managed by UniVerse.
Refer to the Disk Caching guide; it explains this in detail.
Dhiraj
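(For reference, the change described above lives in the uvconfig file under $DSHOME and takes effect only after regenerating and restarting the engine. A rough sketch of the tunables named later in this thread; every value below is a placeholder except DCWRITEDAEMON, which a poster set to 10, and the exact command names may differ by release, so check the Disk Caching guide:)

```
# $DSHOME/uvconfig -- disk-cache tunables (placeholder values;
# see the Disk Caching guide for each parameter's meaning and range)
DCWRITEDAEMON 10
DCBLOCKSIZE   4096
DCMODULUS     1024
DCMAXPCT      80
DCFLUSHPCT    80
DCCATALOGPCT  50

# then, roughly:
#   cd $DSHOME
#   bin/uv -admin -stop     # stop the engine
#   bin/uvregen             # regenerate the configuration
#   bin/uv -admin -start    # restart the engine
```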
Hi guys,
I have a similar problem... I read that "Caching Guide".
I have followed all the steps described, but I cannot enable public caching.
System caching seems to work, though...
I did the following:
- changed uvconfig, regenerated the config file, restarted the engine;
- ticked all the checkboxes related to caching.
The result is:
- If I choose the "WRITE IMMEDIATE" or "WRITE DEFERRED" option when creating the hashed file, I get a message in the job log saying: "WRITE-DEFERRED file cache enabled, overriding link private cache"...
(This works even when I am NOT ticking the "Enable hashed file cache sharing" checkbox in job properties.)
- If I choose "NONE", I get the message that the private cache will be used...
I also ran the LIST.FILE.CACHE command with different options and it looks like the file is in the cache...
Anyway, I was expecting to see some kind of "public cache used" message in the log.
Any hints?
What values have you set in uvconfig for the following?
- DCWRITEDAEMON
- DCBLOCKSIZE
- DCMODULUS
- DCMAXPCT
- DCFLUSHPCT
- DCCATALOGPCT
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I've only set DCWRITEDAEMON, to 10.
But I thought I didn't need to change these values for public caching to work.
Anyway, I think I found the problem...
If I make a job with two different stages looking up the same hashed file, I get the "Public cache used" message in the log...
But all my jobs are written with only one stage looking up the same hashed file via many links. Some of those links look up the same hashed file, though.
In this case I still get the private cache message in the log...
The Caching Guide PDF says: "The lookup file will run in more than one stream, either in multiple data streams within the same job or in partitioned sets with the DataStage Parallel Extender."
I don't have PX, so I assume that "multiple data streams within the same job" means more than one stage?
Yes and no. Chapter 2 of the Parallel Job Developer's Guide will aid your understanding, even though you're using server jobs. It's about process boundaries.
1. A passive stage between two active stages is a process boundary.
2. An IPC stage is, ipso facto, a process boundary.
3. The invisible passive stage between two active stages joined by one link introduces a process boundary if row buffering is enabled.
4. Independent streams of processing in the same job run in separate processes.
5. Independent active stages in the same job run in separate processes.
For example, a single stream with one active stage runs in a single process:

Passive -----> Active -----> Passive
Whereas two independent active stages fed from the same passive stage run as two separate processes:
+---> Active ----+
| |
Passive ---+ +---> Passive
| |
+---> Active ----+
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.