hashed file cache sharing

Post questions here related to DataStage Server Edition, for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bryan
Participant
Posts: 91
Joined: Sat Feb 21, 2004 1:17 pm

hashed file cache sharing

Post by bryan »

Hi

We are developing jobs that do lookups against the same tables.

We enabled 'Preload file to memory' since our lookup data is small.

Now, suppose I enable 'Hashed File Cache Sharing' in job properties, and one job loads the file into memory while a second job is scheduled to run either at the same time as, or after, the first job.

Does the hashed file cache sharing option allow the two jobs to access that memory? I know it does for multiple instances of the same job.


My understanding is that no OS will let two processes access the same private memory at a time.
If the second job runs after the first, the OS would release the memory and load the hashed file into memory again.
If the second job runs in parallel with the first, the OS would place a second copy of the hashed file in memory.


Thank you
dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Post by dhiraj »

Bryan,

There are configuration parameters in the uvconfig file that you can change to define what level of file sharing is used. By default it is link private, i.e. each link loads its own copy of the file into memory.
It can also be set to:
1) link public sharing, i.e. multiple streams share a single copy;
2) system, i.e. the file is always kept in memory.


Until you change the uvconfig parameters and regenerate the engine, UniVerse does not share the memory across multiple streams.
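(By way of illustration only: on a UNIX install, the edit-and-regenerate cycle described above typically looks something like the following. Paths and the exact admin commands vary by release, so treat this as a sketch rather than a recipe.)

Code: Select all

cd $DSHOME                # DataStage engine directory
bin/uv -admin -stop       # stop the engine before touching uvconfig
vi uvconfig               # edit the disk cache (DC*) parameters
bin/uvregen               # regenerate the engine configuration
bin/uv -admin -start      # restart the engine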


Further, UniVerse is the execution environment in which your jobs run, so these memory configurations are managed by UniVerse.

Refer to the Disk Caching guide; it explains this in detail.


Dhiraj
adrian
Participant
Posts: 10
Joined: Wed Jul 14, 2004 1:59 am
Location: Bucharest, Romania

Post by adrian »

Hi guys,

I have a similar problem... I read that "Caching Guide".
I followed all the steps described, but I cannot enable public caching.
System caching seems to work, though...
I did the following:
- changed uvconfig, regenerated the configuration file, restarted the engine;
- ticked all the checkboxes related to caching.
The result is:
- If I choose the "WRITE IMMEDIATE" or "WRITE DEFERRED" option when creating the hashed file, I get a message in the job log saying: "WRITE-DEFERRED file cache enabled, overriding link private cache"...
(This happens even when I do NOT tick the "Enable hashed file cache sharing" checkbox in job properties.)
- If I choose "NONE", I get the message that the private cache will be used...
I also ran the LIST.FILE.CACHE command with different options, and it looks like the file is in the cache...

Anyway, I was expecting to see some kind of "public cache used" message in the log.

Any hints?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What values have you set in uvconfig for the following?
  • DCWRITEDAEMON
  • DCBLOCKSIZE
  • DCMODULUS
  • DCMAXPCT
  • DCFLUSHPCT
  • DCCATALOGPCT
Have you locked any hashed files into the shared disk cache using CREATE.FILE or SET.MODE commands?
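(For illustration only: the relevant section of uvconfig might look like the fragment below. DCWRITEDAEMON 10 is the value mentioned later in this thread; every other value here is hypothetical, not a recommendation — check the Disk Caching guide for the defaults and limits in your release.)

Code: Select all

DCWRITEDAEMON 10
DCBLOCKSIZE   4096
DCMODULUS     1024
DCMAXPCT      80
DCFLUSHPCT    80
DCCATALOGPCT  50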
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
adrian
Participant
Posts: 10
Joined: Wed Jul 14, 2004 1:59 am
Location: Bucharest, Romania

Post by adrian »

I've only set DCWRITEDAEMON, to 10.
But I thought I didn't need to change the other values for public caching to work.
Anyway, I think I have found the problem...
If I build a job with two different stages looking up the same hashed file, I get the "Public cache used" message in the log...
But all my jobs are written with a single stage doing lookups over many links against the same hashed file stage (some of those links look up the same hashed file, though).
In that case I still get the private cache message in the log...
The Caching Guide PDF says: "The lookup file will run in more than one stream, either in multiple data streams within the same job or in partitioned sets with the DataStage Parallel Extender."
I don't have PX, so I assume that "multiple data streams within the same job" means more than one stage?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes and no. Chapter 2 of the Parallel Job Developer's Guide will aid your understanding, even though you're using server jobs. It's about process boundaries.
1. A passive stage between two active stages is a process boundary.
2. An IPC stage is, ipso facto, a process boundary.
3. The invisible passive stage between two active stages joined by one link introduces a process boundary if row buffering is enabled.
4. Independent streams of processing in the same job run in separate processes.

Code: Select all

Passive  ----->  Active  ----->  Passive

Passive  ----->  Active  ----->  Passive
5. Independent active stages in the same job run in separate processes.

Code: Select all

             +--->  Active  ----+
             |                  |
Passive   ---+                  +--->  Passive
             |                  |
             +--->  Active  ----+
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.