hashed file cache sharing

Post questions here related to DataStage Server Edition, for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bryan
Participant
Posts: 91
Joined: Sat Feb 21, 2004 1:17 pm

hashed file cache sharing

Post by bryan »

Hi

We are developing jobs that do lookups against the same tables.

We enabled 'Preload file to memory' since our lookup data is small.

Now, suppose I enable 'Hashed File Cache Sharing' in job properties, and one job loads the file into memory while a second job is scheduled to run either at the same time as, or after, the first job.

Does the hashed file cache sharing option allow the two jobs to access that memory? I know it does for multiple instances of the same job.


My understanding is that no OS will let two processes access the same private memory at a time.
If the second job runs after the first, the OS would release the memory and load the hashed file into memory again.
If the second job runs in parallel with the first, the OS would place a second copy of the hashed file in memory.


Thank you
dhiraj
Participant
Posts: 68
Joined: Sat Dec 06, 2003 7:03 am

Post by dhiraj »

Bryan,

There are configuration parameters in the uvconfig file that you can change to define what level of file sharing is used. By default it is link private, i.e. each link loads its own copy of the file into memory.
It can also be set to:
1) link public sharing, i.e. multiple streams share a single copy;
2) system, i.e. the file is always kept in memory.


Until you change the uvconfig parameters and regenerate the engine, UniVerse does not share the memory across multiple streams.
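(By way of illustration only: on a UNIX install, the edit-and-regenerate cycle described above typically looks something like the following. Paths and the exact admin commands vary by release, so treat this as a sketch rather than a recipe.)

Code: Select all

cd $DSHOME                # DataStage engine directory
bin/uv -admin -stop       # stop the engine before touching uvconfig
vi uvconfig               # edit the disk cache (DC*) parameters
bin/uvregen               # regenerate the engine configuration
bin/uv -admin -start      # restart the engine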


Further, UniVerse is the execution environment in which your jobs run, so these memory configurations are managed by UniVerse.

Refer to the Disk Caching guide; it explains this in detail.


Dhiraj
adrian
Participant
Posts: 10
Joined: Wed Jul 14, 2004 1:59 am
Location: Bucharest, Romania

Post by adrian »

Hi guys,

I have a similar problem... I read that "Caching Guide".
I followed all the steps described, but I cannot enable public caching.
System caching seems to work, though...
I did the following:
- changed uvconfig, regenerated the configuration file, restarted the engine;
- ticked all the checkboxes related to caching.
The result is:
- If I choose the "WRITE IMMEDIATE" or "WRITE DEFERRED" option when creating the hashed file, I get a message in the job log saying: "WRITE-DEFERRED file cache enabled, overriding link private cache"...
(This happens even when I do NOT tick the "Enable hashed file cache sharing" checkbox in job properties.)
- If I choose "NONE", I get the message that the private cache will be used...
I also ran the LIST.FILE.CACHE command with different options, and it looks like the file is in the cache...

Anyway, I was expecting to see some kind of "public cache used" message in the log.

Any hints?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What values have you set in uvconfig for the following?
  • DCWRITEDAEMON
  • DCBLOCKSIZE
  • DCMODULUS
  • DCMAXPCT
  • DCFLUSHPCT
  • DCCATALOGPCT
Have you locked any hashed files into the shared disk cache using CREATE.FILE or SET.MODE commands?
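(For illustration only: the relevant section of uvconfig might look like the fragment below. DCWRITEDAEMON 10 is the value mentioned later in this thread; every other value here is hypothetical, not a recommendation — check the Disk Caching guide for the defaults and limits in your release.)

Code: Select all

DCWRITEDAEMON 10
DCBLOCKSIZE   4096
DCMODULUS     1024
DCMAXPCT      80
DCFLUSHPCT    80
DCCATALOGPCT  50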
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
adrian
Participant
Posts: 10
Joined: Wed Jul 14, 2004 1:59 am
Location: Bucharest, Romania

Post by adrian »

I've only set DCWRITEDAEMON, to 10.
But I thought I didn't need to change the other values for public caching to work.
Anyway, I think I have found the problem...
If I build a job with two different stages looking up the same hashed file, I get the "Public cache used" message in the log...
But all my jobs are written with a single stage doing lookups over many links against the same hashed file stage (some of those links look up the same hashed file, though).
In that case I still get the private cache message in the log...
The Caching Guide PDF says: "The lookup file will run in more than one stream, either in multiple data streams within the same job or in partitioned sets with the DataStage Parallel Extender."
I don't have PX, so I assume that "multiple data streams within the same job" means more than one stage?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes and no. Chapter 2 of the Parallel Job Developer's Guide will aid your understanding, even though you're using server jobs. It's about process boundaries.
1. A passive stage between two active stages is a process boundary.
2. An IPC stage is, ipso facto, a process boundary.
3. The invisible passive stage between two active stages joined by one link introduces a process boundary if row buffering is enabled.
4. Independent streams of processing in the same job run in separate processes.

Code: Select all

Passive  ----->  Active  ----->  Passive

Passive  ----->  Active  ----->  Passive
5. Independent active stages in the same job run in separate processes.

Code: Select all

             +--->  Active  ----+
             |                  |
Passive   ---+                  +--->  Passive
             |                  |
             +--->  Active  ----+
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.