keymgt slow performance?

ray.wurlod · Post by **ray.wurlod** » Mon Apr 03, 2006 4:28 pm

The figure of 40310 suggested that your SDKSequences file was holding 40310 sequence names and values. That's why I asked you to count it. By default, SDKSequences can contain only 512 bytes of data (separation 1). I prefer separation 4 which is closer to the page size used to transfer data between disk and memory (2048 bytes).

The FILE.STAT report showed that you were using 1,715,712 bytes to store your 40,310 sequence names/values; this meant that the overflow had to be "daisy chained" which is very inefficient for access (and increases vulnerability to corruption).
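As a rough check on those figures, the arithmetic can be sketched in Python. This assumes the file has a single group (modulus 1), that one unit of separation is 512 bytes as described above, and it ignores the file header and per-record overhead. The function names are illustrative only and are not part of any UniVerse tool.

```python
# Figures quoted in this thread.
BYTES_USED = 1_715_712   # size reported by FILE.STAT
RECORD_COUNT = 40_310    # sequence names/values held in SDKSequences
BUFFER_UNIT = 512        # bytes per unit of separation


def buffers_needed(total_bytes: int, separation: int) -> int:
    """Number of group-sized buffers needed to hold total_bytes, rounded up."""
    buffer_size = BUFFER_UNIT * separation
    return -(-total_bytes // buffer_size)  # ceiling division


def overflow_buffers(total_bytes: int, separation: int, modulus: int = 1) -> int:
    """Buffers beyond the primary groups (modulus 1 is an assumption here)."""
    return max(0, buffers_needed(total_bytes, separation) - modulus)


print(buffers_needed(BYTES_USED, 1))        # 512-byte buffers in total
print(overflow_buffers(BYTES_USED, 1))      # buffers chained after the one primary group
print(buffers_needed(BYTES_USED, 4))        # 2048-byte buffers at separation 4
print(round(BYTES_USED / RECORD_COUNT, 1))  # average bytes per sequence record
```

Under these assumptions nearly every buffer sits in the overflow chain, which is why a lookup may have to walk thousands of buffers to reach a record.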

It would appear that your system has 64BIT_FILES enabled in uvconfig; this is an inefficient setting, as it forces all hashed files, including the small ones in the Repository, to have 64-bit internal pointers. Next time maintenance is scheduled that requires DataStage to be restarted, you might consider changing this setting. (A uvregen will be required.)

rachelsu · Post by **rachelsu** » Mon Apr 03, 2006 6:13 pm

Again, thanks so much for your explanations! I learned a lot from your contributions to this thread :D

ray.wurlod wrote: It would appear that your system has 64BIT_FILES enabled in uvconfig; this is an inefficient setting, as it forces all hashed files, including the small ones in the Repository, to have 64-bit internal pointers. Next time maintenance is scheduled that requires DataStage to be restarted, you might consider changing this setting. (A uvregen will be required.)

On this part, is 64-bit the default setting in a standard installation? In your opinion, what would have prompted them to use 64-bit? As you know, I'm very new to this tool and I would need some supporting material to persuade our DS group to change this...

Btw, are there any manuals for Administrator commands? Are there any books on DS or worthwhile training courses you recommend?

rleishman · Post by **rleishman** » Mon Apr 03, 2006 7:04 pm

Ray once posted the URL for the UniVerse Administrator's guide here.
As for a specific DS Administrator guide, I once asked Ascential the same thing, and there was nothing beyond the supplied manuals on your install CD. I couldn't track down an Ascential training course either, although if you could find a DataStage education/consulting services company that works in the Asia/Pacific region (I've heard of one somewhere, can't recall where), they would likely tailor a course to your needs.

rachelsu · Post by **rachelsu** » Mon Apr 03, 2006 7:17 pm

Thanks lots! I'll check it out.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 04, 2006 2:03 am

I suspect Ross is referring to this site which you can also access from my profile.

rachelsu · Post by **rachelsu** » Tue Apr 04, 2006 2:07 am

Got it...thanks

olgc · Post by **olgc** » Wed May 03, 2006 7:54 am

Do @INROWNUM and @OUTROWNUM work in parallel jobs? If more than one process reads from or writes to the same source or target concurrently, do @INROWNUM and @OUTROWNUM count rows across all processes, or only within the process itself?

Thanks,

[quote="ArndW"]Rachel,

try writing a job that reads a sequential file, uses the key management routines and writes to a sequential file (or to /dev/null) to see the actual speed on your system. The speed should not be 24 rows per second, even on an overloaded system. Most systems should give results over 10K rows/sec at a minimum.

COMMON blocks are very efficient, and I am certain that balajisr's comment about avoiding them is mistaken. The COMMON in use in these routines is to avoid having to re-open the hashed file each call - which would make the routine perform very slowly indeed.

As rasi has already mentioned, using one call to get a seed value and then using the counters @INROWNUM or @OUTROWNUM to provide the increments is a very fast way of doing this. But since the maximum value in the hashed file is not updated during the run, it needs to be corrected after a complete run, and also handled in case of an abort. If you are writing to a database, you can always run your SQL's equivalent of "get max value" on the key field prior to the run and use that as the seed.[/quote]
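The seed-plus-row-counter approach described in the quote can be sketched roughly as follows. This is a Python illustration under stated assumptions, not DataStage BASIC: `KeyStore` is a hypothetical stand-in for the hashed file (or a database max-value query), and the loop counter plays the role of @INROWNUM.

```python
class KeyStore:
    """Hypothetical stand-in for the hashed file holding the last key used."""

    def __init__(self, last_key: int = 0) -> None:
        self.last_key = last_key
        self.reads = 0
        self.writes = 0

    def get_seed(self) -> int:
        self.reads += 1
        return self.last_key

    def save(self, last_key: int) -> None:
        self.writes += 1
        self.last_key = last_key


def assign_keys(rows, store):
    """Read the seed once, then derive each key from the row counter."""
    seed = store.get_seed()          # single lookup for the whole run
    last_assigned = seed
    keyed = []
    try:
        for row_num, row in enumerate(rows, start=1):  # row_num ~ @INROWNUM
            last_assigned = seed + row_num
            keyed.append((last_assigned, row))
    finally:
        # Write the highest key back once, even if the loop raised (abort case).
        store.save(last_assigned)
    return keyed


store = KeyStore(last_key=100)
for key, row in assign_keys(["a", "b", "c"], store):
    print(key, row)
```

The store is read once and written once per run rather than once per row, which is where the speed-up comes from. The cost is that the stored maximum is stale while the run is in progress.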

ArndW · Post by **ArndW** » Wed May 03, 2006 8:03 am

olgc - this is thread mismanagement, as your question doesn't apply to this thread at all! Both will work in parallel jobs, but each counts rows only within its own node, so the numbers are sequential per node and not unique across the whole job. There are several recent threads on how to use these in parallel jobs.
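To illustrate the per-node behaviour, here is a small Python sketch. The interleaving formula is a commonly used pattern for making per-node counters unique; it is not taken from this thread. The parameter names are illustrative: they stand in for a per-node row counter, a zero-based partition number, and the partition count. In a parallel Transformer those would typically come from system variables such as @PARTITIONNUM and @NUMPARTITIONS, which is an assumption here.

```python
def per_partition_row_numbers(partition_sizes):
    """Each partition counts its own rows from 1, like a per-node @INROWNUM."""
    return [list(range(1, size + 1)) for size in partition_sizes]


def interleaved_key(row_num: int, partition_num: int, num_partitions: int) -> int:
    """Spread per-partition counters so no two partitions produce the same key."""
    return (row_num - 1) * num_partitions + partition_num + 1


sizes = [3, 2]  # two nodes, processing 3 and 2 rows respectively
raw = per_partition_row_numbers(sizes)

# Raw counters collide across partitions.
raw_flat = [n for part in raw for n in part]

# Interleaved keys are unique across the whole job.
keys = [
    interleaved_key(n, p, len(sizes))
    for p, part in enumerate(raw)
    for n in part
]

print(raw_flat)
print(keys)
```

With uneven partition sizes the interleaved keys stay unique but can leave gaps, which is usually acceptable for surrogate keys.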