keymgt slow performance?

rachelsu · Post by **rachelsu** » Fri Mar 31, 2006 1:17 am

Can you email me your version? Or would you like me to email you my version? Pls advise. Thanks!

ray.wurlod wrote:Has some kind soul modified your KeyMgt routine? (There was one group of consultants running around Australia and New Zealand doing that for some years - rebadged the SDK routines as their own, and managed to ruin more than a few of them in the process.)

ray.wurlod · Post by **ray.wurlod** » Fri Mar 31, 2006 4:35 pm

DataStage is not the only thing I do. Certainly I don't have a "home" copy - no-one does, as there is no such thing. I would have to visit someone who has DataStage to do that.

You are looking for a routine that opens a hashed file called SDKSequences onto a file variable declared to be in COMMON, and which increments a sequence value (for a given sequence name) each time it is invoked.

I can't send you any code within the next two weeks, as I am doing non-DataStage work. Here's one out of my head.

Code: Select all

      FUNCTION NextSequenceKey(SequenceName)

* Define this token to report whether sequence name is locked by other process.  
* Undefine to simply await lock to be released.
$DEFINE REPORT.LOCKED

$INCLUDE UNIVERSE.INCLUDE FILEINFO.H

      Equate RoutineName To "NextSequenceKey"
      Equate MAXATTEMPTS To 5

      COMMON /NextSequenceKey/SDKSequences.fvar

* Open SDKSequences hashed file if it is not already open.
      If Not(FileInfo(SDKSequences.fvar, FINFO$IS.FILEVAR))
      Then

         Open "SDKSequences" To SDKSequences.fvar
         Else

            Call DSExecute("UV", "CREATE.FILE SDKSequences 2 1 4", Output, Code)
            If Code = 0
            Then

               Open "SDKSequences" To SDKSequences.fvar
               Else
                  Msg = "Unable to open SDKSequences hashed file."
                  Msg<-1> = "Status code = " : Status()
                  Call DSLogFatal(Msg, RoutineName)
               End

            End
            Else

               Msg = "Unable to create SDKSequences hashed file."
               Msg<-1> = Output
               Call DSLogFatal(Msg, RoutineName)

            End

         End

      End


* File is now open; read and update sequence value.

      Attempts = 0

UpdateSequenceValue:

      Attempts += 1

      ReadVU SequenceValue From SDKSequences.fvar, SequenceName, 1
$IFDEF REPORT.LOCKED
      Locked
         * Sequence locked by other user.  Report the fact and re-try.
         Msg = "Sequence record " : Quote(SequenceName) : " locked by user number " : Status()
         Call DSLogWarn(Msg, RoutineName)
         If Attempts < MAXATTEMPTS
         Then
            Nap 500
            GoTo UpdateSequenceValue 
         End
         Else
            Msg = "Failed to update sequence after " : MAXATTEMPTS : " attempts."
            Call DSLogFatal(Msg, RoutineName)
         End
      End
$ENDIF
      Then
         * Sequence exists.  Return current value and increment.
         Ans = SequenceValue
         SequenceValue += 1
      End
      Else
         * Sequence did not exist.  Return 1 and store 2 as next value.
         Ans = 1
         SequenceValue = 2
      End

      WriteV SequenceValue To SDKSequences.fvar, SequenceName
      Else
         Msg = "Failed to update sequence " : Quote(SequenceName) : " in SDKSequences hashed file."
         Call DSLogFatal(Msg, RoutineName)
      End

      RETURN(Ans)

rachelsu · Post by **rachelsu** » Fri Mar 31, 2006 7:26 pm

Thanks very much for that! I'll check the routines we have here on Monday. Much appreciated!

rachelsu · Post by **rachelsu** » Sun Apr 02, 2006 6:10 pm

This is the code I see in our dev repository...

First sdk transform routine is called "KeyMgtGetNextValue", takes one argument..

Code: Select all

*************************************************************************
* Copyright (C) 2003, 1997-2002 Ascential Software Corporation. All Rights Reserved. *
* This code may be copied on condition that this copyright              *
* notice is included as is in any code derived from this source.        *
*************************************************************************
*
* Routine to generate a sequential number. The argument is a string used to 
* identify the sequence.
*
* NOTE: This routine will only work when the sequence is being accessed by 
* one process at a time.
*
* The routine uses a UniVerse file to store the next number to use. This 
* value is stored in a record named after the supplied argument. The 
* routine reads the number once, then increments and stores the value 
* in common storage, writing the next value back to file each time.
*

* Declare shared memory storage.
    Common /Sequences/ Initialized, NextVal, SeqFile

    EQUATE RoutineName TO 'KeyMgtGetNextValue'

    If NOT(Initialized) Then
	* Not initialised. Attempt to open the file.
        Initialized = 1
        Open "SDKSequences" TO SeqFile Else
            * Open failed. Create the sequence file.
            EXECUTE "CREATE.FILE SDKSequences 2 1 1"
            Open "SDKSequences" TO SeqFile Else Ans = -1
        End

   	* Attempt to read the named record from the file.
        Readu NextVal From SeqFile, Arg1 Else
	    * No record by that name.
 	    * Start new sequence at 1.
            NextVal = 1
        End
    End

    Ans = NextVal

    * Increment the sequence value, and write back to file.	
    NextVal = NextVal + 1
    Writeu NextVal On SeqFile, Arg1 Else Ans = -1

Second sdk transform routine is called "KeyMgtGetNextValueConcurrent", takes one argument..

Code: Select all

*************************************************************************
* Copyright (C) 2003, 1997-2002 Ascential Software Corporation. All Rights Reserved. *
* This code may be copied on condition that this copyright              *
* notice is included as is in any code derived from this source.        *
*************************************************************************
*
* Routine to generate a sequential number. The argument is a string used to 
* identify the sequence.
*
* NOTE: This routine uses locking to allow multiple processes to access the 
* same sequence.
*
* The routine uses a UniVerse file to store the next number to use. This 
* value is stored in a record named after the supplied argument. The 
* routine always attempts to read the number from the file, so that the 
* record for the sequence becomes locked. It increments and stores the 
* value in common storage, writing the next value back to file each 
* time. Writing back this value frees the lock.
*

* Declare shared memory storage.
    Common /Sequences/ Initialized, NextVal, SeqFile
    
    EQUATE RoutineName TO 'KeyMgtGetNextValueConcurrent'

    If NOT(Initialized) Then
	* Not initialised. Attempt to open the file.
       Initialized = 1
       Open "SDKSequences" TO SeqFile Else
            * Open failed. Create the sequence file.
            EXECUTE "CREATE.FILE SDKSequences 2 1 1"
            Open "SDKSequences" TO SeqFile Else Ans = -1
       End
    End

    * Read the named record from the file.
    * This obtains the lock (waiting if necessary).
    Readu NextVal From SeqFile, Arg1 Else
         NextVal = 1
    End

    Ans = NextVal
    NextVal = NextVal + 1

    * Increment the sequence value, and write back to file.	
    * This releases the lock.
    Write NextVal On SeqFile, Arg1 Else Ans = -1

Do they look ok to you? Please advise. Thanks a lot!

ray.wurlod · Post by **ray.wurlod** » Sun Apr 02, 2006 7:57 pm

They look like the genuine routines - the only difference is that one hangs on to the lock (via WriteU) while the other releases it (Write). No indication why this might create slowness - most users have found this to be an adequately fast mechanism.
If you can load the initial value of the sequence via a job parameter, then a very fast way to generate a sequence - assuming no other process is competing - is the derivation expression

Code: Select all

#InitialValue# + @OUTROWNUM

rachelsu · Post by **rachelsu** » Sun Apr 02, 2006 8:14 pm

ray.wurlod wrote:They look like the genuine routines - the only difference is that one hangs on to the lock (via WriteU) while the other releases it (Write). No indication why this might create slowness - most users have found this to be an adequately fast mechanism.

Thanks for reviewing this! Now I'm really curious to see what it looks like in other people's repositories...Or could it be the use of UniVerse or configuration of UniVerse causing the problem?

My testing on the @OUTROWNUM proves efficient. Thanks for your suggestion!

I'd still like to get to the bottom of the KeyMgt issue, if possible. It would be such a pain to work around this in a DW environment...I could use User-defined SQL queries and push that down to the DB...If I opt to using @OUTROWNUM, I then need to manage the #initialvalue#...It would be such a drawback on a DW specific ETL tool not being able to use its meant-to-be optimised routines while other ETL tool outperforms DBs on SK generation...If this indicates other problems in our DS environment, I'd be interested to learn what it was as it might lead to improvements on other fronts too...

rleishman · Post by **rleishman** » Sun Apr 02, 2006 9:24 pm

Question for the gurus: the code looks OK, so could something have happened to the underlying hashed file?

As a test (without destroying your keys file), you could create copies of the KeyMgt routines, edit them to change the underlying file name, and then try your test using the new routines.

ray.wurlod · Post by **ray.wurlod** » Sun Apr 02, 2006 11:40 pm

Interesting thought, Ross, but probably not the culprit. SDKSequences is created as a tiny static-hashed hashed file (fits on a single page), so there is an almost 100% certainty that it can't be corrupted.

A quick test would involve the following commands, issues from the Administrator client Command window, or from the dssh environment on the server.

Code: Select all

COUNT SDKSequences
SELECT @ID FMT '32L', F1 FROM SDKSequences USING DICT VOC;
FILE.STAT SDKSequences
UVFIXFILE SDKSequences

rachelsu · Post by **rachelsu** » Mon Apr 03, 2006 1:14 am

Here are the results...

ray.wurlod wrote:A quick test would involve the following commands, issues from the Administrator client Command window, or from the dssh environment on the server.

Code: Select all

COUNT SDKSequences

40310 records counted.

Code: Select all

FILE.STAT SDKSequences

File type = 2
Number of groups in file (modulo) = 1
Separation = 1
Number of records = 40310
Number of physical bytes = 1715712
Number of data bytes = 1608472
Average number of records per group = 40310.0000
Average number of bytes per group = 1608472.0000
Minimum number of records in a group = 40310
Maximum number of records in a group = 40310
Average number of bytes per record = 39.9026
Minimum number of bytes in a record = 32
Maximum number of bytes in a record = 56
Average number of fields per record = 1.0000
Minimum number of fields per record = 1
Maximum number of fields per record = 1

Groups 25% 50% 75% 100% 125% 150% 175% 200% full
0 0 0 0 0 0 0 1

Code: Select all

UVFIXFILE SDKSequences

uvfixfile does not support 64-bit files.

ArndW · Post by **ArndW** » Mon Apr 03, 2006 1:27 am

Rachel,

if you recall in the beginning of this thread, I asked if you could check the file sind I recalled that it was a type 2 and shouldn't contain many records. This would account for the slow performance you are seeing. You most likely called this up one time and passed it key numbers instead of file names. If you don't need the unique key values generated thus far (i.e. you are in DEV and not PROD) then you can CLEAR.FILE this file. If you really need 40,310 files to track surrogate keys then you should do a RESIZE to a DYNAMIC (type 30) file.

ArndW wrote:rachelsu,

it should not be that slow. Perhaps there is something wrong with the file itself... It should only contain one record per key that you are tracking and each record will have the value of the next key.

rachelsu · Post by **rachelsu** » Mon Apr 03, 2006 1:35 am

Thanks for the info! I only used DS for 1.5 weeks so far...How can I clear this file? I didn't create this file myself...suppose it was created by the transform procedures?

ArndW wrote:If you don't need the unique key values generated thus far (i.e. you are in DEV and not PROD) then you can CLEAR.FILE this file. If you really need 40,310 files to track surrogate keys then you should do a RESIZE to a DYNAMIC (type 30) file.

ArndW · Post by **ArndW** » Mon Apr 03, 2006 1:43 am

If you look at the code for the two routines you will see where they execute the command EXECUTE "CREATE.FILE SDKSequences 2 1 1" in one and Call DSExecute("UV", "CREATE.FILE SDKSequences 2 1 4", Output, Code) in the other. If you go to your ADMINistrator tool and execute the command line "CLEAR.FILE SKDSequences" you will remove all entries in this file; but that will force and keys that already exist to begin returning duplicates since all counters will start at 1 again. From the ADMINistrator you could also type in "LIST SDKSequences" to see what values you have - the key should be the surrogate key name you passed in as the parameter and the value should be the next sequence number. I am assuming you passed in many different values in some run of your program, i.e. you used a column as the parameter and you have many entries that have a value of "2". If you cannot just CLEAR.FILE then you might be able to select all entries with a value of 2 and remove those.

rachelsu · Post by **rachelsu** » Mon Apr 03, 2006 2:07 am

Magic!!! It worked! I'm now getting 13k rows/sec! That was indeed THE problem!

Thanks so much for your (ArndW, Ray, and Ross) wonderful help! Really appreciate it! If you happen to go pass Melb, let me know so I can buy you coffee or beer

ArndW wrote:If you go to your ADMINistrator tool and execute the command line "CLEAR.FILE SKDSequences" you will remove all entries in this file.

What is the significance of "CREATE.FILE SDKSequences 2 1 4"? Should we use this rather than "CREATE.FILE SDKSequences 2 1 1"? I now noticed the difference in command between Ray's and our Dev...

ArndW · Post by **ArndW** » Mon Apr 03, 2006 2:56 am

Rachel,

perhaps you could confirm that your Premium/Charter membership has paid off

The "2 1 1" and "2 1 4" are the command line options for File Type, Modulo, Group Size respectively. For this file the group size is not important - I think it is a matter of taste for some of the old timers , I always use "2 1 1" when I make a small file while others automatically use "2 1 4". I wonder if it is a UniVerse/UniData or PICK thing? But for your use I think that a group size of 1 is better.

rachelsu · Post by **rachelsu** » Mon Apr 03, 2006 5:45 am

It sure has

Thanks for your help!