Common Memory performance - read/write

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Your approach is similar to what happens with database sequences - by changing the sequence 'cache size' you pre-allocate blocks of sequence numbers so that you don't need to go back to the database as often, with the downside that you might lose much of a cache block of sequence numbers if the job aborts.

Basing my conclusion on the concurrent key routine source, there is a READU to get the record and a write almost immediately thereafter (the write releases the exclusive lock). Running this with many concurrent processes should not have any processes showing 'sleep' or 'wait' states for an appreciable amount of time in normal processing.

On the other hand, if some process does hold a lock then the READU statements will wait indefinitely until that lock is released.

I would recommend you try putting a LOCKED clause in your job and writing something out to a logfile; you might be surprised to see your processes locking for some reason you hadn't discovered yet, or that they are not locking at all. You will have to change the single READU statement to be inside a loop if you do put in a LOCKED clause, so that you retry reading the record until the lock is released (or until you give up and abort the routine).
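
A minimal sketch of that retry loop (the variable names here are placeholders borrowed from later posts in this thread, not ArndW's actual code):

Code: Select all

      vAttempts = 0
      vGotRecord = @FALSE
      Loop Until vGotRecord Or vAttempts > 100 Do
         Readu vNextVal From vOpenFile, aSequenceName Locked
            * Another process holds the lock - log it, pause, and retry
            Call DSLogInfo("Sequence record is locked, retrying", vRoutineName)
            vAttempts = vAttempts + 1
            Sleep 1
         End Then
            vGotRecord = @TRUE
         End Else
            vNextVal = 1     ;* record does not exist yet
            vGotRecord = @TRUE
         End
      Repeat
      If NOT(vGotRecord) Then Call DSLogWarn("Gave up waiting for lock", vRoutineName)

Without the LOCKED clause, READU simply blocks until the lock is released; with it, you get control back immediately and can log and decide how long to keep retrying.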
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Your function is valid; in fact, if you searched the forum you would have stumbled across the copy I posted years ago that operates the same way. You need to make sure you assign large blocks - you don't want the function colliding with other simultaneous calls. If your job runs at 10K rows/second and assigns in 10K ranges, you're hitting the lock table every second. If you have 10 simultaneous instances, that's a hit every 1/10 of a second and a lot of collisions.

The LOCKED clause on the READU adds significant wait time on retries. Consider passing the blocking value as an argument to the function; that way you can right-size the block to the expected rows/second. Use a number large enough to prevent collisions - perhaps size it so you fetch a block every 20 seconds and reverse-calculate from there. Just don't oversize it, as you're chewing through ranges of numbers and leaving gaps behind.
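
As a hedged illustration of that reverse calculation (the throughput figure is hypothetical, taken from the 10K rows/second example above):

Code: Select all

      * Hypothetical sizing: fetch a new block roughly every 20 seconds
      vRowsPerSecond = 10000                                ;* observed job throughput
      vSecondsBetweenFetches = 20
      vBlockSize = vRowsPerSecond * vSecondsBetweenFetches  ;* 200000 numbers per fetch

With 10 simultaneous instances sized this way, the lock table is hit about once every 2 seconds overall instead of 10 times a second.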
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Thanks for the help - it turns out there was nothing wrong with my design or routine. I had simply left in a statement that fetched the process ID through DSGetJobInfo. That is why the performance was so bad; it had nothing to do with reading from shared memory. Once I removed it, performance was fine.

Code: Select all

$INCLUDE DSINCLUDE JOBCONTROL.H

      vRoutineName = "GetNextSequenceCache"
      vSequenceFile = "hETLSequencesCache"
      vBlockSize = 10000

* Declare shared memory storage.
      COMMON /vMemoryName/ vNextValMem, vMaxVal, InitializedMem

      If NOT(InitializedMem) OR vNextValMem > vMaxVal Then GoTo GetNextSeqRange
      Else
         
         Ans = vNextValMem
         vNextValMem = vNextValMem + 1
         GoTo Exit

      End


GetNextSeqRange: 
**** Open the sequence file and obtain the next range based on vBlockSize
**** 
         Open vSequenceFile TO vOpenFile Else
            * Open failed. Create the sequence file.
            Call DSLogInfo("Creating file" : vSequenceFile , vRoutineName)
            EXECUTE "CREATE.FILE " : vSequenceFile : " 2 1 1"
            Open vSequenceFile TO vOpenFile Else Ans = -1
         End
      
      * Read the named record from the file.
      * This obtains the lock (waiting if necessary).
      Readu vNextVal From vOpenFile, aSequenceName Else
         vNextVal = 1
      End
      
      vMaxVal = vNextVal + vBlockSize
      InitializedMem = 1
      Ans = vNextVal
      vNextValMem = vNextVal + 1
      vNextVal = vMaxVal + 1

      * Increment the sequence value, and write back to file.	
      * This releases the lock.
      Write vNextVal On vOpenFile, aSequenceName Else Ans = -1

      GoTo Exit

Exit:
Regards,

Nick.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You re-open the file each time you need a new range. If you put the file variable into COMMON as well, the file can be held open for the duration of the job.

Code: Select all

      $INCLUDE UNIVERSE.INCLUDE FILEINFO.H


* Variable vOpenFile is in COMMON
      If FileInfo(vOpenFile, FINFO$IS.FILEVAR)
      Else
         Open vSequenceFile TO vOpenFile Else
            * Open failed. Create the sequence file.
            Call DSLogInfo("Creating file" : vSequenceFile , vRoutineName)
            EXECUTE "CREATE.FILE " : vSequenceFile : " 2 1 1"
            Open vSequenceFile TO vOpenFile Else Ans = -1
         End 
      End
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

I did remove that on purpose because I thought the file was being opened every time a new range was required anyway.

Was I incorrect in this? I had added some debug statements to show what was happening and they reported that the file was being opened each time anyway even with the memory statements. Maybe I need to check whether they were in the correct place.

Another issue that I am looking at now is that if more than one sequence is used within the same process this doesn't work as the routine does not manage the two sequences separately in memory. I had thought I could change the name of the common memory dynamically but that doesn't seem to work. If I use a variable vMemoryName which is passed into the routine it treats vMemoryName as a literal. Is this correct?

I am currently modifying the routine so it holds an array in common memory with a field for each sequence containing the same values as the routine above. I'm not finished with it yet but will add it to this thread when I am, in case anyone can see an issue with it.

Thanks, Nick.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If you go to the link to my function, you'll see that I handled only opening the file once by putting the file handle into the COMMON. Take a peek...hint hint
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Thanks Ken.

I have now added the file handle into COMMON as you had in your routine.

The one limitation of your routine is that it doesn't handle more than one sequence being used in the same transformer/process.

I have tried to handle this by using an array that holds all the sequences being accessed within the current process. It seems to work, but I can see there would be an issue if someone named a sequence with a number, e.g. sequence name = 123.

The FIND statement might find 123 as a value in the array and return the incorrect field.

What I would like is that FIND only returns the field (FM) when the sequence name is the first value. I am thinking of using a loop and checking the first value when FIND returns a field, but was hoping there would be a neater way of doing this.
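
One possibility (a sketch, assuming the optional value-mark argument of FIND works as documented) is to let FIND also return the value position within the field, and accept the hit only when it is 1 - i.e. when the match is the sequence-name value itself:

Code: Select all

      FIND vSequenceName IN vSeqArray SETTING FM, VM Then
         If VM = 1 Then
            * genuine match: the hit is the first value of its field,
            * i.e. the sequence name, not a cached number
         End
      End

This avoids the explicit retry loop, though it still accepts only the first occurrence FIND locates.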

If there are any other suggestions for this routine they would be appreciated.

Regards, Nick.

Code: Select all

$INCLUDE DSINCLUDE JOBCONTROL.H

      vRoutineName = "GetNextSequenceCache"
      vSequenceFile = "hETLSequences"
      vBlockSize = aBlockSize
      vSequenceName = aSequenceName

**** Declare shared memory storage.
**** vSeqArray holds an array of sequences in memory so the same routine
**** can be used for different sequences in the same transformer/process

      COMMON /mETLSequence/ vSeqArray, Initialized, vOpenFile

      If NOT(Initialized) Then
         vSeqArray<-1> = vSequenceName : @VM : '0' : @VM : '0'
         FIND vSequenceName IN vSeqArray SETTING FM Then GoTo OpenSequenceFile
         Else
            Call DSLogWarn('Error with array', vRoutineName)
            GoTo Exit
         End
      End
      Else
         FIND vSequenceName IN vSeqArray SETTING FM Then
            vNextValMem = EXTRACT(vSeqArray, FM,2)
            vMaxVal = EXTRACT(vSeqArray, FM,3)
            If vNextValMem > vMaxVal Then GoTo GetNextSeqRange Else GoTo GetNextSeqNumber
         End
         Else
            vSeqArray<-1> = vSequenceName : @VM : '0' : @VM : '0'
            FIND vSequenceName IN vSeqArray SETTING FM Then GoTo GetNextSeqRange
            Else
               Call DSLogWarn('Error with array', vRoutineName)
               GoTo Exit
            End
         End
      End

OpenSequenceFile:
**** Open the sequence file and obtain the next range based on vBlockSize
****
      Call DSLogInfo('Opening Sequence File', vRoutineName)
      Open vSequenceFile TO vOpenFile Else
         * Open failed. Create the sequence file.
         Call DSLogInfo("Creating file" : vSequenceFile , vRoutineName)
         EXECUTE "CREATE.FILE " : vSequenceFile : " 2 1 1"
         Open vSequenceFile TO vOpenFile Else Ans = -1
      End

GetNextSeqRange:
      * Read the named record from the file.
      * This obtains the lock (waiting if necessary).
      Call DSLogInfo('Getting next sequence range', vRoutineName)
      Readu vNextVal From vOpenFile, vSequenceName Else
         vNextVal = 1
      End

      * Increment the sequence value, and write back to file.	
      * This releases the lock.

      vNextValMem = vNextVal
      vMaxVal = vNextVal + vBlockSize
      vSeqArray<FM,3> = vMaxVal
      vNextVal = vNextVal + vBlockSize + 1
      Write vNextVal On vOpenFile, vSequenceName Else Ans = -1

      Initialized = 1

GetNextSeqNumber:

      Ans = vNextValMem
      vNextValMem = vNextValMem + 1
      vSeqArray<FM,2> = vNextValMem
      GoTo Exit

Exit:
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You need to change your array. Think of it as a pivot.

Put your table names for assignment into attribute 1.
Put your last used surrogate key into attribute 2.
Put the number of values left to go until we lock and fetch another into attribute 3.

The LOCATE (or FIND) statement can read across what's in <1> and give you back the position of the value where it matches. So if your tablenames are in <1>, the corresponding last used surrogate key value will be in <2>. So you find the position in <1>, then look down at <2> and that's your value to use next.

The code would be something like:

Code: Select all

LOCATE MyTableName IN vSeqArray<1> SETTING PTR Then
   NextNumber = vSeqArray<2,PTR> + 1
   vSeqArray<2,PTR> = vSeqArray<2,PTR> + 1
   vSeqArray<3,PTR> = vSeqArray<3,PTR> - 1
End Else
*
* MyTableName is not in vSeqArray<1>, so add it.  Lock it so that no other
* job is doing this as well.  You probably want to re-check vSeqArray after
* securing the lock to make sure that you got the lock before another job 
* snuck in and did the following logic.
*
* Maybe you've built a sequential file of tablename and maximum values
* so fetch that maximum value now to put into the array
* I'm assuming you can do that code.
*
   INS MyTableName BEFORE vSeqArray<1,1>
   INS MyTableNameMaximumValue + vBlockSize + 1 BEFORE vSeqArray<2,1>
   INS vBlockSize BEFORE vSeqArray<3,1>
   NextNumber = MyTableNameMaximumValue + 1
End
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Excellent thanks, I will try it with the array as you specified.

I don't think there is any need to lock the array though, because it is only available to the current process. If multiple jobs are using the routine, each process will have its own version of the array with a different range of cached numbers in it.

At least that is my assumption.....correct me if I am wrong.....

Thanks, Nick
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The jobs won't share the same COMMON, just the same hashed file. But, if the same function is running in different Transformers (did I say job earlier? I meant Transformer) then you have to deal with the array changing. Maybe rather than INSert at the beginning we should add to the end of the array instead.

The idea is that separate, simultaneous calls to this Function within the same job would manipulate the array causing a temporary misalignment between attributes <1> and <2>. Since I took the easy route and put the new data in the first column, the array temporarily shifts until the second INS on attribute <2>. You could just TableNameCount=DCOUNT(vSeqArray<1>,@VM) and then rather than INS just assign vSeqArray<1,TableNameCount+1> = MyTableName. As long as another function call in another Transformer stage doesn't care about this table you won't need a LOCK to guarantee that this table is properly entered into the array.
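
That append-at-the-end variant might look like this (a sketch using the names from this post, with the same values as the earlier INS example):

Code: Select all

      * Append the new table at the end so existing positions never shift
      TableNameCount = DCOUNT(vSeqArray<1>, @VM)
      vSeqArray<1, TableNameCount + 1> = MyTableName
      vSeqArray<2, TableNameCount + 1> = MyTableNameMaximumValue + vBlockSize + 1
      vSeqArray<3, TableNameCount + 1> = vBlockSize
      NextNumber = MyTableNameMaximumValue + 1

Because the assignment only adds a new value position at the end, a PTR obtained earlier by another call still points at the same table, which is why the temporary misalignment problem goes away.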
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

I had adapted the code you provided to add the new sequences onto the end of the array anyway:

Code: Select all

 
INS vSequenceName BEFORE vSeqArray<1,-1>
INS 0 BEFORE vSeqArray<2,-1>
INS 0 BEFORE vSeqArray<3,-1>
LOCATE vSequenceName IN vSeqArray<1> SETTING vPos Then GoTo GetNextSeqRange

but I am still not sure whether there can be any contention between different calls to the same routine.

If you have the following:

SeqFile ----> Tfm1 -----> Tfm2 ------> SeqFile

And inter-process is not turned on, there will be one process for the two Tfms. If each of them has a call to this routine, they will both share the same COMMON and therefore the same array, but can they simultaneously call the routine? I thought that because they were in the same process, even with in-process buffering, there could only be one call to the routine at a time. Can anyone confirm or deny this? Ray?

If inter-process is turned on then each transformer will be a separate process and therefore a separate COMMON area and array so they will not be in contention with each other.

I would like someone to confirm/deny these assumptions though.

Thanks, Nick.