keymgt slow performance?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rachelsu
Charter Member
Posts: 17
Joined: Mon Mar 20, 2006 6:02 pm
Location: Melbourne

keymgt slow performance?

Post by rachelsu »

In developing an upsert job, I tried the KeyMgt functions to assign surrogate keys to a sequential file, outputting the SKs and the rest of the data to another sequential file... then loading that into an Oracle DB. Surprisingly, I found that KeyMgtGetNextValue gives 24 rows/sec and KeyMgtGetNextValueConcurrent gives 12 rows/sec throughput :evil: . Is this normal in DS Server 7? I'm new to DS. Any advice on how to get the best performance on SK generation is much appreciated! Should we leave this to the DB? Other ETL tools I've used generate SKs much, much faster... Am I using this incorrectly? Or should I use other functions in DS? TIA..
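
For reference, the derivation I put on the SK column in the Transformer looks like this (the sequence name is made up for this post; the other columns map straight through from the input):

    KeyMgtGetNextValue("CustDimSeq")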
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Rachel

Welcome to DSXchange

KeyMgtGetNextValue and KeyMgtGetNextValueConcurrent are the same, except that the latter is used when you have multiple jobs running at the same time drawing keys from the same sequence. First, if you don't think that is ever going to happen, then you are better off using KeyMgtGetNextValue. Second, you cannot measure performance like this. Performance depends on many factors: your job design, the number of jobs running at the same time, network, CPU, hard disk, etc.
The efficient way to implement this is to use the database functionality to create surrogate keys, or to create a table which holds the current maximum surrogate key; in your job, read that table and use the system variable @OUTROWNUM to add an incrementing number to the surrogate key column, as sketched below. After the job finishes, update the table with the new maximum surrogate key.
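A rough sketch of that approach (all names here are only illustrative): read the stored maximum into a job parameter, say KeySeed, before the run; the derivation on the surrogate key column is then simply

    KeySeed + @OUTROWNUM

and an after-job step writes KeySeed plus the link's row count back to the key table.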
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
rachelsu
Charter Member
Posts: 17
Joined: Mon Mar 20, 2006 6:02 pm
Location: Melbourne

Thank you!

Post by rachelsu »

Hi Siva, thanks very much for this! I had similar findings after a few more tries :mrgreen:

The KeyMgt functions were identified as the blocker after I added debug params and disabled the other transforms... I was pretty shocked to find that - initially I suspected CRC32 or the stage variable calculations.

[quote="rasi"]Second you cannot measure the performance like this. Performance depends on many factors your job design, number of jobs running at the same time, network, cpu hard disk etc.
[/quote]
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

The KeyMgt functions use COMMON blocks. I read that it is better to avoid COMMON blocks. Does this also imply that it is better to avoid the KeyMgt functions and rather use @OUTROWNUM or some other alternative?
rachelsu
Charter Member
Posts: 17
Joined: Mon Mar 20, 2006 6:02 pm
Location: Melbourne

Post by rachelsu »

Where did you read that? Can you provide the link here? :?:

From what I've tested, I would say KeyMgt is much slower than @OUTROWNUM or DB key generation functions...

[quote="balajisr"]Keymgt functions uses common blocks. I read that it is better to avoid common blocks. Does this also imply that it is better to avoid keymgt functions and rather user @OUTROWNUM or any other alternative?[/quote]
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

rachelsu wrote:Where did you read that? Can you provide the link here? :?:

From what I've tested, I would say KeyMgt is much slower than @OUTROWNUM or DB key generation functions...
balajisr wrote:The KeyMgt functions use COMMON blocks. I read that it is better to avoid COMMON blocks. Does this also imply that it is better to avoid the KeyMgt functions and rather use @OUTROWNUM or some other alternative?
Page 2-12 of the Server Job Developer's Guide, which talks about improving performance using in-process and inter-process row buffering.

I agree with you that KeyMgt, as far as I know, is slower than the other alternatives.
rachelsu
Charter Member
Posts: 17
Joined: Mon Mar 20, 2006 6:02 pm
Location: Melbourne

Post by rachelsu »

:oops: I didn't go through it in detail. Thanks for the reminder!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Rachel,

try writing a job that reads a sequential file, uses the key management routines and writes to a sequential file (or to a sequential file pointed at /dev/null) to see the actual speed on your system. The speed should not be 24 rows per second, even on an overloaded system. Most systems should give results over 10K rows/sec at a minimum.

COMMON blocks are very efficient, and I am certain that balajisr's comment about avoiding them is mistaken. The COMMON in use in these routines is there to avoid having to re-open the hashed file on each call - which would make the routine perform very slowly indeed.

As rasi has already mentioned, using one call to get a seed value and then using the counters @INROWNUM or @OUTROWNUM to provide the increments is a very fast way of doing this - but since the hashed file's max value is not being updated, it needs to be corrected after a complete run, and also handled in case of an abort. If you are writing to a database, you can always do your SQL's equivalent of getting the max value of the key field prior to the run and use that as the seed.
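
A rough job-control sketch of that seeding approach - every name in it is illustrative, and the sqlplus call is just one way of fetching the max value; treat it as a sketch, not a recipe:

    $INCLUDE DSINCLUDE JOBCONTROL.H
    * Fetch MAX(sk) from the target table; get_max_key.sql is assumed
    * to select just that one number.
    Call DSExecute("UNIX", "sqlplus -s scott/tiger @get_max_key.sql", Output, SysRet)
    MaxKey = Trim(Output<1>)
    * Pass the seed into the load job as parameter KeySeed and run it;
    * inside the job the key derivation is then KeySeed + @OUTROWNUM.
    hJob = DSAttachJob("LoadDimJob", DSJ.ERRFATAL)
    ErrCode = DSSetParam(hJob, "KeySeed", MaxKey)
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)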
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

ArndW wrote:Rachel,

try writing a job that reads a sequential file, uses the key management routines and writes to a sequential file (or to a sequential file pointed at /dev/null) to see the actual speed on your system. The speed should not be 24 rows per second, even on an overloaded system. Most systems should give results over 10K rows/sec at a minimum.

COMMON blocks are very efficient, and I am certain that balajisr's comment about avoiding them is mistaken. The COMMON in use in these routines is there to avoid having to re-open the hashed file on each call - which would make the routine perform very slowly indeed.

As rasi has already mentioned, using one call to get a seed value and then using the counters @INROWNUM or @OUTROWNUM to provide the increments is a very fast way of doing this - but since the hashed file's max value is not being updated, it needs to be corrected after a complete run, and also handled in case of an abort. If you are writing to a database, you can always do your SQL's equivalent of getting the max value of the key field prior to the run and use that as the seed.
Thanks for the clarification, Arnd.

I arrived at the above conclusion only on the basis of the following statement on page 2-12 of the Server Job Developer's Guide:
You cannot use inter-process row buffering if your job uses common blocks in transform functions to pass data between stages. This is not a recommended practice and it is advisable to redesign the job to use row buffering rather than common blocks.
What does that statement mean, exactly? :? Why does the manual say that using common blocks is not recommended? Is it only when we use inter-process buffering that we should avoid common blocks? Is this not a major restriction on using common blocks?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

balajisr,

when you have inter-process row buffering turned on, the transform stages will be executing in different processes, and any COMMON block information will not be shared between them; this is why that page recommends not using COMMON blocks to share data between stages. If you use a COMMON block to store persistent information between calls within the same process (i.e. within a single Transformer stage) then you will have no problems at all. It is only when a call in Transform2 tries to read a value set by Transform1 that it will not work with inter-process buffering turned on.
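
To illustrate the safe pattern, here is a made-up transform function whose COMMON state persists from call to call, but only within the one process:

    FUNCTION RunningCount(StartValue)
    * Illustrative only. /RCState/ keeps its values between calls made
    * by the same process (one Transformer stage); it would NOT see
    * values set via the same COMMON in another stage once
    * inter-process row buffering is enabled.
    COMMON /RCState/ Initialised, Counter
    If UNASSIGNED(Initialised) Then Initialised = @FALSE
    If NOT(Initialised) Then
       Initialised = @TRUE
       Counter = StartValue
    End Else
       Counter += 1
    End
    Ans = Counter
    RETURN(Ans)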
rachelsu
Charter Member
Posts: 17
Joined: Mon Mar 20, 2006 6:02 pm
Location: Melbourne

Post by rachelsu »

I tried a simple server job

SEQ FILE A --> TRANSFORM --> SEQ FILE B

The TRANSFORM contains a direct map between the A and B columns, except that the last numeric column is set to use KeyMgtGetNextValue('atest') instead of the input.

The speed I'm getting is 3~4 rows/sec with a stop limit at 5k rows...

Am I doing something wrong? Or is our server not working properly? But for other simple transforms on the same box I'm getting a few k rows/sec throughput...???
ArndW wrote:try writing a job that reads a sequential file, uses the key management routines and writes to a sequential file (or to a sequential file pointed at /dev/null) to see the actual speed on your system. The speed should not be 24 rows per second, even on an overloaded system. Most systems should give results over 10K rows/sec at a minimum.
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Rachel

3 to 4 rows per second is not normal. I suspect something is wrong. From sequential to sequential, without any complexity inside your transform, it should come out at a few thousand records per second at least. By the way, how many columns are there in your sequential file?

Try to run this test with at least 100,000 records to get an average rows-per-second figure.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
rachelsu
Charter Member
Posts: 17
Joined: Mon Mar 20, 2006 6:02 pm
Location: Melbourne

Post by rachelsu »

18 columns in total (each a varchar of approx 4-20 characters). Once I remove the KeyMgt transform, it runs at a few k rows/sec... hence my initial question about KeyMgt performance...
rasi wrote:3 to 4 rows per second is not normal. I suspect something is wrong. From sequential to sequential, without any complexity inside your transform, it should come out at a few thousand records per second at least. By the way, how many columns are there in your sequential file?

Try to run this test with at least 100,000 records to get an average rows-per-second figure.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

rachelsu,

it should not be that slow. Perhaps there is something wrong with the file itself. I'm not at a DataStage machine right now, but I seem to recall that it is not a dynamic file (it doesn't need to be, as it holds only a couple of records). I'll post the file name once I get to a machine (you can find the name yourself by looking at the routine's source code); then it would be good for you to declare that file on the Designer canvas with one column and do a View Data on it. It should contain only one record per key that you are tracking, and each record will hold the value of the next key.
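
If you want to look before then: you can list the file from the Administrator client's Command window. I believe the SDK routines keep their counters in a hashed file named SDKSequences, but do verify that name against the routine source:

    LIST SDKSequences

Each record ID should be a sequence name ('atest' in your case), with the next value to be issued held in the record.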
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Has some kind soul modified your KeyMgt routine? (There was one group of consultants running around Australia and New Zealand doing that for some years - they rebadged the SDK routines as their own, and managed to ruin more than a few of them in the process.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.