UV stage slow

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

UV stage slow

Post by Sreenivasulu »

I am using an UV stage for inserting multiple records for a single record in the source . This turns out to be quite slow hence i have got the
multiple records in the source by doing a join (using sql). Is there anyway we can optimize the performance of a UV stage

Regards
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Which part of the process is slow? The population of the UV table, the join statement or the processing of the multiplied rows?

What are your requirements for turning a single row into multiple rows? Perhaps there is a more efficient way to do it without touching data down to a UV table.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do a run of your job from Director. In the Job Run Options dialog (where you enter parameters, etc.) choose the Tracing tab, select the active stage (Transformer stage) and enable collection of statistics.

One extra event will be logged; it will tell you what proportion of the time was spend doing what, so you can identify the "hot spots".

Why did you choose a UV stage? Surely a hashed file stage would be faster - you can cache it in memory for writing, for one thing. (It's not clear from your post what you're trying to accomplish.)

You can optimize selection using a join in a UV stage in the same way you would in any other database, by indexing the columns participating in the join. But I'm not sure this is what you need.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Ray,VMC:

I am not using a hash file since it would only take a first matching record whereas a UV stage can take multiple records if multiple records match.

Hope you are clear about my problem

Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No, not clear at all. A hashed file will take all records, but only the last will remain, since every update to a hashed file is a destructive overwrite.

A UV stage will only take matching records if you use one of the slow, double-operation, rules (insert or update, or update or insert, or replace (delete then insert)). After all, the UV stage uses SQL, which you can view.

What - exactly - are you trying to accomplish?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Ray,

I am using the Hash file for lookup (not for writing into it). Hence if there
are multiple matching records it would take the first matching record.
If i use a UV stage for lookup then i am able to transfer more than
one matching record to the target. But the issue is that the UV stage
is slow.

Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you're performing a lookup against a hashed file, you're getting the only match. Not the first, not the last, the only match.

Create an index on the UV table underlying the UV stage, on the key column(s). If there are any other constrained columns, index these too.

Get to a TCL prompt (or use the Adminstrator client's Command window) and execute the generated SQL with the EXPLAIN keyword appended (ahead of the semi-colon). This will show you that the default SELECT uses a table scan. Of course it's slow. You haven't done anything to assist the query. UV tables are not automatically indexed on any column; if you're selecting on the entire primary key it can use a hashing algorithm; your requirement suggests that you are selecting on a secondary key. Index it.

If you really want slow, substitute an ODBC stage!

You will never, ever, get the performance out of a UV stage (which is using SQL to query a disk-based table) that you will get out of a Hashed File stage (which uses hashing algorithm to directly access the required record in memory). Memory access speed is at least 1000 times faster than disk access speed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply