UV stage slow

Sreenivasulu · Post by **Sreenivasulu** » Tue Jun 08, 2004 12:38 am

I am using a UV stage to insert multiple records for a single record in the source. This turned out to be quite slow, so I now produce the multiple records in the source by doing a join (using SQL). Is there any way we can optimize the performance of a UV stage?

Regards

vmcburney · Post by **vmcburney** » Tue Jun 08, 2004 12:55 am

Which part of the process is slow? The population of the UV table, the join statement or the processing of the multiplied rows?

What are your requirements for turning a single row into multiple rows? Perhaps there is a more efficient way to do it without landing the data in a UV table.

ray.wurlod · Post by **ray.wurlod** » Tue Jun 08, 2004 5:51 pm

Do a run of your job from Director. In the Job Run Options dialog (where you enter parameters, etc.) choose the Tracing tab, select the active stage (Transformer stage) and enable collection of statistics.

One extra event will be logged; it will tell you what proportion of the time was spent doing what, so you can identify the "hot spots".

Why did you choose a UV stage? Surely a hashed file stage would be faster - you can cache it in memory for writing, for one thing. (It's not clear from your post what you're trying to accomplish.)

You can optimize selection using a join in a UV stage in the same way you would in any other database, by indexing the columns participating in the join. But I'm not sure this is what you need.

Sreenivasulu · Post by **Sreenivasulu** » Tue Jun 08, 2004 11:29 pm

Ray, VMC:

I am not using a hashed file because it would only take the first matching record, whereas a UV stage can take multiple records if multiple records match.

I hope my problem is clear now.

Regards

ray.wurlod · Post by **ray.wurlod** » Wed Jun 09, 2004 12:32 am

No, not clear at all. A hashed file will take all records, but only the last will remain, since every update to a hashed file is a destructive overwrite.
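The overwrite behaviour can be pictured with a small sketch. This is an analogy only: a Python dict stands in for a hashed file keyed on one column, and the keys and values are made up for illustration. It is not DataStage code.

```python
# Analogy only: a dict plays the role of a hashed file with a single key column.
# The rows below are hypothetical sample data.
rows = [
    ("K1", "first"),
    ("K1", "second"),
    ("K2", "only"),
]

hashed_file = {}
for key, value in rows:
    # Writing a key that already exists replaces the earlier record
    # (a destructive overwrite), so only the last write per key survives.
    hashed_file[key] = value

# A keyed lookup can therefore return at most one record per key.
print(hashed_file["K1"])
print(len(hashed_file))
```

Three rows go in, but only two records remain, and the one stored under `K1` is the last one written.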

A UV stage will only take matching records if you use one of the slow, double-operation rules: "insert or update", "update or insert", or "replace" (delete then insert). After all, the UV stage uses SQL, which you can view.

What - exactly - are you trying to accomplish?

Sreenivasulu · Post by **Sreenivasulu** » Wed Jun 09, 2004 1:14 am

Ray,

I am using the hashed file for lookup (not for writing into it). Hence, if there are multiple matching records, it would take the first matching record. If I use a UV stage for the lookup, I am able to transfer more than one matching record to the target. But the issue is that the UV stage is slow.

Regards

ray.wurlod · Post by **ray.wurlod** » Wed Jun 09, 2004 2:06 am

If you're performing a lookup against a hashed file, you're getting the only match. Not the first, not the last, the only match.

Create an index on the UV table underlying the UV stage, on the key column(s). If there are any other constrained columns, index these too.

Get to a TCL prompt (or use the Administrator client's Command window) and execute the generated SQL with the EXPLAIN keyword appended (ahead of the semi-colon). This will show you that the default SELECT uses a table scan. Of course it's slow. You haven't done anything to assist the query. UV tables are not automatically indexed on any column. If you're selecting on the entire primary key, it can use a hashing algorithm, but your requirement suggests that you are selecting on a secondary key. Index it.
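The effect of indexing a secondary key can be seen in any SQL engine. The sketch below is an illustration under stated assumptions, not UniVerse itself: it uses Python's built-in SQLite, whose `EXPLAIN QUERY PLAN` prefix differs from the UniVerse EXPLAIN keyword, and the table name `lookup_tbl`, the column `sec_key`, and the index name `idx_sec_key` are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL-queried table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lookup_tbl (id INTEGER PRIMARY KEY, sec_key TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO lookup_tbl (sec_key, payload) VALUES (?, ?)",
    [("A", "a1"), ("A", "a2"), ("B", "b1")],
)

# The lookup selects on a non-key (secondary) column and may match many rows.
query = "SELECT payload FROM lookup_tbl WHERE sec_key = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


plan_before = plan(conn, query, ("A",))   # no index yet: expect a full scan
conn.execute("CREATE INDEX idx_sec_key ON lookup_tbl (sec_key)")
plan_after = plan(conn, query, ("A",))    # index available: expect an index search

matches = [r[0] for r in conn.execute(query + " ORDER BY payload", ("A",))]

print(plan_before)
print(plan_after)
print(matches)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search using the index, while the query still returns every matching row.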

If you really want slow, substitute an ODBC stage!

You will never, ever, get the performance out of a UV stage (which uses SQL to query a disk-based table) that you will get out of a Hashed File stage (which uses a hashing algorithm to directly access the required record in memory). Memory access is at least 1000 times faster than disk access.