UV table indices

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

UV table indices

Post by spracht »

Hi folks,

if I create a uv-table (using a uv-stage, not a hashfile-stage), I have to specify the columns forming the primary key. Will there be a (primary) index on that primary key columns (Querying this forum gives the impression that an index will not be created)?

Ray mentioned, that secondary indexes created on uv-tables are b-trees. In my understanding, a b-tree index can be used not only if all involved columns are specified in the filter condition, but also, if only a subset of the columns is used. For Example, an index created on col1, col2, col3 is also used by the RDBMS if you would only specify (col1 and col2) or only col1 in the filter criteria? Is this the case for universe as well?

Thank you very much in advance for responses!

Stephan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

No index is created on the primary key of a UV table because, under the covers, it's a hashed file, so that the physical location of a the record having that primary key is calculated via the hashing algorithm - no need for a table scan of an index, or of the main file, which is where a hashed file gets its speed for lookups.

There is nothing to stop you creating an index on @ID (the primary key). There are some circumstances where it may help, such as extracting data from the table in key order. It won't help for reference lookups, as the cost-based query optimizer will (correctly) calculate that access via the hashing algorithm is cheaper.

The rules that govern whether a secondary index on N columns will be used to satisfy a constraint involving M of those columns are complex. The main criterion is the order in which the columns are named in the constraint clause and (versus?) the order in which the columns are named in the index definition. This is the case with all of the RDBMSs of which I'm aware, and is the case with UniVerse.

It's probably simpler to set up the situation in which you're interested and add one (or both) of the keywords EXPLAIN or REQUIRE.INDEX to your test query. EXPLAIN will inform you whether the execution plan includes resolution of the constaint via an index. REQUIRE.INDEX will not execute the query if at least one secondary index cannot be used.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Paul Preston
Participant
Posts: 24
Joined: Wed Apr 02, 2003 7:09 am
Location: United Kingdom

Post by Paul Preston »

Hi Stephan and all

I naively assumed that specifying the primary key columns for a UV table meant that there would be an index on these key columns. When data volumes increased I noticed the performance deteriorated.

Using the command line interface to examine the UV table I noticed no index so I created one on the primary key columns and the performance improved by orders of magnitude.

The thing to remember is that if the UV table is populated by a job that drops and recreates the table, then after refreshing the data the index will also need to be recreated. Very anoying to be caught out by this.

Paul.
Post Reply