Using universe for hashfile

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Using universe for hashfile

Post by Bala R »

Hi,
Does using Universe stage to query records in a hashfile reduce the processing speed?.
I used the universe stage to write a user defined query. The hashfile has two keys empId and Date and the query would return if passed-in date is greater than the hashfile date for empID.
Now the rows per second is so low (10+ rows/sec). Is this expected behaviour?
Also the metadata for Date in hashfile is defined as timestamp whereas in Universe is shows up as Varchar(254) :?: .Why that is so?

Thanks for your time.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

I've never used the UV stage, but since no one else has responded, I'll have a go...

You are using a "range" comparison (eg. DATE < val) in a UV stage because a hashed file stage can't do range comparisons.
With an equals clause (DATE = val), the UV stage would be able to use exactly the same lookup method as the hashed file stage, but not with a range predicate. Rather than going straight to the required row, UV has to search every row, and it's repeating this full scan for every lookup.

You need to index the UV table, probably on the ID and DATE in that order. That's where I step off....
Ross Leishman
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UniVerse does not have a TimeStamp data type. Hashed files store all data as string, but DataStage allows the Hashed File stage to pretend that there really are data types out there.

Ross is pretty close, but you won't need an index on the primary key, since the hashing algorithm will find keys faster than an index could. An index on the DATE field will return a "Select List" of primary key values that will be processed via the hashing algorithm to find the records corresponding to those keys.

Beware, however, that UniVerse has a cost-based query optimizer; if your query will return more than 30% of the rows in the UniVerse table (for example off a large date range), then the index will not be used (since the majority of pages in the main table structure will have to be touched; a table scan will be less I/O than accessing the index then doing most of a table scan). You can use keywords like NO.OPTIMIZE and REQUIRE.INDEX to try to defeat this default, but there are no guarantees.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

Thanks for the replies.
ray.wurlod wrote:UniVerse does not have a TimeStamp data type. Hashed files store all data as string

Thought so.. well, was just a thought :roll: ...
Decided to change the job design, as my UV knowledge is limited!.
Changed the Universe stage as the driver and OCI as reference and now I have a lot better performance.

Thanks again.
Post Reply