Using universe for hashfile

Bala R · Post by **Bala R** » Tue Nov 15, 2005 6:43 pm

Hi,
Does using Universe stage to query records in a hashfile reduce the processing speed?.
I used the universe stage to write a user defined query. The hashfile has two keys empId and Date and the query would return if passed-in date is greater than the hashfile date for empID.
Now the rows per second is so low (10+ rows/sec). Is this expected behaviour?
Also the metadata for Date in hashfile is defined as timestamp whereas in Universe is shows up as Varchar(254)

.Why that is so?

Thanks for your time.

rleishman · Post by **rleishman** » Tue Nov 15, 2005 7:59 pm

I've never used the UV stage, but since no one else has responded, I'll have a go...

You are using a "range" comparison (eg. DATE < val) in a UV stage because a hashed file stage can't do range comparisons.
With an equals clause (DATE = val), the UV stage would be able to use exactly the same lookup method as the hashed file stage, but not with a range predicate. Rather than going straight to the required row, UV has to search every row, and it's repeating this full scan for every lookup.

You need to index the UV table, probably on the ID and DATE in that order. That's where I step off....

ray.wurlod · Post by **ray.wurlod** » Tue Nov 15, 2005 10:48 pm

UniVerse does not have a TimeStamp data type. Hashed files store all data as string, but DataStage allows the Hashed File stage to pretend that there really are data types out there.

Ross is pretty close, but you won't need an index on the primary key, since the hashing algorithm will find keys faster than an index could. An index on the DATE field will return a "Select List" of primary key values that will be processed via the hashing algorithm to find the records corresponding to those keys.

Beware, however, that UniVerse has a cost-based query optimizer; if your query will return more than 30% of the rows in the UniVerse table (for example off a large date range), then the index will not be used (since the majority of pages in the main table structure will have to be touched; a table scan will be less I/O than accessing the index then doing most of a table scan). You can use keywords like NO.OPTIMIZE and REQUIRE.INDEX to try to defeat this default, but there are no guarantees.

Bala R · Post by **Bala R** » Wed Nov 16, 2005 1:31 am

Thanks for the replies.

ray.wurlod wrote:UniVerse does not have a TimeStamp data type. Hashed files store all data as string

Thought so.. well, was just a thought

...
Decided to change the job design, as my UV knowledge is limited!.
Changed the Universe stage as the driver and OCI as reference and now I have a lot better performance.

Thanks again.