Hi,
Does using Universe stage to query records in a hashfile reduce the processing speed?.
I used the universe stage to write a user defined query. The hashfile has two keys empId and Date and the query would return if passed-in date is greater than the hashfile date for empID.
Now the rows per second is so low (10+ rows/sec). Is this expected behaviour?
Also the metadata for Date in hashfile is defined as timestamp whereas in Universe is shows up as Varchar(254) .Why that is so?
Thanks for your time.
Using universe for hashfile
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 252
- Joined: Mon Sep 19, 2005 10:28 pm
- Location: Melbourne, Australia
- Contact:
I've never used the UV stage, but since no one else has responded, I'll have a go...
You are using a "range" comparison (eg. DATE < val) in a UV stage because a hashed file stage can't do range comparisons.
With an equals clause (DATE = val), the UV stage would be able to use exactly the same lookup method as the hashed file stage, but not with a range predicate. Rather than going straight to the required row, UV has to search every row, and it's repeating this full scan for every lookup.
You need to index the UV table, probably on the ID and DATE in that order. That's where I step off....
You are using a "range" comparison (eg. DATE < val) in a UV stage because a hashed file stage can't do range comparisons.
With an equals clause (DATE = val), the UV stage would be able to use exactly the same lookup method as the hashed file stage, but not with a range predicate. Rather than going straight to the required row, UV has to search every row, and it's repeating this full scan for every lookup.
You need to index the UV table, probably on the ID and DATE in that order. That's where I step off....
Ross Leishman
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
UniVerse does not have a TimeStamp data type. Hashed files store all data as string, but DataStage allows the Hashed File stage to pretend that there really are data types out there.
Ross is pretty close, but you won't need an index on the primary key, since the hashing algorithm will find keys faster than an index could. An index on the DATE field will return a "Select List" of primary key values that will be processed via the hashing algorithm to find the records corresponding to those keys.
Beware, however, that UniVerse has a cost-based query optimizer; if your query will return more than 30% of the rows in the UniVerse table (for example off a large date range), then the index will not be used (since the majority of pages in the main table structure will have to be touched; a table scan will be less I/O than accessing the index then doing most of a table scan). You can use keywords like NO.OPTIMIZE and REQUIRE.INDEX to try to defeat this default, but there are no guarantees.
Ross is pretty close, but you won't need an index on the primary key, since the hashing algorithm will find keys faster than an index could. An index on the DATE field will return a "Select List" of primary key values that will be processed via the hashing algorithm to find the records corresponding to those keys.
Beware, however, that UniVerse has a cost-based query optimizer; if your query will return more than 30% of the rows in the UniVerse table (for example off a large date range), then the index will not be used (since the majority of pages in the main table structure will have to be touched; a table scan will be less I/O than accessing the index then doing most of a table scan). You can use keywords like NO.OPTIMIZE and REQUIRE.INDEX to try to defeat this default, but there are no guarantees.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks for the replies.
Thought so.. well, was just a thought ...
Decided to change the job design, as my UV knowledge is limited!.
Changed the Universe stage as the driver and OCI as reference and now I have a lot better performance.
Thanks again.
ray.wurlod wrote:UniVerse does not have a TimeStamp data type. Hashed files store all data as string
Thought so.. well, was just a thought ...
Decided to change the job design, as my UV knowledge is limited!.
Changed the Universe stage as the driver and OCI as reference and now I have a lot better performance.
Thanks again.