where conditions in Hash

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
nageshn
Participant
Posts: 6
Joined: Fri May 09, 2003 12:39 am

where conditions in Hash

Post by nageshn »

Can anyone explain how to use different conditions for a where clause in a HASH lookup. How to use > or < or between etc.,

I can't enable the 'selection' tab. I think it must be hiding in somewhere there.

ANy help. I am in dire straits.


eg.,

I want to choose something like this
-------
where process_date >= start_date and process_date
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi Nagesh,

the Stage for to connect in a DB-way to Universe-files (=Hash-files) is the UV-Stage, there you can do what you want to do.

Wolfgang
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A hashed file is intended for a lookup based on an exact primary key value. The hashing algorithm is applied to the value supplied (which is assumed to be a primary key value) and the corresponding row is returned extremely quickly, because the hashing algorithm generates the address of the record and the system can go straight there (whether "there" is a disk address or a memory address).

Hashed files are not intended to perform any kind of comparison other than lookup based on an exact primary key. That is why the Selection tab is not available.

Fortunately for you, as Wolfgang has pointed out, hashed files are also the means by which UV tables are implemented, so you can use a UV stage. If this is something you want to do often, you may want to consider creating the hashed file in the UV stage, and subsequently creating an index on your search column. This is likely to speed performance, as the UV stage does not (cannot) make use of the caching capability that the Hashed File stage offers.

Are you performing this lookup based on a primary key AND a date restriction, or solely on the date restriction?
If the former, consider only loading the hashed file with rows that meet those criteria, then you won't have an issue, and will be able to use the speed of a Hashed File stage.
If the latter, you need to use a UV stage, and may need to handle the possibility that more than one row is returned by the lookup.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Paul Preston
Participant
Posts: 24
Joined: Wed Apr 02, 2003 7:09 am
Location: United Kingdom

Post by Paul Preston »

Ray

thanks for pointing out that UV tables don't have indexes unless they are manually created. I had assumed (incorrectly) that checking columns to specify the key when the UV table is created by a Datastage job would put an index on these columns. As the data volume grew the performance dropped (predictably if there is no index). I manually created an index using the unix command line interface and get a 10 fold perofrmance improvement in look ups that use this table.

This leads me to ask if the index is maintained when the UV table data is refreshed? I assume that Universe keeps the index in step with the base table? If it were an Oracle table there may be issues of overflow chains and unrecovered space; do any such issues exist with UV tables and their indexes?

Paul Preston
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's all good news, Paul! [:)]

Updates to the index are automatic when the base table is updated. (This can be disabled, for example for bulk loading of the base table, but the default condition is automatic update.)

The index is a B-tree (what UniVerse terms a Type 25 file), with up to 384 branches per node (so it's a fairly flat structure). Space allocation is automatic and B-tree re-balancing (during which recovery of unused space will occur) is automatic.
Post Reply