What is the difference between Hashed Files and Lookup File Sets?



nexus2me
Participant
Posts: 5
Joined: Mon Sep 10, 2007 10:23 pm


Post by nexus2me »

Hi,

Can anyone tell me what the difference is between Hashed Files and Lookup File Sets?

With regards,

Nexus
ameyvaidya
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

Hi!

Welcome to DSXchange!!


Difference 1:
The first does not exist in PX and the other does.

Apologies,

But I do not see why the two need ever be compared.
Amey Vaidya
"I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand."
- Douglas Adams
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A Hashed File is only available in server jobs. It uses a hashing algorithm (without building an index) to determine the location of keys within its structure. It is not amenable to parallelism. The contents of a hashed file may be cached in memory when using the Hashed File stage to service a reference input link. New rows to be written to a hashed file may first be written to a memory cache, then flushed to disk. All writes to a hashed file using an existing key overwrite the previous row. Duplicate key values are not permitted.
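
To make the behaviour described above concrete, here is a minimal Python sketch. It is not DataStage code and does not reflect the real on-disk format; the class name `ToyHashedFile`, the `modulus` parameter and the byte-sum hash are all hypothetical, chosen only to illustrate three points: the key's hash (not a separate index) picks the group where a row lives, writes can sit in a memory cache until flushed, and a write to an existing key overwrites the earlier row.

```python
class ToyHashedFile:
    """Toy model only: the key's hash selects the group (bucket) that holds the row."""

    def __init__(self, modulus=7):
        # 'modulus' (number of groups) is an illustrative assumption.
        self.modulus = modulus
        self.groups = [{} for _ in range(modulus)]  # stands in for the file on disk
        self.write_cache = {}                       # rows waiting to be flushed

    def _group_for(self, key):
        # Simple deterministic hash; the real hashing algorithms differ.
        return sum(key.encode("utf-8")) % self.modulus

    def write(self, key, row):
        # Cached write: a later write with the same key replaces the earlier one.
        self.write_cache[key] = row

    def flush(self):
        # Move cached rows into their groups, overwriting any existing key.
        for key, row in self.write_cache.items():
            self.groups[self._group_for(key)][key] = row
        self.write_cache.clear()

    def read(self, key):
        # Go straight to the group given by the hash; no index is consulted.
        return self.groups[self._group_for(key)].get(key)


hf = ToyHashedFile()
hf.write("C001", {"name": "Alice"})
hf.write("C001", {"name": "Bob"})   # same key: overwrites the first row
hf.flush()
print(hf.read("C001"))              # {'name': 'Bob'}
```

Only one row survives for key `C001`, which is the "last write wins, no duplicates" behaviour described in the paragraph above.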

A Lookup File Set is only available in parallel jobs. It uses an index (based on a hash table) to determine the location of keys within its structure. It is a parallel structure; it has its records spread over the processing nodes specified when it was created. The records in the Lookup File Set are loaded into a virtual Data Set before use, and the index is also loaded into memory. Duplicate key values are (optionally) permitted. If the option is not selected, duplicates are rejected when writing to the Lookup File Set.
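
As a rough illustration of the Lookup File Set behaviour just described, the following Python sketch spreads rows over a number of partitions and keeps a per-partition in-memory index. It is only a toy under stated assumptions: the function names `build_lookup_file_set` and `lookup`, the `num_nodes` and `allow_duplicates` parameters, and the byte-sum partitioning are all hypothetical. In a real parallel job the partitioning is governed by the job design and configuration file, not by code like this.

```python
from collections import defaultdict


def _node_for(key_value, num_nodes):
    # Illustrative deterministic partitioning; not the real partitioner.
    return sum(str(key_value).encode("utf-8")) % num_nodes


def build_lookup_file_set(rows, key, num_nodes=2, allow_duplicates=False):
    """Spread rows over partitions; return (partitions, rejected rows)."""
    # One in-memory index (key value -> list of rows) per partition.
    partitions = [defaultdict(list) for _ in range(num_nodes)]
    rejected = []
    for row in rows:
        index = partitions[_node_for(row[key], num_nodes)]
        if row[key] in index and not allow_duplicates:
            rejected.append(row)         # duplicate key rejected at write time
        else:
            index[row[key]].append(row)
    return partitions, rejected


def lookup(partitions, key_value):
    """Return every row stored under key_value (an empty list if none)."""
    index = partitions[_node_for(key_value, len(partitions))]
    return index.get(key_value, [])


rows = [{"id": "A", "v": 1}, {"id": "B", "v": 2}, {"id": "A", "v": 3}]

parts, rej = build_lookup_file_set(rows, "id")
print(lookup(parts, "A"), rej)    # one row kept for "A", the second rejected

parts_dup, rej_dup = build_lookup_file_set(rows, "id", allow_duplicates=True)
print(lookup(parts_dup, "A"))     # both rows for "A" are returned
```

The contrast with a hashed file is in the duplicate handling: here a repeated key is either kept alongside the first row or rejected, rather than silently overwriting it.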
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mcs@rajesh
Participant
Posts: 46
Joined: Mon Sep 24, 2007 12:37 am
Location: INDIA

Re: What is difference between Hashed Files and Lookup Files

Post by mcs@rajesh »

Hi,
Hashed File: this is used in server jobs. It serves a function similar to that of the Data Set in parallel jobs.
Lookup File Set: this is used in parallel jobs to improve lookup performance.
I don't think these two have ever been compared...
dwblore
Charter Member
Posts: 40
Joined: Tue Mar 28, 2006 12:02 am

Post by dwblore »

Hi,
Hashed Files belong to server jobs and are not supported in PX.
In server jobs, writing a reference table into a hashed file and then reading (referencing) that hashed file for lookups gives better performance than looking up the table directly.