What is the advantage of taking data from a table into a hash file?

akash_nitj
Participant
Posts: 27
Joined: Fri Aug 13, 2004 3:36 am
Location: INDIA

Post by akash_nitj »

Hi All,
What is the advantage of loading all the data from a table into a hash file and then using the output of that hash file as input to some other stage, where the data fetched from the table varies with each record?

TIA
akash
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Off the top of my head, the purpose of such a design can be to
1.) remove duplicates - in a specific order of aging
2.) use the multi-valued fields option
3.) merge multiple sources and / or source systems
4.) manage history using the hash file

Other than this, storing in a hash file (compared to a sequential file) will not be much different.

Note - the above answer assumes that it is a stream output from the hash file.
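
To illustrate point 1, here is a rough Python sketch (not DataStage code) of how a keyed store removes duplicates when rows are written in aging order: a later write to the same key replaces the earlier one, so the newest row survives. The function name, field names and sample rows are all made up for illustration, and a plain dict stands in for the hash file.

```python
def load_keyed_store(rows, key_field, order_field):
    """Write rows into a keyed store in aging order; later writes replace earlier ones."""
    store = {}  # stand-in for a hash file keyed on key_field (assumption)
    for row in sorted(rows, key=lambda r: r[order_field]):
        store[row[key_field]] = row  # same key again: the older row is overwritten
    return store


# Hypothetical source rows containing a duplicate key (cust_id 1).
rows = [
    {"cust_id": 1, "updated": "2005-01-10", "city": "Leeds"},
    {"cust_id": 2, "updated": "2005-02-01", "city": "Pune"},
    {"cust_id": 1, "updated": "2005-03-15", "city": "London"},
]

latest = load_keyed_store(rows, "cust_id", "updated")
for key in sorted(latest):
    print(key, latest[key]["city"])
```

Sorting the other way round would keep the oldest row instead, which is what "specific order of aging" comes down to.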
Paul Preston
Participant
Posts: 24
Joined: Wed Apr 02, 2003 7:09 am
Location: United Kingdom

Post by Paul Preston »

Akash

I am sure this is covered in numerous previous posts.

However, it is always worth repeating.

In addition to what has already been mentioned, a table is usually copied to a hash file for performance advantages when doing lookups.
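
As a rough analogy only (Python and SQLite rather than DataStage), the sketch below contrasts querying the table once per incoming row with copying the table once into a keyed in-memory structure and looking up against that. The table name, columns and sample data are invented for the example, and a dict stands in for the hash file.

```python
import sqlite3

# Hypothetical reference table, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("A1", "Widget"), ("B2", "Gadget")],
)


def lookup_direct(conn, code):
    """One query against the table for every incoming row."""
    row = conn.execute(
        "SELECT name FROM product WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


def build_cache(conn):
    """Read the table once into a keyed structure (the 'hash file' in this analogy)."""
    return dict(conn.execute("SELECT code, name FROM product"))


stream = ["A1", "B2", "A1", "ZZ"]  # incoming keys; "ZZ" has no match

direct = [lookup_direct(conn, code) for code in stream]

cache = build_cache(conn)
cached = [cache.get(code) for code in stream]

print(direct == cached)
```

Both approaches return the same answers; the difference is that the second one pays a single up-front read and then avoids a database round trip per row.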

If a large number of lookups are required on a dataset that is small to moderate in size, then copying the table to a hash file will give significant performance advantages. The only time one would not want to do this is if the database table is so large that it won't fit onto the device you have available for the hash file, or so large that the time to create the hash file is longer than the time required for the lookups to run directly against the database table. So you need to have an idea of how many lookups are needed when you design the job, and balance the trade-off between hash file loading time and improved lookup performance. The other time to avoid using a hash file is if other processes over which you have no control are changing data in the table, in which case you need to think through what the business is trying to achieve.
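
The trade-off can be put into back-of-envelope form. The Python sketch below assumes a very simple cost model (a fixed cost per row to load the hash file, and a fixed cost per lookup for each approach); the function name and all of the timings are invented for illustration, so measure your own job before relying on anything like this.

```python
import math


def break_even_lookups(table_rows, load_cost_per_row, db_lookup_cost, hash_lookup_cost):
    """Return the number of lookups at which loading a hash file starts to pay off.

    All costs must be in the same time unit. Returns None when a hash lookup
    is no cheaper than a database lookup, so loading never pays off.
    """
    saving_per_lookup = db_lookup_cost - hash_lookup_cost
    if saving_per_lookup <= 0:
        return None
    total_load_cost = table_rows * load_cost_per_row
    return math.ceil(total_load_cost / saving_per_lookup)


# Made-up figures in microseconds: 100,000-row table, 20 us per row to load,
# 2,000 us per direct database lookup, 50 us per hash file lookup.
threshold = break_even_lookups(100_000, 20, 2_000, 50)
print("Hash file pays off after about", threshold, "lookups")
```

If the job is expected to do far fewer lookups than the threshold, looking up directly against an indexed table is likely the better choice under this model.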

If you are going to look up directly against the database table, be sure it is indexed on the lookup columns so that your lookup is fast.

If you use a hash file, then choosing the correct hashing algorithm and caching options is important for data integrity and performance. Hash file creation strategies are a part of any basic DataStage training.

The use of hash files is always worth investigating because there is so much you can do with them.

Paul