Use of Hashfiles in a parallel job

avenki77
Participant
Posts: 25
Joined: Wed Jul 07, 2004 2:55 pm

Post by avenki77 »

Hi All,

I am developing a parallel job which needs to perform a lookup on a huge database table. The lookup involves 2 key columns and 1 data column (about 30 bytes in total) and about 10 million rows.

What kind of lookup mechanism is advisable for this job? How can I decide whether to use an ODBC lookup stage or a hashfile in this job?

Also, first of all, can hashfiles be used in parallel jobs in datastage or is it not advisable/available?

Thanks in advance
Venkatesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Use of Hashfiles in a parallel job

Post by chulett »

avenki77 wrote: Also, first of all, can hashfiles be used in parallel jobs in datastage or is it not advisable/available?
First of all, can they be used? Technically yes, in a Server Shared Container. Should they be used? No.

You've still got your Server Thinking Cap on. It needs to go back in the closet and you need to approach these problems with a different mindset. There are specific 'lookup' stages in PX you should be using - Join, Merge, Lookup - all of which are discussed in the Parallel Job Developer's Guide pdf.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There is no such thing as a hash file in DataStage. A hashed file is a popular way to store lookup reference data in server jobs. Hashed files should not be used in parallel jobs, because doing so will thwart the automatic scaling capability of those jobs.

Stop thinking like a server job developer and investigate parallel alternatives such as Lookup File Sets or normal lookups via a Lookup stage, or a Join stage or Merge stage where appropriate.
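
To make the trade-off concrete, here is a rough conceptual sketch in plain Python. It is not DataStage code and does not show how the parallel stages are implemented. It only contrasts the two general techniques: an in-memory keyed lookup (the idea behind a normal lookup) and a sort-merge join (the idea behind joining key-sorted inputs). All function names are made up for illustration, the sample rows are invented, and the size estimate assumes the figures from the original question (about 10 million rows at about 30 bytes each) with no allowance for any per-row overhead.

```python
from typing import Iterable, List, Optional, Tuple

Key = Tuple[str, str]            # the two key columns
RefRow = Tuple[str, str, str]    # two key columns plus one data column


def estimate_reference_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw size of the reference data, ignoring any storage overhead."""
    return rows * bytes_per_row


def lookup_in_memory(stream: Iterable[Key],
                     reference: Iterable[RefRow]) -> List[Tuple[str, str, Optional[str]]]:
    """Load all reference rows into a dict, then probe it once per stream row.

    Unmatched stream rows are kept, with None as the looked-up value.
    """
    table = {(k1, k2): value for k1, k2, value in reference}
    return [(k1, k2, table.get((k1, k2))) for k1, k2 in stream]


def sort_merge_join(stream: Iterable[Key],
                    reference: Iterable[RefRow]) -> List[RefRow]:
    """Inner join of two inputs sorted on the key columns.

    Assumes reference keys are unique. Only one row from each side is
    examined at a time, so the reference data never has to fit in memory
    as a whole (the sorting here is done in memory purely for brevity).
    """
    left = sorted(stream)
    right = sorted(reference)
    joined: List[RefRow] = []
    j = 0
    for key in left:
        # Advance the reference side until its key is >= the stream key.
        while j < len(right) and (right[j][0], right[j][1]) < key:
            j += 1
        if j < len(right) and (right[j][0], right[j][1]) == key:
            joined.append((key[0], key[1], right[j][2]))
    return joined


if __name__ == "__main__":
    # Figures taken from the question: ~10 million rows of ~30 bytes.
    print(estimate_reference_bytes(10_000_000, 30), "bytes of raw reference data")

    reference = [("A", "1", "x"), ("B", "2", "y")]
    stream = [("B", "2"), ("A", "1"), ("C", "3"), ("A", "1")]
    print(lookup_in_memory(stream, reference))
    print(sort_merge_join(stream, reference))
```

The point of the comparison: the first approach needs the whole reference set in memory, while the second only needs both inputs sorted on the keys. Which one suits a given job depends on how the reference volume compares with the memory available.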
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.