
Use of Hashfiles in a parallel job

Posted: Fri Jul 13, 2007 9:19 am
by avenki77
Hi All,

I am developing a parallel job which needs to perform a lookup on a huge database table. The lookup involves 2 key columns and 1 data column (about 30 bytes in total) and about 10 million rows.

What kind of lookup mechanism is advisable for this job? How can I decide whether to use an ODBC lookup stage or a hashfile in this job?

Also, first of all, can hashfiles be used in parallel jobs in DataStage, or is it not advisable/available?

Thanks in advance
Venkatesh

Re: Use of Hashfiles in a parallel job

Posted: Fri Jul 13, 2007 9:57 am
by chulett
avenki77 wrote: Also, first of all, can hashfiles be used in parallel jobs in DataStage, or is it not advisable/available?
First of all, can they be used? Technically yes, in a Server Shared Container. Should they be used? No.

You've still got your Server Thinking Cap on. It needs to go back in the closet, and you need to approach these problems with a different mindset. There are specific 'lookup' stages in PX you should be using - Join, Merge, Lookup - all of which are discussed in the Parallel Job Developer's Guide PDF.

Posted: Fri Jul 13, 2007 3:31 pm
by ray.wurlod
There is no such thing as a hash file in DataStage. A hashed file is a popular way to store lookup reference data in server jobs. They should not be used in parallel jobs, as to do so will thwart the automatic scaling capability of these jobs.

Stop thinking like a server job developer and investigate parallel alternatives such as Lookup File Sets or normal lookups via a Lookup stage. Or a Join stage or Merge stage where appropriate.
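To make the choice between a Lookup stage and a Join stage concrete: a normal Lookup stage holds the reference data in memory, so the deciding factor is size. The sketch below is purely illustrative (DataStage stages are configured in the Designer GUI, not hand-coded); it models what an in-memory lookup over 2 key columns does, plus the back-of-envelope sizing for the 10-million-row case from the question. All function and column names here are hypothetical.

```python
# Illustrative sketch only: models the behaviour of an in-memory Lookup
# stage with two key columns and one data column. Column names key1,
# key2, and value are made up for the example.

def build_reference(rows):
    """Build an in-memory lookup table keyed on the two key columns."""
    return {(r["key1"], r["key2"]): r["value"] for r in rows}

def lookup(ref, key1, key2, default=None):
    """Probe the reference table; a miss returns the default,
    similar to a Lookup stage with its failure rule set to Continue."""
    return ref.get((key1, key2), default)

# Rough sizing check for the scenario in the question: ~30 bytes per row
# times 10 million rows is ~300 MB of raw reference data. That is the
# kind of arithmetic that decides between an in-memory Lookup and a
# Join stage (which sorts/partitions instead of holding data in memory).
raw_bytes = 30 * 10_000_000
print(raw_bytes / 1e6)  # prints 300.0 (MB of raw data, before overhead)

ref = build_reference([
    {"key1": "A", "key2": 1, "value": "first"},
    {"key1": "B", "key2": 2, "value": "second"},
])
print(lookup(ref, "A", 1))  # prints first
print(lookup(ref, "C", 3))  # prints None (lookup miss)
```

Note that the 300 MB figure is raw data only; actual in-memory structures carry per-entry overhead, which is one reason a Join stage (or a Lookup File Set built in advance) is often preferred once reference data reaches this scale.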