Loading Hash File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Loading Hash File

Post by sumitgulati »

Hi All,

I have a table with 6000000 (6 million) records. I am loading this table into a hash file to eliminate duplicate records, and then loading the target table from the hash file.
SrcTbl --> HashFile --> TgtTbl
The hash file is Type 30 (Dynamic). The problem is that the job got stuck after processing 4996953 records. The monitor shows it processing at a rate of 82 rows/sec, but the total number of records processed does not increase.

Any idea what the problem could be?

Regards,
Sumit
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: Loading Hash File

Post by ogmios »

Without more information, no. Maybe your hash file is "resizing".

But maybe you can consider using a DISTINCT or a GROUP BY in the source database to remove the duplicates?
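Something along these lines, for example (just a sketch; the key and column names are made up):

SELECT DISTINCT key_col1, key_col2, other_col
FROM SrcTbl

With a good index on the key columns that runs entirely in the database, and you can drop the hash file step.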

Ogmios
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Putting a DISTINCT in the source query takes a lot of time. The query runs for hours because it has to get distinct records out of 6 million rows. Moreover, we may have two distinct records with the same combination of values in the columns that are defined as keys in the target table.

Regards,
Sumit
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

How reliable are the statistics you see when you monitor a job? Maybe the job is still processing records in the background but it is not getting reflected in the Monitor screen. Is that possible?

Please advise.

-Sumit
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

sumitgulati wrote:Putting a DISTINCT in the source query takes a lot of time. The query runs for hours because it has to get distinct records out of 6 million rows. Moreover, we may have two distinct records with the same combination of values in the columns that are defined as keys in the target table.

Regards,
Sumit
Selecting 6 million records from the database and writing them into a hash file is going to be instantaneous :lol: (hint: define a good index and your DISTINCT will fly).

As for your last problem, this can be solved in SQL by grouping by the key values and then wrapping each extra column that needs to be moved in a MIN or MAX function.
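Something like this, for example (a sketch only; the key and extra column names are made up):

SELECT key_col1, key_col2, MAX(extra_col) AS extra_col
FROM SrcTbl
GROUP BY key_col1, key_col2

This guarantees exactly one row per key, and the MIN or MAX decides which value of the non-key columns survives.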

Anyway, your solution also sounds tacky... With the hash file approach, the last row processed for a given key is the one that will appear in your target.

Ogmios
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Thanks Ogmios. I enforced key-column uniqueness in the source query itself and dropped the hash file from the job. The job is performing much better now.

Regards,
Sumit
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

When you have a hash file that large you need a minimum modulo. Do a search; this has been covered a lot. Usually if a hash file hangs it is not resizing. Dynamic hash files resize by splitting one group of records into 2 groups, so as the modulo grows from 1 to several hundred thousand you get lots of splits, and in each split one group of records is written out twice. Not very efficient. With a minimum modulo you eliminate most of these splits. At TCL, run:

ANALYZE.FILE HashFileName

This will tell you the current modulo. Also, when a hash file is this large, use the 64BIT option. This will allow the hash file to grow very large. Do not use this option on all your hash files; there is some overhead to it.
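For example, you could pre-create the file at TCL with both options (the file name and modulo here are only illustrative; size the modulo from what ANALYZE.FILE tells you):

CREATE.FILE BigHash DYNAMIC MINIMUM.MODULUS 600000 64BIT

With the groups allocated up front, the file does not have to split its way up from modulo 1 while you load it.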
Mamu Kim
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Thanks Kim Duke, I will try this option also.

Regards,
Sumit
Deepak_J
Participant
Posts: 7
Joined: Tue Mar 23, 2004 9:28 am

Post by Deepak_J »

Sumit,
It is possible your hash file might be hitting the 2GB limit.

Deepak
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

Deepak_J wrote:Sumit,
It is possible your hash file might be hitting the 2GB limit.

Deepak
There is a hash file size limit?
What happens after 2 gig?
Jim
Sure I need help.... But who doesn't?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

32-bit addressable files have a 2.2 gigabyte limit. DS Server uses 32-bit hash files by default. You can set DS to use 64-bit as the default, or manually create hash files supplying the 64BIT keyword option. Search the forum; it's covered a lot.
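If the hash file already exists, you can also convert it in place at TCL. From memory the syntax is something like this, so check the docs before running it:

RESIZE HashFileName * * * 64BIT

The asterisks keep the current type, modulo and separation; only the addressing changes.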
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
sonia jacob
Participant
Posts: 122
Joined: Mon Jul 05, 2004 1:33 pm
Location: MA

Post by sonia jacob »

What about using an aggregator stage to remove duplicates?
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

sonia jacob wrote:What about using an aggregator stage to remove duplicates?
The only problem with the Aggregator is that it is slow. It does perform a crude group by, if that is what you need. I would still suggest doing the group by on the DB server.

Jim
Sure I need help.... But who doesn't?
martin
Participant
Posts: 67
Joined: Fri Jul 30, 2004 7:19 am
Location: New Jersey

Post by martin »

kcbland wrote:32-bit addressable files have a 2.2 gigabyte limit. DS Server uses 32-bit hash files by default. You can set DS to use 64-bit as the default, or manually create hash files supplying the 64BIT keyword option. Search the forum; it's covered a lot.


How do you set 64-bit as the default?

Thanks in advance.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Search the forum, we've covered this before. It's in the uvconfig file, so it requires a uvregen and is a server-wide setting that affects all projects.
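Roughly, on Unix it looks like this (a sketch; paths and the exact stop/start commands vary by install, so check the docs first):

cd $DSHOME
bin/uv -admin -stop     # stop the server first
vi uvconfig             # set the 64BIT_FILES tunable to 1
bin/uvregen             # regenerate the configuration
bin/uv -admin -start

64BIT_FILES is the uvconfig parameter that makes 64BIT the default for newly created hash files; since it's server-wide, be sure that's what you want before flipping it.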
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply