Hash File Performance Degrading ??


Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Hash File Performance Degrading ??

Post by Vinodanand »

Hi All,

I have a server job wherein data is ported from a sequential file to a hash file. The record count is 2.4 million. Though this is a straight move to the hash file, I see that the number of rows read per second sometimes starts at 3456 but gradually comes down to 335. I found this when I tried to monitor the job. The number of columns is 67.

Regards,
Vinod
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's hashed file, not hash file.

Performance is fine. It's the meaningless figures that are the issue here. Ignore them. Data flows rapidly into the write cache. Then, while no more rows are flowing but the cache is being dumped to disk, the clock keeps running, so the rows/sec figure keeps falling. Ignore it.

And even if you are not using write cache, as the size of the hashed file on disk grows and you are writing to random groups, the seek time between those random pages in the structure gets longer and longer, reducing the total productive time available for writing data.
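If you want a truer picture of progress than the monitor's rows/sec figure, one rough check (assuming a UNIX server and a default type 30 dynamic hashed file, which on disk is a directory containing DATA.30 and OVER.30; the path below is only a placeholder) is to watch the physical files grow:

while true
do
    ls -l /path/to/Hsh_YourFile/DATA.30 /path/to/Hsh_YourFile/OVER.30
    sleep 60
done

Steady growth in those files tells you rows are still being written even while the displayed rows/sec collapses.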
Last edited by ray.wurlod on Mon Aug 20, 2007 3:39 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: Hash File Performance Degrading ??

Post by kris »

Vinodanand wrote: the number of rows read per second starts with 3456 but gradually comes down to 335. I found this when I tried to monitor the job. The number of columns are 67.



67 columns in a hashed file? Why and how do you plan to use that hashed file?

There is a fair chance that your file size exceeded 2GB (on a 32-bit DS server) and that throughput dropped sharply once it crossed that limit.
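A quick way to check this, assuming a UNIX server and a type 30 dynamic hashed file (whose data lives in DATA.30 and OVER.30 under the hashed file directory; the path is a placeholder), is:

ls -l /path/to/Hsh_YourFile/DATA.30 /path/to/Hsh_YourFile/OVER.30

A 32-bit hashed file cannot address data beyond 2,147,483,648 bytes (2 GB), so if either file is approaching that size you have found your problem.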

There are very good articles on the best use of hashed files in this portal somewhere. Search for them.

There are also several posts on this issue; I found one myself, see if it helps you:
http://dsxchange.com/viewtopic.php?t=10 ... 006ccfe017



Best regards,
~Kris
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Also check to make sure that you have declared your hashed file as a DYNAMIC one, and that the minimum modulus is sufficiently high in the case of a dynamic file, or large enough to hold the expected data volume without overflowing too badly in the case of a static hashed file.
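One way to verify both points after a trial load, assuming the hashed file is visible in the project's VOC (the file name below is a placeholder), is to run this from the Administrator command window:

ANALYZE.FILE Hsh_YourFile

Among other things it reports the file type (30 for a dynamic file) and the current modulus; the modulus reported after a full load is a reasonable value to set as MINIMUM.MODULUS when the file is recreated.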
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi ,

I am reading from a flat file and writing it directly to a hash file. As the row count reaches seven hundred thousand I can see the write becoming really slow. It's like the first seven hundred thousand rows get written in 8 minutes and the remaining data, out of a total of 2.4 million, takes 3 hours. I also tried increasing the minimum modulus, but it is still killing performance. Do I need to change the file type or enable the caching attributes (write-deferred)?

Thanks,
Vinod
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You haven't told us what you have set the file type to, so it is impossible to recommend changing it. The caching/buffering attributes won't make a performance difference in this case.

If you have a dynamic file, let the job run to completion, then check the modulus of the file and reset the MINIMUM.MODULUS to that value; that way you will avoid the dynamic resizing while writing to the file.

You haven't been reading what Ray is writing about the rows/second display in the job. Due to the way buffering works, the numbers displayed there are skewed. If you really wish to see the actual write speed, try doing a COUNT {filename} every couple of minutes to see actual performance.
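For example, from the Administrator command window (the file name is a placeholder):

COUNT Hsh_YourFile

Note the record count, wait a few minutes, run it again, and divide the difference by the elapsed seconds; that is the real load rate, independent of the monitor's buffering effects.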

Effective over the whole run, how many KB/second are you writing?
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi Arnd,

I tried using the COUNT {filename} command in the Administrator but it threw a syntax error. Also, how would I be able to check the minimum modulus of the file after the job has run?

Regards,
Vinod
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

I missed out on this: I am using a Type 30 dynamic hash file with default settings. Also, is there a way I can get HFC.exe? I found it mentioned in one of the posts but the link was invalid.

Thanks,
Vinod
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Don't you think that telling us the error you received on the COUNT command might help? HFC.exe is (I think) on the installation CD-ROM, but in your case it might just confuse matters. Wait until one load is finished, then do an "ANALYZE.FILE {YourFileName}" and post the results.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

HFC is indeed on the installation CD (DataStage clients), in the Utilities\Unsupported folder.

However, as Arnd says, don't concern yourself with this yet. Get the default hashed file populated, then see what tuning might be possible.

And please start calling them hashed files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi Arnd,

When I keyed in the command "ANALYZE.FILE {Hsh_PDSPersCovgPrevHist_coal_13058}" in DS Administrator I got the following error.

Verb "ANALYZE.FILE" is not in your VOC.

Regards,
Vinod
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Exceeded 2GB limit

Post by Vinodanand »

Hi All,

I just ran into another problem. My hash file size exceeded 2GB, so I created a 64-bit hash file, and now it's running slower than before. The following is the command I used to create it:

/dsadm/Ascential/DataStage/DSEngine/bin/mkdbfile /dso/dsoweb/clients/jpmg/eligibility/logs/Hsh_PDSPersCovgPrevHist_coal_13235 30 678494 8 20 20 80 1628 -64bit

It's a Type 30 file, and when I summed up DATA.30 and OVER.30 it was 2.8 GB. As I have 67 columns, I think the number of columns is what is killing it. The approximate record size is 700 bytes.

-rw-rw-r-- 1 dsosit chs 289136640 Aug 22 17:13 OVER.30
-rw-rw-r-- 1 dsosit chs 2779115520 Aug 22 17:13 DATA.30
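As a rough sanity check on those numbers, assuming my 700-byte estimate is right:

2,400,000 records x 700 bytes = 1,680,000,000 bytes (roughly 1.7 GB of raw data)

so with the hashed file's per-group and per-record overhead on top, a DATA.30 of 2.78 GB is in the expected range.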

The job design is

FlatFile --> Transformer --> Hash File. No transformations in the transformer.

Also, when I do a lookup against this file it takes 6 hours, as my incoming source is 2 million rows.

It would be great if anyone can help me out.

Regards,
Vinod
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Don't enter the '{' and '}' characters. Every VOC has an 'ANALYZE.FILE' verb, but I often mistype ANALZYE.FILE and get the same error you did.
64-bit files are going to be slower (and have more storage overhead) than the original 32-bit files.
Please run ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058 and COUNT Hsh_PDSPersCovgPrevHist_coal_13058 on your original 32-bit file and post the results. If those commands don't work you have serious issues in your account, and performance is the least of them.
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi Arnd,

I got the following error when I keyed in the command:

COUNT Hsh_PDSPersCovgPrevHist_coal_13058

syntax error . unexpected sentence without file name.Token was "".Scanned Command was COUNT 'Hsh_PDSPersCovgPrevHist_coal_13058'

and for command

ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058

Illegal option in command line.
Please help.

My 32-bit file had the same settings as the 64-bit one. I changed the file to 64-bit as my hash file grew over 2.2 GB. Please help.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It seems that you are using your hashed file with a path in your job, which is why there is no VOC entry. You will need to issue a
"SETFILE /dso/dsoweb/clients/jpmg/eligibility/logs/Hsh_PDSPersCovgPrevHist_coal_13058 TESTFILE" and then do an "ANALYZE.FILE TESTFILE".