Hash File Performance Degrading ??


Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Hash File Performance Degrading ??

Post by Vinodanand »

Hi All,

I have a server job wherein data is ported from a sequential file to a hash file. The record count is 2.4 million. Though this is a straight move to the hash file, I see that the number of rows read per second sometimes starts at 3456 but gradually comes down to 335. I found this when I tried to monitor the job. The number of columns is 67.

Regards,
Vinod
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's hashed file, not hash file.

Performance is fine. It's the meaningless figures that are the issue here. Ignore them. Data flows rapidly into the write cache. Then, while no more rows are flowing but the cache is being dumped to disk, the clock keeps running, so the rows/sec figure keeps falling. Ignore it.

And even if you are not using write cache, as the size of the hashed file on disk grows and you are writing to random groups, the seek time between those random pages in the structure gets longer and longer, reducing the total productive time available for writing data.
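If you want a truer picture of progress than the monitor's rows/sec figure, one rough check (assuming a UNIX server and a default type 30 dynamic hashed file, which on disk is a directory containing DATA.30 and OVER.30; the path below is only a placeholder) is to watch the physical files grow:

while true
do
    ls -l /path/to/Hsh_YourFile/DATA.30 /path/to/Hsh_YourFile/OVER.30
    sleep 60
done

Steady growth in those files tells you rows are still being written even while the displayed rows/sec collapses.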
Last edited by ray.wurlod on Mon Aug 20, 2007 3:39 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: Hash File Performance Degrading ??

Post by kris »

Vinodanand wrote: the number of rows read per second starts with 3456 but gradually comes down to 335. I found this when I tried to monitor the job. The number of columns are 67.



67 columns in a hashed file? Why and how do you plan to use that hashed file?

There is a fair chance that your file size exceeded 2GB (on a 32-bit DS server) and that throughput dropped sharply once it crossed that limit.
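A quick way to check this, assuming a UNIX server and a type 30 dynamic hashed file (whose data lives in DATA.30 and OVER.30 under the hashed file directory; the path is a placeholder), is:

ls -l /path/to/Hsh_YourFile/DATA.30 /path/to/Hsh_YourFile/OVER.30

A 32-bit hashed file cannot address data beyond 2,147,483,648 bytes (2 GB), so if either file is approaching that size you have found your problem.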

There are very good articles on the best use of hashed files in this portal somewhere. Search for them.

There are also several posts on this issue; I found one myself, see if it helps you:
http://dsxchange.com/viewtopic.php?t=10 ... 006ccfe017



Best regards,
~Kris
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Also check to make sure that you have declared your hashed file as a DYNAMIC one, and that the minimum modulus is sufficiently high in the case of a dynamic file, or large enough to hold the expected data volume without overflowing too badly in the case of a static hashed file.
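One way to verify both points after a trial load, assuming the hashed file is visible in the project's VOC (the file name below is a placeholder), is to run this from the Administrator command window:

ANALYZE.FILE Hsh_YourFile

Among other things it reports the file type (30 for a dynamic file) and the current modulus; the modulus reported after a full load is a reasonable value to set as MINIMUM.MODULUS when the file is recreated.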
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi ,

I am reading from a flat file and writing it directly to a hash file. As the row count reaches seven hundred thousand I can see the write becoming really slow. It's like the first seven hundred thousand rows get written in 8 minutes and the remaining data, out of a total of 2.4 million, takes 3 hours. I also tried increasing the minimum modulus, but it is still killing performance. Do I need to change the file type or enable the caching attributes (write-deferred)?

Thanks,
Vinod
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You haven't told us what you have set the file type to, so it is impossible to recommend changing it. The caching/buffering attributes won't make a performance difference in this case.

If you have a dynamic file, let the job run to completion, then check the modulus of the file and reset the MINIMUM.MODULUS to that value; that way you will avoid the dynamic resizing while writing to the file.

You haven't been reading what Ray is writing about the rows/second display in the job. Due to the way buffering works, the numbers displayed there are skewed. If you really wish to see the actual write speed, try doing a COUNT {filename} every couple of minutes to see actual performance.
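For example, from the Administrator command window (the file name is a placeholder):

COUNT Hsh_YourFile

Note the record count, wait a few minutes, run it again, and divide the difference by the elapsed seconds; that is the real load rate, independent of the monitor's buffering effects.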

Effective over the whole run, how many KB/second are you writing?
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi Arnd,

I tried using the COUNT {filename} command in the Administrator but it threw a syntax error. Also, how would I be able to check the minimum modulus of the file after the job has run?

Regards,
Vinod
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

I missed out on this: I am using a Type 30 dynamic hash file with default settings. Also, is there a way I can get HFC.exe? I found it mentioned in one of the posts but the link was invalid.

Thanks,
Vinod
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Don't you think that telling us the error you received on the COUNT command might help? HFC.exe is (I think) on the installation CD-ROM, but in your case it might just confuse matters. Wait until one load is finished, then do an "ANALYZE.FILE {YourFileName}" and post the results.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

HFC is indeed on the installation CD (DataStage clients), in the Utilities\Unsupported folder.

However, as Arnd says, don't concern yourself with this yet. Get the default hashed file populated, then see what tuning might be possible.

And please start calling them hashed files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi Arnd,

When I keyed in the command "ANALYZE.FILE {Hsh_PDSPersCovgPrevHist_coal_13058}" in DS Administrator I got the following error.

Verb "ANALYZE.FILE" is not in your VOC.

Regards,
Vinod
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Exceeded 2GB limit

Post by Vinodanand »

Hi All,

I just ran into another problem. My hash file size exceeded 2GB, so I created a 64-bit hash file, and now it's running slower than before. The following is the command I used to create it:

/dsadm/Ascential/DataStage/DSEngine/bin/mkdbfile /dso/dsoweb/clients/jpmg/eligibility/logs/Hsh_PDSPersCovgPrevHist_coal_13235 30 678494 8 20 20 80 1628 -64bit

It's a Type 30 file, and when I summed up DATA.30 and OVER.30 it was 2.8 GB. As I have 67 columns, I think the number of columns is what is killing it. The approximate record size is 700 bytes.

-rw-rw-r-- 1 dsosit chs 289136640 Aug 22 17:13 OVER.30
-rw-rw-r-- 1 dsosit chs 2779115520 Aug 22 17:13 DATA.30
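As a rough sanity check on those numbers, assuming my 700-byte estimate is right:

2,400,000 records x 700 bytes = 1,680,000,000 bytes (roughly 1.7 GB of raw data)

so with the hashed file's per-group and per-record overhead on top, a DATA.30 of 2.78 GB is in the expected range.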

The job design is

FlatFile --> Transformer --> Hash File. No transformations in the transformer.

Also, when I do a lookup against this file it takes 6 hours, as my incoming source is 2 million rows.

It would be great if anyone can help me out.

Regards,
Vinod
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Don't enter the '{' and '}' characters. Every VOC has an 'ANALYZE.FILE' verb, but I often mistype ANALZYE.FILE and get the same error you did.
64-bit files are going to be slower (and have more storage overhead) than the original 32-bit files.
Please run ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058 and COUNT Hsh_PDSPersCovgPrevHist_coal_13058 on your original 32-bit file and post the results. If those commands don't work you have serious issues in your account, and performance is the least of them.
Vinodanand
Premium Member
Posts: 112
Joined: Mon Jul 11, 2005 7:54 am

Post by Vinodanand »

Hi Arnd,

I got the following error when I keyed in the command:

COUNT Hsh_PDSPersCovgPrevHist_coal_13058

syntax error . unexpected sentence without file name.Token was "".Scanned Command was COUNT 'Hsh_PDSPersCovgPrevHist_coal_13058'

and for command

ANALYZE.FILE Hsh_PDSPersCovgPrevHist_coal_13058

Illegal option in command line.
Please help.

My 32-bit file had the same settings as the 64-bit one. I changed the file to 64-bit as my hash file grew over 2.2 GB. Please help.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It seems that you are using your hashed file with a path in your job, which is why there is no VOC entry. You will need to issue a
"SETFILE /dso/dsoweb/clients/jpmg/eligibility/logs/Hsh_PDSPersCovgPrevHist_coal_13058 TESTFILE" and then do an "ANALYZE.FILE TESTFILE".