
Performance issue with Hashfiles

Posted: Wed Sep 17, 2008 6:54 pm
by pradkumar
Hi,

I migrated the project from version 7.5 on an HP Unix machine (2 CPUs) to version 8.0.1 on an AIX machine (1 CPU) and moved all the files to the new server. I didn't copy the hash files from the old server to the new one. Instead, I ran all the jobs that create the hash files, so they were rebuilt in the project directory on the new server.
When I try to run the remaining jobs, the performance is very slow compared to the old server. For example, some jobs that finish in 50 minutes on version 7.5 take more than 3 hours on version 8. The row rate has also dropped, from 60 rows/sec in 7.5 to 5 rows/sec in version 8.
I checked the CPU utilization and it is at almost 90 to 95%. Is this due to the number of CPUs? How can I increase the performance of the jobs (number of rows from hash file to transformer)?

Any inputs would be really appreciated.

Thanks in Advance

Posted: Thu Sep 18, 2008 12:32 am
by tcj
Are they dynamic or static hashed files?

How many rows are being created in the hashed files?

I would guess that the hashed files on the old 7.5 server are dynamic hashed files that have grown over time. The new hashed files on version 8 were probably created from scratch with the default settings. A dynamic hashed file starts splitting groups as it grows, which can cause major overheads.
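
To give a rough feel for the splitting overhead, here is a minimal sketch of a simplified model, not the engine's actual algorithm. It assumes a group size of 2048 bytes, a split load of 80%, and a minimum modulus of 1 as the "default settings", and it ignores per-record overhead, overflow, and merging. The function name count_splits and all of its parameters are made up for illustration.

```python
def count_splits(n_records, record_bytes, group_bytes=2048,
                 min_modulus=1, split_load_pct=80):
    """Count group splits while loading records into a simplified dynamic file.

    Assumed model: the file starts with min_modulus groups and adds one
    group (one split) whenever the stored bytes exceed split_load_pct
    percent of the total group space.
    """
    modulus = min_modulus
    used_bytes = 0
    splits = 0
    for _ in range(n_records):
        used_bytes += record_bytes
        # Integer comparison of load against the split threshold.
        while used_bytes * 100 > modulus * group_bytes * split_load_pct:
            modulus += 1  # one split adds one group
            splits += 1
    return modulus, splits


# File created with the assumed defaults: it has to split its way up.
grown = count_splits(100_000, 100)
# File created with a minimum modulus already large enough: no splits.
presized = count_splits(100_000, 100, min_modulus=grown[0])
print(grown, presized)
```

In this model, loading 100,000 records of 100 bytes into a file that starts at modulus 1 triggers 6,103 splits, while the same load into a file pre-sized to modulus 6,104 triggers none. The real numbers on your server will differ, but it shows why a freshly created file can behave differently from one that has already reached its working size.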

Posted: Thu Sep 18, 2008 12:39 am
by tcj
Here is an interesting whitepaper on the subject. I found the link in another post.

http://www.openqm.org/downloads/dynamic_files.pdf

This is the post I found it in. Take note of Chulett's post near the bottom.

viewtopic.php?t=109212&highlight=modulus

Re: Performance issue with Hashfiles

Posted: Thu Sep 18, 2008 1:09 am
by ray.wurlod
pradkumar wrote: How can I increase the performance of the jobs (number of rows from hash file to transformer)?
The only way that you can increase the number of rows is by increasing the number of rows (that is, by reading more rows).

The execution environment for version 8 is somewhat different to the execution environment for earlier versions, not least because things now run in the Information Server domain. Doubtless there are overheads from this, particularly at the metadata management level.

A cynic might observe that they've put in some "slowdowns" for server jobs to encourage everyone to move to parallel jobs. Whether or not this is the case, IBM has been warning about the overheads of executing within the Information Server environment since version 8 was in beta testing. You might like to ask that question of your official support provider.