Hash File Performance Problem

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rsaliah
Participant
Posts: 65
Joined: Thu Feb 27, 2003 8:59 am

Hash File Performance Problem

Post by rsaliah »

Guys,

DS version = 7.5.1.A
OS = SunOS 5.8

I could use some pointers on things to check. We have a typical environmental setup with a development/test server and a separate production box. The problem is that I've noticed some differences in performance between the two servers. I'm not a hardware person, but I'm assured that both servers are of similar specification with similar configuration, so in my simple mind I would expect similar performance.

I've carried out some simple tests at times when the boxes are not being used to try and establish a baseline. I've created simple jobs to:

1. Read from sequential and write to sequential
2. Read from sequential and write to hash file
3. Read from sequential and write to OCI
4. Read from hash file and write to sequential
5. Read from hash file and write to hash file
6. Read from hash file and write to OCI

This highlighted a significant difference in any test involving hash files. I've rerun the tests a couple of times, ensuring that there were no other significant processes running, and the results were the same. Anything involving hash files was at least 10x slower on the production server; all other tests showed comparable results.

Would anyone know of any config parameters (server or DS) that we should check? Indeed, any other suggestion would be great.

Thanks,
Regu.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

My first thought would be to look at the file system on which the hashed file is performing slowly; is it the same basic type as on the other machine, and is it busy with another process?

The configuration parameters for hashed files in DataStage pertain mainly to factors involving concurrent use; changing these in an environment with few users will not make much, if any, difference.

By default, hashed files are created as DYNAMIC (type 30) with a minimum modulus of 1. If this is not the same on both machines, and/or the files contain a different number of records, you may see differences as well due to the dynamic reallocation of the file.

A speed difference of 10x is quite large. Are you certain that you have enabled (or disabled) hashed file caching the same way in both jobs?
rsaliah
Participant
Posts: 65
Joined: Thu Feb 27, 2003 8:59 am

Post by rsaliah »

Thanks for the swift reply.

I've had it confirmed that for the duration of the tests there were no other processes running on either box. With regard to the file systems, I've been told that they are set up the same; apart from the difference in size there are no other differences, though I wouldn't know how to verify that. Are there some specific questions I could ask the UNIX dudes?

With regard to the hash files, they are created exactly the same on both servers, as it is the same job that I've imported onto both machines. The settings are the same defaults that you get when you drag a hash file stage onto a job and link it.

Thanks,
Regu.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

rsaliah wrote:Is there some specific questions I could ask the UNIX dudes?
Not really. Explain your situation and have them monitor performance / usage over the course of both runs. That should help narrow down the issue.

It could be something like patch levels or a different version of some critical library... hard to say. I have seen similar issues in the past and good SAs were critical in figuring out the problem. For the record, I was having a '10x slower' issue writing XML and it turned out the version of Java on the Production box was a little older and handling garbage collection in a totally different manner than the dev box, resulting in a literal 10x speed decrease. Was glad to figure out that one. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Try "lsfs" on your system to see what file system types you have.
Enter "$DSHOME/bin/smat -t"; the configuration options that have been changed from the defaults are marked with "*". Are they the same on both systems? (Actually, a simple diff of the $DSHOME/uvconfig files will work just as well, if not better.)
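The uvconfig comparison is easy to script; a minimal sketch, assuming the file has been copied from each server to illustrative local paths first (the real location is $DSHOME/uvconfig on each engine):

```shell
# Copy uvconfig from each server first (hostnames and paths illustrative):
#   scp dev:/opt/Ascential/DataStage/DSEngine/uvconfig  /tmp/uvconfig.dev
#   scp prod:/opt/Ascential/DataStage/DSEngine/uvconfig /tmp/uvconfig.prod

# Any tunable that differs between the two engines shows up here:
diff /tmp/uvconfig.dev /tmp/uvconfig.prod

# Or list the tunables directly on either server:
$DSHOME/bin/smat -t
```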
rsaliah
Participant
Posts: 65
Joined: Thu Feb 27, 2003 8:59 am

Post by rsaliah »

Thanks Guys,

'lsfs' is an AIX command; the Solaris equivalent, I believe, is 'cat /etc/vfstab'. Having looked at the output, there are some differences. For example, on the slow box we have logging enabled and the file system type is 'ufs'; on the other box the type is 'vxfs'. I've no idea what these mean or whether they could cause the effects I'm seeing, but I can now speak to the SAs with some specific questions.
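For reference, the Solaris check can be narrowed to the interesting fields; a sketch (field positions per vfstab(4), and the commands are Solaris-specific):

```shell
# /etc/vfstab fields: device, device-to-fsck, mount point, fs type,
# fsck pass, mount-at-boot, mount options
grep -v '^#' /etc/vfstab | awk 'NF {print $3, $4, $7}'

# The options actually in effect on the live mounts (e.g. logging)
# are reported by:
mount -v
```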

I did check the DS parameters; there were some changes from the defaults, but they were the same on both servers.

Thanks.
Regu.
rsaliah
Participant
Posts: 65
Joined: Thu Feb 27, 2003 8:59 am

Post by rsaliah »

This is really starting to confuse me.

I've been suspecting the differences in the file system setup as the cause, so I've simplified my tests to try to narrow it down and provide some proof.

I now have two simple jobs.

1. A Transformer generating 9 columns of data, writing directly to a hash file. The output is limited to 5 million rows and all options are left at their defaults. This runs at 5834 rows/sec on Dev and 487 rows/sec on our production environment.

2. A Transformer generating 9 columns of data, writing directly to a sequential file. The output is limited to 5 million rows and all options are left at their defaults. This runs at 26596 rows/sec on Dev and 23697 rows/sec on our production environment.
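Translating those rates back into elapsed times for the 5-million-row run makes the gap concrete (a quick back-of-envelope sketch using the figures above; integer seconds):

```shell
rows=5000000

# Hash file test: Dev vs Production
echo "hash dev:  $((rows / 5834)) s"    # ~857 s  (~14 min)
echo "hash prod: $((rows / 487)) s"     # ~10266 s (~2.8 h)

# Sequential file test: essentially the same on both boxes
echo "seq dev:   $((rows / 26596)) s"   # ~187 s
echo "seq prod:  $((rows / 23697)) s"   # ~210 s
```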

I think this rules out the file system as the culprit, because the difference in the sequential file test is negligible. The problem is I've now run out of ideas as to what else to check. If anyone has any suggestions it'll be very much appreciated.

Cheers,
Regu.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Regu,

the file system isn't ruled out completely; I would test the hashed file write on both a ufs and a vxfs partition to make sure.

One glaring difference is that on certain file systems UNIX sequential files are sparse, i.e. if you write to position 1,000,000 in a new file it will not immediately allocate all of the file's disk blocks.
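The sparseness is easy to demonstrate; a small sketch (the scratch path /tmp/sparse_demo is just an illustrative name):

```shell
# Seek ~100 MB past the start of a new file and write a single byte;
# on a file system that supports sparse files the hole costs no blocks.
dd if=/dev/zero of=/tmp/sparse_demo bs=1 count=1 seek=104857599 2>/dev/null

ls -l /tmp/sparse_demo   # apparent size: 104857600 bytes (~100 MB)
du -k /tmp/sparse_demo   # blocks actually allocated: only a few KB

rm /tmp/sparse_demo
```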
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Is hashed file write cache enabled on one system and not on the other?

Have you tried pre-sizing the hashed file - creating it with a non-default MINIMUM.MODULUS value?

Are you using rows/sec as your metric? Please report the run time of the Transformer stage (rather than the run time for the job) for each test.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rsaliah
Participant
Posts: 65
Joined: Thu Feb 27, 2003 8:59 am

Post by rsaliah »

Sorry guys, I haven't coughed up for the full-on membership so I couldn't see all the suggestions, but thanks very much for your input.

For your information, the file system on the prod box was set up with Oracle in mind, with the "forcedirectio" option set. This is what was causing the dramatic slowdown. When we switched that off we got the same performance as we were seeing on Dev (10x+ faster). We also tried it with the logging option off and got a little more speed, but not enough to justify the added risk of switching it off.
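For anyone hitting the same thing: on Solaris UFS the option can be toggled with a remount, per mount_ufs(1M); a sketch (the mount point /ds01 is illustrative, run as root):

```shell
# Check which live mounts have forcedirectio in effect:
mount -v | grep forcedirectio

# Remount the DataStage file system with buffered I/O again:
mount -F ufs -o remount,noforcedirectio /ds01

# Make it permanent by editing the options field of the corresponding
# /etc/vfstab entry.
```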

Thanks a lot for your input. :D

Regu.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I guess you could mark it as resolved, then.

You can effect the same behaviour in DataStage BASIC, for files opened for sequential access (but not for hashed files), using statements like NoBuf and WriteSeqF.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.