Job Performance Issue

sumeet
Premium Member
Posts: 54
Joined: Tue Aug 30, 2005 11:44 pm

Job Performance Issue

Post by sumeet »

Hi All,

We recently bought DS 8.1, which has been installed on a Sun T4000 box with 8 dual-core CPUs and 64 GB of memory.

For each installation of DS, the Admin created a one-node config file whose fastname is the same as the server name.

{
    node "node1"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/ibm/IS/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/ibm/IS/Server/Scratch" {pools ""}
    }
}


We ran a basic job that copies data from one Oracle EE stage to another Oracle EE stage with a simple SELECT. The performance: 2,600 rows/sec.

Oracle EE --> Copy --> Oracle EE.

The Admin claims this is among the best performance he has seen for a similar job, which we find hard to digest.

We insisted that he create another config file with multiple nodes. He said it won't improve performance, because the only thing that changes in the config file is the resource disk/scratch disk.

The two-node file he created:

{
    node "node1"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch" {pools ""}
    }
}

Is this correct? Everything looks the same for both nodes.
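
The only variation we could think of would be to give each logical node its own disk and scratch directories to spread the I/O, something like the sketch below (the per-node paths are hypothetical; we have not created them). Would that change anything?

{
    node "node1"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets1" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch1" {pools ""}
    }
    node "node2"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets2" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch2" {pools ""}
    }
}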

Is this the limit of DS performance: 2,600 rows/sec?

We are moving from Informatica to DataStage, and the grounds for buying DS was the promised performance improvement, but Infa seems to do better.

Is it worth converting the job? Do we need to involve IBM here?

We would appreciate any answer.

Thanks
Sumeet
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

The number of rows per second you get depends on many factors: the type of query you are using, the number of columns you are selecting, the number of tables in the query, whether you are using the partitioned-read option, and more.
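
For example, with a partitioned read against a table that is partitioned in Oracle, each node conceptually ends up running its own slice of the query, roughly like this (illustrative only; the table and partition names are made up):

select col1, col2 from some_table partition (p1)  -- node 1
select col1, col2 from some_table partition (p2)  -- node 2

Without something like that, the read itself runs on a single node no matter how many nodes are in the config file.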
Nag
sumeet
Premium Member
Posts: 54
Joined: Tue Aug 30, 2005 11:44 pm

Post by sumeet »

Thanks nagarjuna for your reply.

The query we are running is very simple:

select col1, col2, col3, col4, col5, col6 from tablea where rownum < 5000000

I assume that with a one-node file the type of partitioning won't matter.

Using the two-node (logical) file definitely improved the performance. But how do I find out how much CPU and how much memory the processes are using?

I used $APT_DUMP_SCORE, which showed that 2 nodes are being used by 6 processes. Is there any other way to get more detailed information about the hardware used by the parallel engine?
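
So far the only thing I could come up with is watching the osh processes from the shell on Solaris, e.g. (the PIDs below are just placeholders):

ps -ef | grep osh        # list the conductor/section-leader/player processes
prstat -p 1234,5678      # per-process CPU and memory usage on Solaris

Is there anything better built into DataStage itself?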


Thanks
Sumeet
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

I have seen much better performance on a smaller box.

You can use the Resource Estimator to estimate the resources used.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The limiting factor is probably, and curiously perhaps, the SELECT operation, which is performed sequentially. Under appropriate circumstances I have seen over 100,000 rows/second but, then, I believe this metric to be meaningless for most purposes.
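
If your version of the Oracle stage cannot do a partitioned read for you, one crude but generic workaround is to split the query yourself and run one copy per reader, for example (a sketch only, assuming col1 is numeric and reasonably evenly distributed):

select col1, col2, col3, col4, col5, col6 from tablea where mod(col1, 2) = 0  -- reader 1
select col1, col2, col3, col4, col5, col6 from tablea where mod(col1, 2) = 1  -- reader 2

Each reader then contributes roughly half of the rows, and the streams are funnelled together downstream.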
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.