Job Performance Issue

sumeet
Premium Member
Posts: 54
Joined: Tue Aug 30, 2005 11:44 pm

Job Performance Issue

Post by sumeet »

Hi All,

We recently bought DS 8.1, which has been installed on a Sun T4000 box with 8 dual-core CPUs and 64 GB of memory.

For each installation of DS, the Admin created a one-node config file whose fastname is the same as the server name.

{
    node "node1"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/ibm/IS/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/ibm/IS/Server/Scratch" {pools ""}
    }
}


We ran a basic job that copies data from one Oracle EE stage to another Oracle EE stage with a simple SELECT. The performance: 2,600 rows/sec.

Oracle EE --> Copy --> Oracle EE.

The Admin claims this is among the best performance he has seen for a similar job, which we find hard to digest.

We insisted that he create another config file with multiple nodes. He said it won't improve performance, because the only thing that changes in the config file is the resource disk/scratch disk.

The two-node file he created:

{
    node "node1"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch" {pools ""}
    }
}

Is this correct? Everything looks the same for both nodes.
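
The only variation we could think of would be to give each logical node its own disk and scratch directories to spread the I/O, something like the sketch below (the per-node paths are hypothetical; we have not created them). Would that change anything?

{
    node "node1"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets1" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch1" {pools ""}
    }
    node "node2"
    {
        fastname "Cont1"
        pools ""
        resource disk "/opt/IBM/IS/Server/Datasets2" {pools ""}
        resource scratchdisk "/opt/IBM/IS/Server/Scratch2" {pools ""}
    }
}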

Is this the limit of DS performance: 2,600 rows/sec?

We are moving from Informatica to DataStage, and the grounds for buying DS was the promised performance improvement, but Infa seems to do better.

Is it worth converting the job? Do we need to involve IBM here?

We would appreciate any answer.

Thanks
Sumeet
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

The number of rows per second you get depends on many factors: the type of query you are using, the number of columns you are selecting, the number of tables in the query, whether you are using the partitioned-read option, and more.
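
For example, with a partitioned read against a table that is partitioned in Oracle, each node conceptually ends up running its own slice of the query, roughly like this (illustrative only; the table and partition names are made up):

select col1, col2 from some_table partition (p1)  -- node 1
select col1, col2 from some_table partition (p2)  -- node 2

Without something like that, the read itself runs on a single node no matter how many nodes are in the config file.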
Nag
sumeet
Premium Member
Posts: 54
Joined: Tue Aug 30, 2005 11:44 pm

Post by sumeet »

Thanks nagarjuna for your reply.

The query we are running is very simple:

select col1, col2, col3, col4, col5, col6 from tablea where rownum < 5000000

I assume that with a one-node file the type of partitioning won't matter.

Using the two-node (logical) file definitely improved the performance. But how do I find out how much CPU and how much memory the processes are using?

I used $APT_DUMP_SCORE, which showed that 2 nodes are being used by 6 processes. Is there any other way to get more detailed information about the hardware used by the parallel engine?
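
So far the only thing I could come up with is watching the osh processes from the shell on Solaris, e.g. (the PIDs below are just placeholders):

ps -ef | grep osh        # list the conductor/section-leader/player processes
prstat -p 1234,5678      # per-process CPU and memory usage on Solaris

Is there anything better built into DataStage itself?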


Thanks
Sumeet
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

I have seen much better performance on a smaller box.

You can use the Resource Estimator to estimate the resources used.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The limiting factor is probably, and curiously perhaps, the SELECT operation, which is performed sequentially. Under appropriate circumstances I have seen over 100,000 rows/second but, then, I believe this metric to be meaningless for most purposes.
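
If your version of the Oracle stage cannot do a partitioned read for you, one crude but generic workaround is to split the query yourself and run one copy per reader, for example (a sketch only, assuming col1 is numeric and reasonably evenly distributed):

select col1, col2, col3, col4, col5, col6 from tablea where mod(col1, 2) = 0  -- reader 1
select col1, col2, col3, col4, col5, col6 from tablea where mod(col1, 2) = 1  -- reader 2

Each reader then contributes roughly half of the rows, and the streams are funnelled together downstream.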
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.