DataStage performance degrades with an Oracle table.

Post questions here related to DataStage Server Edition, for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

DataStage performance degrades with an Oracle table.

Post by Umbix62 »

Hi

I need some help with this question. I'm working for a customer who asked me to create a Server job with DataStage 7.1 that reads a flow of data from an Oracle table and inserts the extracted rows into another Oracle table. Both tables are partitioned, but in my opinion the customer has made a mistake: she chose a varchar column as the partitioning key of the target table. I know that is not a good idea.

Now the job performs poorly, and she thinks DataStage is to blame. I think, instead, that the problem is the table layout.

Yesterday this table had a different layout. The partitioning key was a column declared as a decimal, and the job ran in ten minutes. Not anymore.

Can anybody tell me whether I'm wrong or not?

Thank you very much

Umberto
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If the table was partitioned one way yesterday and another way today, the data obviously had to be ported from the old copy of the table, because the partitioning key changed.

That being said, the new partitioning method may have shifted the data around differently. Comparing yesterday's loads to today's is not valid. If there were 32 partitions yesterday and 64 today, load performance will change dramatically.
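To illustrate (the actual DDL hasn't been posted, so table and column names here are made up), the before/after layouts might look something like this:

    -- Yesterday's layout: hash partitioning on a NUMBER key
    CREATE TABLE target_old (
        part_key NUMBER,
        payload  VARCHAR2(100)
    )
    PARTITION BY HASH (part_key) PARTITIONS 32;

    -- Today's layout: same partition count, but keyed on a VARCHAR2
    -- column, so every row may hash to a different partition than before
    CREATE TABLE target_new (
        part_key VARCHAR2(30),
        payload  VARCHAR2(100)
    )
    PARTITION BY HASH (part_key) PARTITIONS 32;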

You haven't said how you are loading the data. Bulk loading performs very differently from OCI loads.
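Roughly speaking (a sketch, with made-up names), the difference is:

    -- OCI stage: conventional-path inserts with bind variables, row by
    -- row or in arrays, maintaining indexes as it goes
    INSERT INTO target_table (part_key, payload) VALUES (:1, :2);

    -- Bulk/direct-path load: writes formatted blocks above the table's
    -- high-water mark, bypassing the buffer cache
    INSERT /*+ APPEND */ INTO target_table (part_key, payload)
    SELECT part_key, payload FROM staging_table;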

What about the indexes? Local versus global indexes have a HUGE impact on load performance. If you're doing OCI loads, global indexes are not as troublesome as they are with DIRECT path partitioned loads. Maybe the indexes changed along with the new partitioning scheme, which can affect load performance.
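For example (index and column names are made up):

    -- LOCAL index: one index segment per table partition, so a
    -- partition-level direct-path load only touches that segment
    CREATE INDEX target_ix_local ON target_table (part_key) LOCAL;

    -- A plain (non-partitioned) index is global: it spans all
    -- partitions, so a direct-path partitioned load has to maintain
    -- or rebuild it across the whole table
    CREATE INDEX target_ix_global ON target_table (payload);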

Consider giving more information, as I can't talk about every possible situation. This post borders on a Premium Answer; for your specific situation I'll give an answer, but a general data modeling and architecture answer would be a Premium Answer.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

kcbland wrote: If the table was partitioned one way yesterday and another way today, the data obviously had to be ported from the old copy of the table, because the partitioning key changed.

That being said, the new partitioning method may have shifted the data around differently. Comparing yesterday's loads to today's is not valid. If there were 32 partitions yesterday and 64 today, load performance will change dramatically.

You haven't said how you are loading the data. Bulk loading performs very differently from OCI loads.

What about the indexes? Local versus global indexes have a HUGE impact on load performance. If you're doing OCI loads, global indexes are not as troublesome as they are with DIRECT path partitioned loads. Maybe the indexes changed along with the new partitioning scheme, which can affect load performance.

Consider giving more information, as I can't talk about every possible situation. This post borders on a Premium Answer; for your specific situation I'll give an answer, but a general data modeling and architecture answer would be a Premium Answer.
The customer never uses the bulk loader stage. I don't know why; I suggested it, but she refused. She prefers to insert the rows directly via the Oracle8i OCI stage, without clearing the table first (she has only done a three-day course and has never worked with DataStage Server, but SHE "knows" how to work with it, obviously).

The table has the same number of partitions. She changed only the column used as the partitioning key, not the number of partitions.

She did, however, reduce the number of indexes: six yesterday, only one now.

It's not easy for me to talk to her. She is a "very complicated" person.

The job uses only two stages, both Oracle8i OCI stages. The first reads the data (the extraction phase is very quick), and the second inserts the data without clearing the target table. The job itself is the same; I didn't change it (she "suggested" doing it this way).

I don't know how to respond to her. Now she wants to try an Oracle PL/SQL procedure because, according to her, it's better and quicker, but I suspect we'll get the same performance as the job.

Thank you very much for the help.

Umberto
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Create two jobs: OCI-->XFM-->SEQ and SEQ-->XFM-->OCI. You'll see how fast the data spools, and how fast the data inserts.

As for the PL/SQL argument, she's already lost it. It would be faster to use straight DML, because of its parallel query processing capability. "INSERT INTO target.table (...) SELECT (...) FROM source.table" gives the fastest processing if you're staying within the same instance.
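If she insists on doing it in the database, this is the kind of statement to benchmark (a sketch, assuming both tables are in the same instance; names and the degree of parallelism are placeholders):

    -- Enable parallel DML for the session, then do a direct-path,
    -- parallel insert-select in a single statement
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+ APPEND PARALLEL(t, 4) */ INTO target_table t (part_key, payload)
    SELECT /*+ PARALLEL(s, 4) */ part_key, payload
    FROM   source_table s;

    COMMIT;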

You need to understand that ETL is more than just copying rows. Data transformation, cleanup, auditing, etc. are part of the justification for an ETL tool.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

What is the UPDATE ACTION on the OCI stage? If it is "Insert New Rows Only" - no problem. If it is "Insert new rows or Update existing" or "Update existing rows or insert new" then you will have terrible performance problems - you need to split your inserts and updates into separate links.
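To see why the combined actions hurt, here is roughly what happens for each row (a sketch; the exact statements the OCI stage generates may differ):

    -- "Update existing rows or insert new": every row first attempts
    -- an UPDATE, and when it matches nothing, falls back to an INSERT,
    -- so the worst case is two round trips per row
    UPDATE target_table SET payload = :2 WHERE part_key = :1;
    INSERT INTO target_table (part_key, payload) VALUES (:1, :2);

    -- With separate insert-only and update-only links, each row costs
    -- one statement, and the rows can be array-bound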

I find it highly unlikely that the choice of partition column is the cause of the performance problem. It is much more likely to be index-related.
Ross Leishman
Umbix62
Participant
Posts: 79
Joined: Mon Jan 16, 2006 2:47 pm

Post by Umbix62 »

kcbland wrote: Create two jobs: OCI-->XFM-->SEQ and SEQ-->XFM-->OCI. You'll see how fast the data spools, and how fast the data inserts.

As for the PL/SQL argument, she's already lost it. It would be faster to use straight DML, because of its parallel query processing capability. "INSERT INTO target.table (...) SELECT (...) FROM source.table" gives the fastest processing if you're staying within the same instance.

You need to understand that ETL is more than just copying rows. Data transformation, cleanup, auditing, etc. are part of the justification for an ETL tool.
I want to try the approach you suggested: create one job to extract the data and spool it to a flat file, then another to read the flat file and insert into the target table. But I can't do it right now.

Anyway, she is trying to execute the PL/SQL procedure, and after 30 minutes it is still running...

I know what you're saying. The problem is that the customer doesn't.

Thank you very much

Umberto