
Speed up ETL Process in Datastage job

Posted: Sun May 30, 2010 9:57 pm
by caterin_huang
Hi, are there any tips to speed up an extract and load process in DataStage using an Oracle database as source/target? Is the Load method in the Oracle Enterprise stage equivalent to SQL*Loader? And is the Append method in the Oracle Enterprise stage equivalent to an APPEND DML statement in Oracle?

For your information, my table is partitioned and has indexes. Will that slow down the data loading process?


Thanks in advance,
Caterin

Re: Speed up ETL Process in Datastage job

Posted: Mon May 31, 2010 2:49 am
by g_rkrish
caterin_huang wrote:Hi, is there any tips to speed up an extract and loading process in datastage using Oracle database as source /target? ...
Try disabling the indexes, constraints etc., do the load, then enable them back, provided you have checked for duplicates beforehand.
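A minimal sketch of the SQL this involves; the index name (sales_fact_pk_idx) is a placeholder for your own, and the generated files would be run through sqlplus before and after the DataStage load:

```shell
#!/bin/sh
# Generate the pre- and post-load SQL for the disable/rebuild approach.
# Index name below is hypothetical -- substitute your actual indexes.

cat > pre_load.sql <<'EOF'
-- Mark the index unusable so the load skips index maintenance
ALTER INDEX sales_fact_pk_idx UNUSABLE;
-- Let inserts in this session ignore the unusable index
ALTER SESSION SET skip_unusable_indexes = TRUE;
EOF

cat > post_load.sql <<'EOF'
-- Rebuild after the load completes (check for duplicates first,
-- or rebuilding a unique index will fail)
ALTER INDEX sales_fact_pk_idx REBUILD;
EOF
```

Checking uniqueness on the loaded data before the rebuild avoids a failed rebuild on a unique index.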

Posted: Mon May 31, 2010 2:59 am
by ray.wurlod
The local manager of VMARK (first vendor of DataStage) here was fond of overclocking the CPU, much to the annoyance of the IT people who had to replace his fried CPU from time to time, but it did go faster!

Systematically investigate your design to determine whether you're doing anything you don't need to do, both within DataStage and on its boundaries to data.

Posted: Mon May 31, 2010 2:59 am
by ArndW
Does your job partition the data streams according to the DB partitioning scheme and load appropriately?

Re: Speed up ETL Process in Datastage job

Posted: Mon May 31, 2010 3:37 am
by caterin_huang
g_rkrish wrote:Try disabling the indexes,constraints etc and load them and enable them back ...
Hi, thanks for answering.. :) Hmm, if I disable the indexes and rebuild them, when will the rebuild happen? Does it happen while the job is running?

Posted: Mon May 31, 2010 3:38 am
by caterin_huang
ArndW wrote:Does your job partition the data streams according to the DB partitioning scheme and load appropriately? ...
Hmm, what do you mean by a DB partitioning scheme? I'm sorry, I'm new to DataStage :)

Posted: Mon May 31, 2010 4:06 am
by ArndW
If your DataStage partitioning is identical to your database partitioning then, when loading to the database, each process only needs to load to one partition, which is much faster and more efficient than otherwise. This is not really a DataStage matter but a general database one, and in this case Oracle specific. You stated that your table is partitioned and indexed, so I assumed you meant that the database table is partitioned; if you are not sure, you can ask your DBA for confirmation and details.
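If asking the DBA isn't quick, the Oracle data dictionary can tell you directly. A sketch only, assuming a hypothetical table SALES_FACT; it writes a query file you could run through sqlplus as the table owner:

```shell
#!/bin/sh
# Write a dictionary query that shows how a table is partitioned.
# Table name 'SALES_FACT' is a placeholder.

cat > check_partitions.sql <<'EOF'
-- Partitioning method and key column(s)
SELECT t.partitioning_type, k.column_name
  FROM user_part_tables t
  JOIN user_part_key_columns k ON k.name = t.table_name
 WHERE t.table_name = 'SALES_FACT';

-- The individual partitions and their bounds
SELECT partition_name, high_value
  FROM user_tab_partitions
 WHERE table_name = 'SALES_FACT'
 ORDER BY partition_position;
EOF
```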

Posted: Mon May 31, 2010 6:53 am
by chulett
Exactly. Your questions are pretty common when starting off and are all related to loading Oracle data regardless of the tool being used. Talk to your DBA (or co-workers familiar with the subject) and see what they suggest. Then people here can help with the DataStage specific implementation details.

Re: Speed up ETL Process in Datastage job

Posted: Mon May 31, 2010 10:06 pm
by g_rkrish
Have a shell script that calls the SQL to drop those constraints and indexes; after the job finishes, rebuild them using another shell script. Use a before/after subroutine if you do this at the job level, or a Routine activity if you do it at the sequence level.

Posted: Mon May 31, 2010 11:04 pm
by videsh77
Another possibility to improve load performance, without using DataStage for the load itself, is an approach we used on a database other than Oracle:

Have your load files exported per partition.
As others recommended, drop the constraints & indexes on the table; this can happen at the same time your load files are being written.
In the next script, load your files using the bulk loader utility in Oracle (please check with your DBA).
Rebuild indexes & constraints.

The benefit here is that you are not engaging the DS Engine for such loads.
And since you would have the load files ready with you, maintenance is easier in case any data problem arises.

Let us know, if it helps ...
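The bulk-loader step above might look like this with SQL*Loader. Everything here (table, partition, file names, field list) is a hypothetical placeholder, and the actual sqlldr call is left commented out since it needs real credentials and data files:

```shell
#!/bin/sh
# Generate a direct-path SQL*Loader control file for one partition.
PART=P_201005   # placeholder partition name

cat > load_${PART}.ctl <<EOF
OPTIONS (DIRECT=TRUE, SKIP_INDEX_MAINTENANCE=TRUE)
LOAD DATA
INFILE 'sales_${PART}.dat'
APPEND
INTO TABLE sales_fact PARTITION (${PART})
FIELDS TERMINATED BY '|'
(sale_id, sale_date DATE "YYYY-MM-DD", amount)
EOF

# One sqlldr invocation per partition file; runs can go in parallel
# since each targets its own partition:
# sqlldr userid=loader/secret control=load_${PART}.ctl log=load_${PART}.log
```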

Posted: Tue Jun 01, 2010 6:36 am
by chulett
Be careful with all of the "drop all indexes and constraints" suggestions. It's not always as easy as it sounds (think referential integrity) and can be problematic if any other process accesses your table while it is being loaded. Never mind the fact that it is not guaranteed to be any faster overall; you'd need to test to confirm or deny that.

Posted: Wed Jun 02, 2010 1:31 am
by caterin_huang
videsh77 wrote:Another possibility to improve the load performace, but not using the DataStage ...
Yes, the database is partitioned. Hmm, if I export the files per partition, how can we make DataStage read all the files? Should I make a separate job for each partition's text file?

By the way, can we use Oracle parallel or APPEND hint inserts in DataStage?

Thanks,
Caterin

Posted: Wed Jun 02, 2010 12:21 pm
by DSguru2B
Nothing special. As ArndW noted before, partition your input stream identically to the database partitioning and let DataStage do the rest.
If this is a one-time deal to load historical data, I have involved the DBA, who drops all the indexes for me and turns off logging; that helps a lot. But if this is an ongoing thing then you need to nail your design.
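On the earlier APPEND/parallel-hint question: the Oracle Enterprise stage's Load method already goes through the direct path, but with Upsert or user-defined SQL you could, in principle, supply hinted DML yourself. A sketch only; table names and the parallel degree are hypothetical, and your DBA should confirm parallel DML is appropriate:

```shell
#!/bin/sh
# Write a direct-path, parallel INSERT you could adapt as user-defined SQL.
cat > append_insert.sql <<'EOF'
ALTER SESSION ENABLE PARALLEL DML;

-- APPEND = direct-path insert above the high-water mark;
-- PARALLEL(..., 4) is a placeholder degree -- size it with your DBA.
INSERT /*+ APPEND PARALLEL(sales_fact, 4) */ INTO sales_fact
SELECT * FROM sales_stage;

COMMIT;
EOF
```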