Speed up ETL Process in Datastage job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
caterin_huang
Participant
Posts: 48
Joined: Tue Jan 26, 2010 4:05 am
Location: Indonesia

Speed up ETL Process in Datastage job

Post by caterin_huang »

Hi, is there any tips to speed up an extract and loading process in datastage using Oracle database as source /target? Is the load method at oracle Enterprise stage equals to SQLLoader? And is append method in Oracle Enterprise stage equals to append DML in Oracle?

for your information, my table is partitioned and have indexes. Will it slow down the loading data process?


Thanks in advance,
Caterin
g_rkrish
Participant
Posts: 264
Joined: Wed Feb 08, 2006 12:06 am

Re: Speed up ETL Process in Datastage job

Post by g_rkrish »

caterin_huang wrote:Hi, is there any tips to speed up an extract and loading process in datastage using Oracle database as source /target? Is the load method at oracle Enterprise stage equals to SQLLoader? And is append method in Oracle Enterprise stage equals to append DML in Oracle?

for your information, my table is partitioned and have indexes. Will it slow down the loading data process?


Thanks in advance,
Caterin
Try disabling the indexes,constraints etc and load them and enable them back provoided you checked the duplicates before
RK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The local manager of VMARK (first vendor of DataStage) here was fond of overclocking the CPU, much to the annoyance of the IT people who had to replace his fried CPU from time to time, but it did go faster!

Systematically investigate your design to determine whether you're doing anything you don't need to do, both within DataStage and on its boundaries to data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Does your job partition the data streams according to the DB partitioning scheme and load appropriately?
caterin_huang
Participant
Posts: 48
Joined: Tue Jan 26, 2010 4:05 am
Location: Indonesia

Re: Speed up ETL Process in Datastage job

Post by caterin_huang »

g_rkrish wrote:
caterin_huang wrote:Hi, is there any tips to speed up an extract and loading process in datastage using Oracle database as source /target? Is the load method at oracle Enterprise stage equals to SQLLoader? And is append method in Oracle Enterprise stage equals to append DML in Oracle?

for your information, my table is partitioned and have indexes. Will it slow down the loading data process?


Thanks in advance,
Caterin
Try disabling the indexes,constraints etc and load them and enable them back provoided you checked the duplicates before
Hi, thanks for answering.. :) hmm, if I disabling the indexes and rebuild it, when will it be rebuild? is it happen when the job running?
caterin_huang
Participant
Posts: 48
Joined: Tue Jan 26, 2010 4:05 am
Location: Indonesia

Post by caterin_huang »

ArndW wrote:Does your job partition the data streams according to the DB partitioning scheme and load appropriately? ...
hmm, what do you mean by DB partitioning scheme? I'm sorry, because I'm new to datastage :)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If your DataStage partitioning is identical to your database partitioning then, when loading to the database, each process only needs to load to one partition and is therefore much faster and more efficient than otherwise. This is not necessarily DataStage but general database and, in this case, Oracle specific. You stated that your table is partitioned and indexed, so I assumed you meant that the database is partitioned and if you are not sure you can ask your DBA for confirmation and details.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Exactly. Your questions are pretty common when starting off and are all related to loading Oracle data regardless of the tool being used. Talk to your DBA (or co-workers familiar with the subject) and see what they suggest. Then people here can help with the DataStage specific implementation details.
-craig

"You can never have too many knives" -- Logan Nine Fingers
g_rkrish
Participant
Posts: 264
Joined: Wed Feb 08, 2006 12:06 am

Re: Speed up ETL Process in Datastage job

Post by g_rkrish »

Have a shellscript in which call the sql for drop those constrints and indexes after the job finishes try rebuliding them using a shell script.Use before after subroutine if you use at job level,use rotuine activity if you use them at the sequencer level.
RK
videsh77
Premium Member
Premium Member
Posts: 97
Joined: Thu Dec 02, 2004 10:43 am
Contact:

Post by videsh77 »

Another possibility to improve the load performace, but not using the DataStage, which we used on database other than Oracle.

Have your Load files exported per partition.
As others recommended drop the constraints & indexes on the table, this can happen at the same time when your load files are being written.
In the next script, load your files using bulkloader utility in Oracle (Please check with your DBA).
Rebuild indexes & constraints.

Benefit here is, you are not engaging DS Engine for such loads.
As you would have load files ready with you, it is going to be easy for the maintenance, in case any data problem arises.

Let us know, if it helps ...
Thanks with regards,
videsh.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Be careful with all of the "drop all indexes and constraints" suggestions. Not always as easy as it sounds (RI) and can be problematic if any other process is accessing your table while it is being loaded. Never mind the fact that it is not guaranteed to be any kind of faster overall, you'd need to test to confirm or deny that.
Last edited by chulett on Wed Jun 02, 2010 5:57 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
caterin_huang
Participant
Posts: 48
Joined: Tue Jan 26, 2010 4:05 am
Location: Indonesia

Post by caterin_huang »

videsh77 wrote:Another possibility to improve the load performace, but not using the DataStage, which we used on database other than Oracle.

Have your Load files exported per partition.
As others recommended drop the constraints & indexes on the table, this can happen at the same time when your load files are being written.
In the next script, load your files using bulkloader utility in Oracle (Please check with your DBA).
Rebuild indexes & constraints.

Benefit here is, you are not engaging DS Engine for such loads.
As you would have load files ready with you, it is going to be easy for the maintenance, in case any data problem arises.

Let us know, if it helps ...
Yes, the database is partitioned. hmm, if I export the files per partition, how can we make datastage to read all the files? Should I make another job for one partition text file?

btw, can we user oracle parallel or hint insert in datastage?

Thanks,
Caterin
Last edited by caterin_huang on Wed Jun 02, 2010 1:36 am, edited 1 time in total.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Nothing special. As ArndW noted before, partition your input stream identical to the database partitioning. Let datastage do the rest.
If this is a one time deal so as to load historical data, I have involved the dba who drops all the indexes for me and turns off logging. That helps a lot. But if this is an ongoing thing then you need to nail your design.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply