DS Tuning Options

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS BASIC, Routines, Job Sequences, etc.

Post by parag_pk »

Hi,

I created a very simple job and tried three designs:

1. Read EBCDIC file --> Transformer --> Insert/Update Table.
We are loading around 16,300 records; right now it takes around 50 seconds.

2. Read EBCDIC file --> Transformer --> IPC --> Insert/Update Table.
This takes around 40 seconds.

3. Read EBCDIC file --> Link Partitioner [Round Robin] --> Transformer --> IPC --> Insert/Update Table.
I used 6 partitions; this takes around 30 seconds.

I want the runtime to be under 20 seconds. My Oracle scripts run in 16 seconds.

If someone can suggest any other options for tuning this job, please do. Let me know if you need more information.

Thanks
Parag K.

Post by ArndW »

Can you split your data stream so that one link does only INSERTs? That would bring the time down, as upserts are quite a bit slower. Which stage are you using to write to the database (ODBC is slower than others)? And how long does the job take if you don't write anything to the database at all?
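
Roughly speaking, the split lets each link use a single-purpose statement instead of attempting an insert and falling back to an update for every row. A minimal sketch, assuming a hypothetical target table tgt(key_col, val_col) with the positional binds supplied by the stage:

-- Insert link: carries only rows whose keys are new (all names hypothetical).
INSERT INTO tgt (key_col, val_col)
VALUES (:1, :2);

-- Update link: carries only rows whose keys already exist in tgt.
UPDATE tgt
   SET val_col = :2
 WHERE key_col = :1;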

Post by Sainath.Srinivasan »

Include enough in-process / inter-process memory.

Play around with the array size and the transaction (commit) size.

Remove the Transformer if you are not doing anything in it. If you are referring to the Link Partitioner as the "Transform", then leave it in.
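
To see what those two settings buy you: a larger array size means fewer round trips to Oracle, and a larger transaction size means fewer commits. A rough hand-written analogue in PL/SQL, with hypothetical table names src and tgt:

DECLARE
   TYPE t_keys IS TABLE OF tgt.key_col%TYPE;
   TYPE t_vals IS TABLE OF tgt.val_col%TYPE;
   l_keys t_keys;
   l_vals t_vals;
BEGIN
   -- Fetch the whole batch in one shot (akin to a large array size)...
   SELECT key_col, val_col BULK COLLECT INTO l_keys, l_vals FROM src;

   -- ...apply it in one bulk-bound round trip...
   FORALL i IN 1 .. l_keys.COUNT
      UPDATE tgt SET val_col = l_vals(i) WHERE key_col = l_keys(i);

   -- ...and commit once at the end (akin to a large transaction size).
   COMMIT;
END;
/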

Post by parag_pk »

ArndW,
If I write to a file without the transformation, it takes 15 seconds.

I tried splitting the inserts and updates; it takes about the same time, 45 to 50 seconds. It is the Oracle stage, and the update query is tuned.

Srini,
The array size is 1000, and only three transformations are present in the mapping. I can't insert more than one IPC, and I have some logic in the transformations.

Thanks for your suggestions. If you have any other options, let me know.

Parag K.

Post by dsxuserrio »

Is 16,300 records your test data? What is your expected live production volume? You should not come to a conclusion based on results from such a low volume. Is there a critical 20-second window?
dsxuserrio

Kannan.N
Bangalore, INDIA

Post by Sainath.Srinivasan »

In your ETL process, you may run jobs in parallel to reduce the overall process time.

As dsxuserrio suggested, can you let us know why you need this performance and what it means to you, so we can give you better answers?

Post by parag_pk »

Right now our jobs are in Oracle and Unix. You could say we are doing a kind of benchmarking before starting full development.

Time is very critical because it's a financial organization. Some jobs run daily and some monthly. It's a huge multi-terabyte database, and every day we are loading millions and millions of records.

Do I have to set any Administrator parameters?

Thanks for your responses.

Parag K

Post by ArndW »

Parag_pk,

when a job runs for under a minute of either CPU or real time, a very significant proportion of that time is spent on things that are only indirectly related to the data itself: reading internal DataStage tables, preprocessing, forking processes, preparing and opening log files, and setting up processing environments. Put the overhead in your 45-second run at 10 seconds; that means your performance numbers carry roughly 20% overhead. This overhead is constant, so if your job were to process double the number of rows it would run 80 seconds, bringing the overhead down to 12.5%. So what the others and I are saying is that your sample is far too small, either for analysing potential tuning or for estimating expected runtimes.

I think that any DS job that runs for less than a couple of minutes is not worth the time spent tuning it; and unless you are in a real-time environment (in which case a DS job written the way you have written this one is not the correct approach), that should hold for you as well. If you are benchmarking for your application development, then I would highly recommend letting a job run 10-15 minutes to get a meaningful rows-per-second figure and to identify tuning opportunities.

All of us have, at one time or another, dealt with "huge" amounts of data. Loading millions of records in minimal time is part of our daily bread and butter.

If performance is such a critical aspect, then your site might consider using PX (the Parallel Extender); but with a data sample that runs in under a minute, I think PX will actually be significantly slower than Server. It does scale rather well, though 8) (and since I'm no longer an Ascential employee and don't have to put a positive spin on everything, that is a real compliment).
Last edited by ArndW on Sat Feb 26, 2005 12:41 pm, edited 1 time in total.

Post by Sainath.Srinivasan »

To add to what ArndW said, you can also tune the design and implementation of your DataStage jobs to perform better. It is much like the hint, architecture and parallel-query options available in Oracle.
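
For instance, on the Oracle side a parallel hint looks like this (a minimal sketch; the table name and degree of parallelism are made up):

-- Ask Oracle to scan the hypothetical source table src with 4 parallel slaves.
SELECT /*+ PARALLEL(s, 4) */ key_col, val_col
FROM   src s;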

Post by parag_pk »

Hi All,

Thank you all for the suggestions; I partially agree with you. I will try a different job for the benchmarking.

Still, I feel that if your database, network and hardware are tuned and strong, then compared to other ETL tools DS should do a better job than this. But this is not the forum for that discussion.

So, thanks again anyway...

Parag K.