time calculations

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

von
Participant
Posts: 10
Joined: Fri Jun 04, 2004 12:02 pm

Post by von »

Hi,

Can anybody say approximately how long it would take to process 10 million records in a server job with the following sequence of stages:

source>sort>transformer>target

Is there any formula to calculate the processing time?


ThanX
Von
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

NO.

We don't know what 'source' is: database or file. If database, what kind of query? How many columns of data --> 1000 columns vs 10 columns = big difference in runtime.

We don't know what 'sort' does: is it sorting on one field, two, three, complex sort logic? Any de-duplication?

We don't know what 'transformer' does: Is there a lot of difficult or conditional derivation? Are you using custom functions or just pass-thru mapping? What about complex derivations using If-Then-Else logic?

We don't know what 'target' is: database or file. If database, what kind of load action: truncate, delete, insert only, update only, insert or update, update or insert? What kind of server is this, what kind of database? What is the nature of the target table: does it have indexes, triggers, and referential integrity/foreign key constraints?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco
Contact:

Post by ketfos »

Let's assume:
1. It's a flat file on UNIX with 10 fixed-width columns, with an Oracle table as the target, also on UNIX.
2. It sorts on one field.
3. There are no conditional derivations or custom functions in the transformer.
4. It truncates before loading.
5. There is no referential integrity involved.

Can we get a time estimate in that case?

Ketfos
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Would that be ten Char(1) columns, or ten Char(10000) columns?

How fast can your disks/network deliver data?
Are you partitioning the data in any way to take advantage of parallel processing?
How fast and how many are your CPUs?
How much memory is available?
Is the weather fine or raining? (Only joking.)
What are the settings for the configuration parameters for DataStage (in uvconfig)?
Are you using the standard Sort stage, the CoSort stage, or one of your own?
What stage type is the target using - SeqFile (invoking sqlldr), ORABULK, Oracle OCI, or one of your own?

These and many other variables will affect the result. Unless you're comparing apples with apples (that is, the person answering your question has identical hardware, configuration and data), any answer is moot.

What you CAN do is try it out.
For example, to find out how fast data can be delivered to DataStage, create a job that reads the file into a Transformer and outputs no rows (use a constraint of @FALSE). You'll probably be positively surprised. The bad news is that any processing you add will - necessarily - slow things down a bit.
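As a rough cross-check outside DataStage, the same "read everything, output nothing" idea can be sketched in Python. This is only an illustration, not a DataStage job: the function name `measure_read_throughput` is hypothetical, and it assumes the flat file has one record per line. It measures how fast the disk can deliver the file to a process that discards every row, which gives an upper bound of sorts for the read side.

```python
import time


def measure_read_throughput(path):
    """Read every record from a line-delimited file and discard it.

    Returns (rows, elapsed_seconds, rows_per_second).
    Assumption: one record per line; no parsing or processing is done,
    much like a Transformer whose output constraint is always false.
    """
    rows = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for _ in f:          # iterate line by line, keep nothing
            rows += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, elapsed, rate
```

Calling `measure_read_throughput("/path/to/your/file")` on the real source file (the path here is a placeholder) gives a rows-per-second figure for the read alone. Operating system file caching can inflate the number on a second run, so treat it as indicative only.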
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
von
Participant
Posts: 10
Joined: Fri Jun 04, 2004 12:02 pm

Post by von »

Thanks guys


I am looking for someone who can share their earlier experience. I had the following scenario:


I had a job that took about 1 1/2 hours on one node to process around 100,000 records, with around 100 columns, from SQL Server (over an ODBC connection) to a sequential file.

So if anyone can share their experiences, I can probably compare them with my present scenario.
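For what it is worth, the arithmetic on those figures (100,000 records in 1 1/2 hours) can be sketched as below. The helper names are hypothetical, and the extrapolation assumes runtime scales linearly with row count and that the new job behaves like the old one. As the replies above point out, that assumption rarely holds across different sources, targets and hardware, so this is a naive upper-level guess and not a formula.

```python
def rows_per_second(rows, seconds):
    """Observed throughput of a finished run."""
    return rows / seconds


def extrapolate_seconds(target_rows, rate):
    """Naive estimate: assumes the same rate holds for the new volume."""
    return target_rows / rate


# Figures from the earlier job: 100,000 records in 1.5 hours.
observed_rate = rows_per_second(100_000, 1.5 * 3600)

# Naive linear extrapolation to 10 million records at the same rate.
naive_seconds = extrapolate_seconds(10_000_000, observed_rate)

print(f"{observed_rate:.1f} rows/s")        # about 18.5 rows/s
print(f"{naive_seconds / 3600:.0f} hours")  # about 150 hours
```

A rate of roughly 18.5 rows per second is very low for a sequential file target, which suggests the ODBC read (or the query behind it) dominated that job. A flat file source is a different situation entirely, so the only reliable number is one measured on the actual job.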

ThanX
Von