How is update performance in DataStage?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

How is update performance in DataStage?

Post by kollurianu »

How do updates perform in version 7.5 against an Oracle table, for 300 thousand rows? Is DataStage a good choice, or is it better to use Oracle PL/SQL?

We are leaning towards using DataStage. What do you say?

Can anyone help with suggestions?
poorna_76
Charter Member
Posts: 190
Joined: Thu Jul 08, 2004 10:42 am

Re: How is update performance in DataStage?

Post by poorna_76 »

kollurianu wrote:How do updates perform in version 7.5 against an Oracle table, for 300 thousand rows? Is DataStage a good choice, or is it better to use Oracle PL/SQL?

We are leaning towards using DataStage. What do you say?

Can anyone help with suggestions?
It's better to use DataStage.
DataStage seems to perform better.

Thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

There's a huge difference. You must consider how SQL and relational databases work. DataStage has NO SECRET TRICKS. It simply uses DML statements just like everyone else. Read the SQL statement in the stage and you see INSERT/UPDATE syntax.
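
For illustration (the table and column names here are made up), the generated statements are just ordinary parameterized DML, something like:

Code:

-- Made-up table and columns; a stage issues parameterized DML like
-- this through OCI, one bind variable per column.
INSERT INTO customer_dim (customer_id, name, city)
VALUES (:1, :2, :3);

UPDATE customer_dim
SET    name = :2,
       city = :3
WHERE  customer_id = :1;   -- keyed on the primary key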

How you do updates does matter. If your SQL is using wildcard updates (an update statement whose WHERE clause does not specify the primary key), then performance is unpredictable, because one row coming out of DataStage can cause 0, 1, or many rows to be updated in the database. The performance depends on the ability of the database to find the rows that require updating.
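
A quick sketch of the difference, on a made-up orders table:

Code:

-- Keyed update: the WHERE clause supplies the full primary key, so
-- each incoming row touches exactly one table row via the PK index.
UPDATE orders SET status = :2 WHERE order_id = :1;

-- Wildcard update: no primary key in the WHERE clause, so each
-- incoming row may update 0, 1, or many rows, and the database may
-- have to scan to find them.
UPDATE orders SET status = :2 WHERE region = :1;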

Next, if you are using the primary key to update existing rows, you are operating at the speed at which DataStage can deliver a row to the database, plus the time the database takes to find the row and update it. You get into all kinds of nasty things, like chained rows, where the replacement row is bigger than the row already in the table: the database has to split the data into separate pieces and store them in different places in the table. You also have to deal with large partitioned tables having global indexes that require updating, and with potentially not supplying the partitioning column, in which case the update is less efficient because you are not doing partition elimination.
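
To illustrate the partition elimination point, assuming a made-up sales table partitioned by sale_date:

Code:

-- Partition key supplied: Oracle can prune the update to one partition.
UPDATE sales
SET    amount    = :3
WHERE  sale_id   = :1
AND    sale_date = :2;

-- Partition key not supplied: every partition (and any global index)
-- has to be considered.
UPDATE sales
SET    amount  = :3
WHERE  sale_id = :1;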

There's a lot more to say about updates, but in general, updates are VERY SLOW and you need to seriously understand how a database works with them. There are ways to expedite them, but they are not tool based.
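
One common database-side (not tool-based) technique, sketched here with made-up table names, is to bulk-insert the changed rows into a staging table and then apply them in one set-based statement, instead of hundreds of thousands of single-row updates:

Code:

-- Staging table was bulk-loaded first (inserts are cheap); one MERGE
-- then applies all the changes in a single set-based operation.
MERGE INTO customer_dim t
USING customer_stage s
ON (t.customer_id = s.customer_id)
WHEN MATCHED THEN
  UPDATE SET t.name = s.name, t.city = s.city;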
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Hi Bland,

Thank you all for the answers. But Bland, do you mean to say that updates are normally slow regardless of the tool, and that it only depends on the database?

Can you clarify this for me?

Thank you
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The database controls what data is allowed to load, not the tool. Please understand this point fully: THERE ARE NO TOOLS THAT EXIST THAT CAN PUT DATA INTO A DATABASE. You either use an ODBC driver, a database-supplied interface library (OCI/CLI, etc.), or a command-line data loader (sqlldr, TMU, isql, dbaccess, etc.) supplied by the database maker. Ascential, Ab Initio, Informatica, Sunopsis, MS-DTS, OWB, it doesn't matter: they all use the same methods to load data. The database does all of the work; the tools use the supplied interfaces.

You can have arguments that native OCI/CLI connections are faster than ODBC and probably be correct. This is because ODBC is a generic driver, whereas the OCI/CLI interface is a database maker supplied library tuned for that database.

Why are updates slower than inserts? Because that is the nature of databases. When a database updates a row, it traditionally does more work than when inserting a row. It's easier to insert than update, and that's why most bulk loaders only support inserting rows (Red Brick's TMU being the best exception).
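
A rough sketch of the extra work, on a made-up indexed table:

Code:

CREATE TABLE t (id NUMBER PRIMARY KEY, payload VARCHAR2(100));

-- An insert writes a new row and adds one index entry.
INSERT INTO t VALUES (1, 'abc');

-- An update must first find the existing row (index lookup), write
-- undo for the old version, rewrite the row (possibly migrating or
-- chaining it if the new value is larger), and maintain any indexes
-- on the changed columns.
UPDATE t SET payload = 'a much longer replacement value' WHERE id = 1;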

I wholeheartedly recommend visiting http://asktom.oracle.com/ and also suggest you get your hands on Expert One-on-One Oracle by Tom Kyte. An excellent understanding of databases and how they work will make a person a better ETL developer. Using a tool does not automatically make things faster, usually just easier.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

kcbland wrote:Ascential, Ab Initio, Informatica, Sunopsis, MS-DTS, OWB, it doesn't matter, they all use the same methods to load data. The database does all of the work, the tools use the supplied interfaces.
Let me be very clear here. Even PX will partition the data to align with the physical database partitions, by querying database catalogs and such. But it does not physically write to the underlying data files in the database; that is still handled by the database. In UDB the coordinator node is bypassed and the data is written directly to the physical partitions, because UDB makes this information available, but UDB still manages these activities. This is true for insert-only processing; updating is still managed using DML.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Thank you all for the answers,

But can anyone clarify for me: what would be the difference between the Oracle bulk loader and normal inserts through the OCI stage?


Any answers would be appreciated.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

kollurianu wrote:What would be the difference between the Oracle bulk loader and normal inserts through the OCI stage?
The sqlldr CONVENTIONAL method is exactly the same as INSERT SQL statements; no difference.

The sqlldr DIRECT method bypasses INDEXES, CONSTRAINTS, and TRIGGERS to write directly to the data files. As part of post-load processing, indexes and constraints are updated and re-enabled. Direct path loading is clocked at hundreds of times faster, but you have to make sure that your data is clean and consists of absolutely new rows. Direct path is extremely fast, but you have to be extremely careful using it. It was built for data warehousing, because the type of ETL processing you do in data warehousing usually performs the checks required to use this method safely. Failure to do those steps means you cannot take advantage of direct path and must use the slower CONVENTIONAL method. The ETL steps are: insert versus update row determination; surrogate key generation within the ETL processing, not using database lookups and writes; data quality verification for data types; and constraint/foreign key validation.
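
The SQL-level analogue, as a rough sketch (the APPEND hint requests a direct-path insert from within SQL, much as DIRECT=TRUE does on the sqlldr command line; table names are made up):

Code:

-- Conventional path: ordinary row-by-row insert processing; indexes,
-- constraints, and triggers are all maintained as each row arrives.
INSERT INTO fact_sales SELECT * FROM stage_sales;

-- Direct path: formatted blocks are written above the table's
-- high-water mark, bypassing much of the conventional machinery.
INSERT /*+ APPEND */ INTO fact_sales SELECT * FROM stage_sales;
COMMIT;  -- required before the same session can query the table again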
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

One big difference is that the Oracle bulk loader does not do updates. Only inserts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

I know the Oracle bulk loader just does inserts. What I want to know is whether there is any performance difference when we insert data through normal OCI versus through the bulk loader, and if so, why?

Thank you very much for all your answers.
diamondabhi
Premium Member
Posts: 108
Joined: Sat Feb 05, 2005 6:52 pm
Location: US

Post by diamondabhi »

Hi,
The bulk loader loads into only one table at a time, so it is comparatively fast. When you have to do bulk loading with no updates, it's better to use the bulk loader.

Thanks.
kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Hi Abhi,

I did not get your point. You say the bulk loader is faster as it loads to one table; what does that mean?

And also, can anyone please shed some light on this: is the bulk loader stage better, or is sqlldr better? Does one perform better than the other, and if so, what is each one good for?

Thank you
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
IMHO, Ken's postings here are FAQ worthy.
Regarding the bulk loaders, I think a search will find previous topics on this, and conclude that using the command-line utility supplied with the DB client software is preferred over using the bulk loader stage. (The stage is normally used to generate the data and control files for the command-line bulk loader, though it does have loading capabilities of its own.)

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org