OCI Vs Dynamic RDBMS

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


shrey3a
Premium Member
Posts: 234
Joined: Sun Nov 21, 2004 10:41 pm

OCI Vs Dynamic RDBMS

Post by shrey3a »

Hi Gurus,

I'm in the process of modifying some DataStage jobs to address performance issues.

In these jobs the Dynamic RDBMS stage is being used to connect to an Oracle 9i database. I was wondering whether it would make a difference to replace them with Oracle 9i (OCI) stages, i.e. whether that would improve the performance of the jobs.

Thanks in advance

Regards,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

No, look for your improvements in the design.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why did Ken say no? Because, irrespective of the stage type, what happens is that SQL statements are sent to Oracle and the results of those are retrieved. It is (or should be) the same SQL in either case.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Since Halloween took up most of my day, I didn't get a chance to expand on Ray's comments.

Swapping out the Dynamic RDBMS stage for OCI is worthless, as I believe they both use SQL*Net anyway, so there's no performance difference. I could be wrong, but even if someone suggested swapping ODBC for OCI for performance reasons I'd say whoopee.

The problem is, if your job design is ODBC --> XFM --> ODBC and you think switching ODBC to OCI is going to make your job 2X faster, think again. Dynamic RDBMS to OCI isn't going to do anything different either. You need to be using every processor cycle you can in order to move data quicker.

A re-thinking of the design would be something like:

ODBC --> SEQ (run multiple instances of this with a ranging query to fast-unload the source data)

ODBC --> HASH (run multiple instances of this with a ranging query to fast-unload the target data that may be updated)

SEQ --> XFM w/ HASH reference --> SEQ (updates)
                              --> SEQ (inserts)
(multi-instance this job to XFM more data simultaneously in less time)

SEQ --> bulk load to target (inserts)
SEQ --> OCI (update-only SQL action)

Not only will I get the data out of the source database faster, I reduce the chance of holding a query snapshot too long, using too much rollback segment, etc. I'll transform faster using more CPUs, and by separating inserts from updates I ensure that final loading uses the fastest database methods possible. Also, the entire process has restartable segments with milestones along the way.
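To sketch the ranging-query idea from the unload steps above (the table, columns, and #LowKey#/#HighKey# job parameters are all made up for illustration):

-- Instance-specific key range: each multi-instance unload job takes its own
-- slice of the keys, so several extracts run in parallel to sequential files.
SELECT order_id, customer_id, order_date, order_amount
  FROM orders
 WHERE order_id >= #LowKey#     -- lower bound for this instance
   AND order_id <  #HighKey#;   -- upper bound for this instance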

These types of structured design techniques yield multiples of improvements.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kcbland wrote: These types of structured design techniques yield multiples of improvements.
And run counter to a 'Best Practices' document I was just sent from IBM. I was flummoxed to read that landing data in Server jobs was frowned upon and specifically mentioned as a 'Thou Shalt Not'. :roll:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Yeah, they've got to mindwipe everyone into a PX frame of mind. You'll hear the same argument from Oracle about writing queries and using work tables.

Here's a document that's really illuminating. It shows how landing some data in Oracle helps you get your end results faster.

http://oraclesvca2.oracle.com/openworld ... W12681.doc

I worked with these guys on this project.

As for not landing data, why the hell do you have hash files then? How do you effectively build restartable milestones into long-running processes? How do you NOT pull all of the target tables into Merge/Lookup stages if you don't first scan your source data and use that information to shrink the targets/references down to only the necessary datasets?

Real client example. Every load cycle they processed an average of 20K rows of data. The target table held 100M rows. To decide between insert and update, they merged all 100M rows against the 20K. Their 128 CPUs were hammered during this PX job activity.

So, I had them take the 20K rows, scan them for keys, and load those keys into a global work table in the database. Then we inner-joined the work table's 20K keys with the 100M rows and threw the result into the hash file. Now the job just had to deal with 20K rows referencing fewer than 20K rows in a hash file. Any thoughts on which took less time to run? Our smart Server solution on a single 4-CPU node took less time than the PX solution with 128 CPUs slammed. Now, we could rewrite the Server job as a pure PX solution and be even faster. But for this customer, one pilot comparing pure Server against PX had the "smart" Server design accomplish in 28 minutes what the PX solution required 120 minutes for, and that was a 4-CPU node versus 16 nodes totalling 128 CPUs.
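To make the work-table trick concrete (the names here are invented for the example: STG_INCOMING holds the 20K arriving rows, GTT_KEYS is the global work table, TGT_FACT is the 100M-row target):

-- 1. Land the 20K incoming keys in the global work table.
INSERT INTO gtt_keys (business_key)
SELECT DISTINCT business_key FROM stg_incoming;

-- 2. Inner-join the work table to the big target so the reference extract
--    returns at most 20K rows; that small result set feeds the hash file.
SELECT t.business_key, t.surrogate_key, t.last_updated
  FROM tgt_fact t
  JOIN gtt_keys k ON k.business_key = t.business_key;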

The problem IMO, and I know they're reading this, is that they are deathly afraid of anyone espousing that it takes intelligent design to do ETL. That runs counter to the sales pitch. You might even think that not using the Server tool effectively contributes to the "you need to upgrade to PX for performance reasons" argument.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
PhilipDuPlessis
Participant
Posts: 23
Joined: Thu Oct 27, 2005 12:51 am

Post by PhilipDuPlessis »

Ken,

Dude, I am really new to DSExchange and DataStage in general so please be patient... Referring back to where you mentioned that performance would be roughly the same using an Oracle ODBC stage, would the same be said in the case of SQL (would it be faster to use a SQL ODBC stage than a generic ODBC stage, and if so, what would the performance difference be)?

On the subject of the work tables, once again, they give you the opportunity to index them, making the updates in particular run dramatically faster, as we discovered during one of our multi-million-row jobs...
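For example (names invented for illustration), an index on the work table's join key is what lets the update pass probe it instead of scanning it:

-- Index the work table on the key used by the updates.
CREATE INDEX wrk_orders_key_ix ON wrk_orders (order_id);

-- The update then does indexed lookups against the work table.
UPDATE target_orders t
   SET t.order_amount = (SELECT w.order_amount
                           FROM wrk_orders w
                          WHERE w.order_id = t.order_id)
 WHERE EXISTS (SELECT 1 FROM wrk_orders w WHERE w.order_id = t.order_id);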
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Oracle OCI, not Oracle ODBC. ODBC is Bill Gates' revenge on the Unix world.

Oracle, UDB, Informix, and Sybase all have their own connectivity software for their databases. OCI, CLI, whatever the moniker. ODBC is a PC thing foisted on the Unix world.

In theory, database vendors know their own databases better and can provide faster interfaces. But they're not 2X faster, more like 20-30% faster. They have more features specific to their database than the ODBC standards allow, so in some cases they are faster because of a particular feature (tuning/connection options, TRUNCATE vs. deletes, etc.).
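For example, the TRUNCATE-versus-delete difference looks like this in Oracle (STG_ORDERS is an invented staging table name):

-- DELETE is DML: it generates undo for every row and can be rolled back,
-- which is slow and rollback-segment hungry on a large staging table.
DELETE FROM stg_orders;

-- TRUNCATE is DDL: it deallocates the table's segments with minimal undo
-- and cannot be rolled back, so emptying a big table is near-instant.
TRUNCATE TABLE stg_orders;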

I believe the MS-SQL stage, which is new, uses the same ODBC connection defined on the server. Oracle OCI versus ODBC is a different configuration: Oracle OCI requires you to install the Oracle client software (highly preferred anyway) on your PC, whereas ODBC requires you to set up a system DSN under Control Panel.

As for the work tables, I'll paraphrase myself again. The best tool can be defeated by the worst user. The best user can make the worst tool shine. I can't beat Michael Jordan by putting him in high heels and myself in Nikes.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
PhilipDuPlessis
Participant
Posts: 23
Joined: Thu Oct 27, 2005 12:51 am

Post by PhilipDuPlessis »

Thanks Ken. Man, this is gonna be a bumpy ride with loads of learning...
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The data integration business is not kiddie play. There's no magic fairy dust you can sprinkle on people to make them move billions of bytes of data better. The ETL Genie does not grant wishes for moving data faster just by checking a box or tweaking an array size.

Someone who attempts to load 2 billion rows of data in one pass needs to learn the lesson that's coming. Any company that puts a SQL-ignorant person in front of an ETL tool client and says "move my data" deserves the quality of result it gets. The Unix goblin has to be defeated; you can't just assume the ETL tool elevates you away from the operating system, and you can't be "OS-unaware".
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle