Will DBlink make it faster?

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
laiko
Premium Member
Premium Member
Posts: 35
Joined: Sun May 25, 2008 10:55 am

Will DBlink make it faster?

Post by laiko »

Hi,
I have a source table which is in database SourceDB and the target table is in TargetDB. The load runs very slow. Will a dblink help?? Right now, I do not use and dblink since in DataStage, we simply specify the database (DSN), right?

thanks in advance..
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Are you talking about Postgres "dblink"?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Welcome! The use of a dblink would basically be transparent to DataStage and requires adding "@linkname" to the table name in the from clause. At least that's how it works in Oracle, assume others are similar if not the same.

In my experience, a dblink is neither faster nor a silver bullet.
-craig

"You can never have too many knives" -- Logan Nine Fingers
laiko
Premium Member
Premium Member
Posts: 35
Joined: Sun May 25, 2008 10:55 am

Post by laiko »

ArndW wrote:Are you talking about Postgres "dblink"? quote] Nope, i mean Oracle dblink.
chulett wrote:In my experience, a dblink is neither faster nor a silver bullet.
Hi Craig, thanks for welcoming me :)
I am just quite not aware of how DS works. Wihout a dblink, does it get the data from db1 to its local memory and then from there, load it to db2? I think with dblink, it gets transferred very fast since I would not need a transformer, but just one stage which executes 'INSERT into target SELECT from source'. So it's like a db1 to db2 transfer, instead of db1 to DS memory to db2 transfer... Please correct me if I'm wrong and I welcome your inputs :)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First off, understand that a while a dblink does allow you to 'join' tables in disparate instances, the data from the remote database still needs to be sucked over into the originating database, built there and then joined to your tables. So, while it can be a simple solution to the problem, it's not really a good one. IMHO. Besides, it looks like I read 'join' into your topic when that's not really at play here - it's just that dblinks are meant to facilitate joins between instances.

As to what DataStage Server does, it does something similar. Data from your source is brought from wherever that is to 'the engine' on your ETL server and (after transformation) is pushed to wherever your target lives. That would normally involve three stages, not one - your OCI source to issue the select, a Transformer to do any transformations and the column mapping and finally a target OCI stage, tasked with the insert/update/delete DML. This is all you need.

Now, while you technically can do a 'insert into select from source' using custom sql, the before/after sql tabs and silly games to keep the rest of the stage and compiler happy, it isn't really what I'd call a 'best practice'.

A typical Server design in the case where joins between discrete instances are involved would be to first bring the data from the 'remote' database into DataStage and populate a hashed reference file. Secondly, stream in the data from your other instance and do the join via the reference lookup. Finally, the result is sent back to whichever Oracle instance needs the end results. This can be one job or two, depending on your design philosophy.

Unless your network infrastructure is craptastic and the actual transfer of the records is what is causing your slowdown, you need to look elsewhere for your issue. If you are unsure which 'side' is the issue, break the job in two. First just select and land the data to a flat file. You could even put an @FALSE constraint in the link so that all you see is the raw select speed, but at some point you'll need the landed data for the second part. Part two is load the target table from the flat file. That should make it obvious where the problem lies. I'm guessing it will be the latter.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

chulett wrote:First off, understand that a while a dblink does allow you to 'join' tables in disparate instances, the data from the remote database still needs to be sucked over into the originating database, built there and then joined to your tables. So, while it can be a simple solution to the problem, it's not really a good one. IMHO. Besides, it looks like I read 'join' into your topic when that's not really at play here - it's just that dblinks are meant to facilitate joins between instances.
...
That would normally involve three stages, not one - your OCI source to issue the select, a Transformer to do any transformations and the column mapping and finally a target OCI stage, tasked with the insert/update/delete DML.
I have a question here. Is dblink just a link from a Database (OCI for example) stage to another DB stage? If so, even if you connect directly between two passive stages (here...DB stages) would automatically have a transformer inserted during run time by DS. So, how is it impacting the speed either way? Please help me understand!
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, a dblink is - in this particular case - a specific Oracle object, allowing a query to reference a 'remote' table as if it was local to the current instance. I would think other databases have a similar concept.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply