Page 1 of 1

Re: Best Method of Joing Tables

Posted: Fri Dec 16, 2011 9:43 am
by samdsx
Please provide record counts (each table and after join) and size of data extracting.

Re: Best Method of Joing Tables

Posted: Fri Dec 16, 2011 10:47 am
by madhusds
Each Table has nearly 80000 to 100000 records..

Posted: Fri Dec 16, 2011 11:43 am
by zulfi123786
No one here would be able to answer your question without knowing whats under the hood for Datastage server and Database server, are they located on the same machine or remotely located[as appears from your data] ...... etc....
There are n number of points to be thought of before saying which approach is good

Posted: Fri Dec 16, 2011 12:11 pm
by madhusds
Apart from the above person is there any one who can answer to my question with information I provided ..

Note: all the tables resides on the same database
Any way thanks for answering that you can't answer to my question..

Posted: Fri Dec 16, 2011 12:23 pm
by pandeesh
Then why not try all the 3 approaches yourself and post the best one?
It will be useful for everyone.

Posted: Fri Dec 16, 2011 12:43 pm
by pandeesh
please be patient!!

Posted: Fri Dec 16, 2011 12:51 pm
by zulfi123786
Are the key columns indexed ?
How many nodes the DataStage server is running on?
How many nodes the Database server is running on?
Are there any filter conditions over the selected data?
If your job is supposed to be running in Production, then what are the loads on the database and datastage servers at the time the job runs?

A little humbleness is every ones policy when asking/sharing information on free forums

Posted: Mon Dec 19, 2011 12:52 pm
by madhusds
zulfi123786 wrote:
Are the key columns indexed ?
YES
How many nodes the DataStage server is running on?
4
How many nodes the Database server is running on?

Are there any filter conditions over the selected data?
NO
If your job is supposed to be running in Production, then what are the loads on the database and datastage servers at the time the job runs?
It is still in DEV
A little humbleness is every ones policy when asking/sharing information on free forums
Yes you are correct: I agree

"You have a bit of thinking to do, [sarcasm]I hope that is a more appropriate answer [/sarcasm] -- kwwilliams "

Posted: Mon Dec 19, 2011 2:56 pm
by FranklinE
Performance questions are often the most difficult to answer without having most or even all the details of your environment. If I were to ask such a question here, I know I'd be unable to share some of those details due to my employer's security policies. So, there will be some frustration with the answers you might get. I don't think we can avoid that.

DBMS usually have the best performance for complex queries, joins and unions. That's because the most efficient way to select data is using indexes on the tables. The most efficient way to create joins is under the database architecture, because the designer -- if he or she knows that such joins will be used -- can design table relationships for those joins (parent-child, foreign key, etc.)

I know DB2 well, Oracle almost as well, and almost nothing about the rest including SQL Server, but I do know that the priniciples I describe above are true for them as they are for DB2 and Oracle. Your best partner to answer performance issues is your DBA and database architect. That's the best first answer you can get here or anywhere.

As a starting point, one question: Will you be using all the rows from each table, or subsets of the rows? Sometimes, if you need all the rows, a bulk unload is a good first step. You can sometimes scan a file a bit faster than you can run a DBMS query on it.

I don't think I've described anything beyond the three methods you listed. I can't think of a general method to suggest beyond them.

Posted: Mon Dec 19, 2011 3:10 pm
by ray.wurlod
As a general rule, if the join is totally supported by indexes in the database then performing the join in the database will be more efficient than in DataStage because it will be performed in the indexes. Beyond that the factors mentioned in others' posts will come into play.

Another general rule is that, if the join will eliminate the majority of rows (say over 70%) it's usually better to do that in the database too, so that DataStage gets to process far fewer rows than would otherwise be the case.

Of course, joining tables in heterogeneous databases introduces a whole new set of considerations, particularly around how efficiently linkages between them can be made in the databases themselves, whether or not some kind of Federation engine is involved, etc.

If you have Balanced Optimization installed, DataStage can make a cost-based decision about whether to work in the database or in the job design.

Posted: Wed Dec 21, 2011 9:19 am
by madhusds
Thanks every one..