Best Method to Join the Tables

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
samdsx
Premium Member
Premium Member
Posts: 19
Joined: Wed Aug 18, 2010 8:48 pm

Re: Best Method of Joing Tables

Post by samdsx »

Please provide record counts (each table and after join) and size of data extracting.
madhusds
Premium Member
Premium Member
Posts: 29
Joined: Tue Dec 28, 2010 8:19 pm
Location: NH
Contact:

Re: Best Method of Joing Tables

Post by madhusds »

Each Table has nearly 80000 to 100000 records..
Thanks
zulfi123786
Premium Member
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

No one here would be able to answer your question without knowing whats under the hood for Datastage server and Database server, are they located on the same machine or remotely located[as appears from your data] ...... etc....
There are n number of points to be thought of before saying which approach is good
- Zulfi
madhusds
Premium Member
Premium Member
Posts: 29
Joined: Tue Dec 28, 2010 8:19 pm
Location: NH
Contact:

Post by madhusds »

Apart from the above person is there any one who can answer to my question with information I provided ..

Note: all the tables resides on the same database
Any way thanks for answering that you can't answer to my question..
Thanks
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Then why not try all the 3 approaches yourself and post the best one?
It will be useful for everyone.
pandeeswaran
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

please be patient!!
pandeeswaran
zulfi123786
Premium Member
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Are the key columns indexed ?
How many nodes the DataStage server is running on?
How many nodes the Database server is running on?
Are there any filter conditions over the selected data?
If your job is supposed to be running in Production, then what are the loads on the database and datastage servers at the time the job runs?

A little humbleness is every ones policy when asking/sharing information on free forums
- Zulfi
madhusds
Premium Member
Premium Member
Posts: 29
Joined: Tue Dec 28, 2010 8:19 pm
Location: NH
Contact:

Post by madhusds »

zulfi123786 wrote:
Are the key columns indexed ?
YES
How many nodes the DataStage server is running on?
4
How many nodes the Database server is running on?

Are there any filter conditions over the selected data?
NO
If your job is supposed to be running in Production, then what are the loads on the database and datastage servers at the time the job runs?
It is still in DEV
A little humbleness is every ones policy when asking/sharing information on free forums
Yes you are correct: I agree

"You have a bit of thinking to do, [sarcasm]I hope that is a more appropriate answer [/sarcasm] -- kwwilliams "
Thanks
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Performance questions are often the most difficult to answer without having most or even all the details of your environment. If I were to ask such a question here, I know I'd be unable to share some of those details due to my employer's security policies. So, there will be some frustration with the answers you might get. I don't think we can avoid that.

DBMS usually have the best performance for complex queries, joins and unions. That's because the most efficient way to select data is using indexes on the tables. The most efficient way to create joins is under the database architecture, because the designer -- if he or she knows that such joins will be used -- can design table relationships for those joins (parent-child, foreign key, etc.)

I know DB2 well, Oracle almost as well, and almost nothing about the rest including SQL Server, but I do know that the priniciples I describe above are true for them as they are for DB2 and Oracle. Your best partner to answer performance issues is your DBA and database architect. That's the best first answer you can get here or anywhere.

As a starting point, one question: Will you be using all the rows from each table, or subsets of the rows? Sometimes, if you need all the rows, a bulk unload is a good first step. You can sometimes scan a file a bit faster than you can run a DBMS query on it.

I don't think I've described anything beyond the three methods you listed. I can't think of a general method to suggest beyond them.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

As a general rule, if the join is totally supported by indexes in the database then performing the join in the database will be more efficient than in DataStage because it will be performed in the indexes. Beyond that the factors mentioned in others' posts will come into play.

Another general rule is that, if the join will eliminate the majority of rows (say over 70%) it's usually better to do that in the database too, so that DataStage gets to process far fewer rows than would otherwise be the case.

Of course, joining tables in heterogeneous databases introduces a whole new set of considerations, particularly around how efficiently linkages between them can be made in the databases themselves, whether or not some kind of Federation engine is involved, etc.

If you have Balanced Optimization installed, DataStage can make a cost-based decision about whether to work in the database or in the job design.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
madhusds
Premium Member
Premium Member
Posts: 29
Joined: Tue Dec 28, 2010 8:19 pm
Location: NH
Contact:

Post by madhusds »

Thanks every one..
Thanks
Post Reply