joining large database tables from different databases

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
hiltsmi
Participant
Posts: 20
Joined: Thu Aug 04, 2005 9:03 am

joining large database tables from different databases

Post by hiltsmi »

I have 2 very large tables (million plus rows each). One table is on an Oracle database and the other table is on a Sybase sever.

I need to join these 2 tables together. Obviously I can not do it using SQL since they are not in the same database.

Can someone tell me the best way to join these tables together?

I am using 7.5 server edition.

Thanks.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: joining large database tables from different databases

Post by ogmios »

Different solutions:
1) If you equi-join on the primary key you could select from one table and then do a database lookup on the other table (you should have one unique row per key you try to lookup)
2) Load one of the tables in a hash file in DataStage and use that as lookup, disadvantage here is duplicate storage and hash files are preferably not used as permanent storage (so clear the table, and loading the hash file everytime e.g.)
3) Load one of the tables to the other database via extra DataStage jobs and then do the join in 1 database. Depending on your specific situation you could load either always the full table, or the incremental updates.

Rule of thumb... if the amount of rows to be joined is small (< 400.000 rows) use 1. Above that 2 or 3, and personally I would then always use 3.
And 1 million rows is not much, I start talking about somewhat big tables from 100 million rows upwards :wink:

Ogmios
In theory there's no difference between theory and practice. In practice there is.
hiltsmi
Participant
Posts: 20
Joined: Thu Aug 04, 2005 9:03 am

Post by hiltsmi »

1. not really sure what your saying in this solution?

2. One table is essentially a customer table and the second table is an account table, so a customer may have more than 1 account record.
There may be more than 1 match. Therefore I don't think the hash file will work.

3. The source systems are 24x7 production systems and loading one table from the one system into the other will not be permitted by the production support people.
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

1. not really sure what your saying in this solution?
select on 1 table via a database stage... connect to transformer, connect database stage to transformer as lookup (the underbroken lines) to select on the other table... same problem as 2, in your case you do not have an exact lookup in the second table. (unless you select in the other way of course... select on accounts and lookup the customers).
2. One table is essentially a customer table and the second table is an account table, so a customer may have more than 1 account record.
There may be more than 1 match. Therefore I don't think the hash file will work.
Selecting on accounts on a hashed version of the customers.
3. The source systems are 24x7 production systems and loading one table from the one system into the other will not be permitted by the production support people.
Been there, done that... one advice, extract all data you need from the source database and create your own staging area. You will get into problems with production people else anyway.
And with Oracle you know enough about it to realize that if you query from an Oracle database which is "in action" you will most of the times lose some rows because of Oracle's transaction management (since you will only see the rows which were in the database at the moment you started your query, whatever else happens afterwards).

Ogmios
In theory there's no difference between theory and practice. In practice there is.
Post Reply