Database comparison
Moderators: chulett, rschirm, roy
Database comparison
Hello
I have a requirement where we have to compare two tables of same structure from different databases. we have to compare almost 200 tables. The tables we need to compare have the same structure but each table will be of different structure. Do we have create 200 jobs or is there a best approach?
I have a requirement where we have to compare two tables of same structure from different databases. we have to compare almost 200 tables. The tables we need to compare have the same structure but each table will be of different structure. Do we have create 200 jobs or is there a best approach?
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
The comparison is more suited for a database compare utility. You could bulk load the data into one database and do you compare using local comparison utilities. Get you DBA's involved and see what they have to say.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
The Change Capture stage comes to mind. What do you need as output? Do you declare one table the 'master' and you need to output what is different on the other database only, or do you need to output both values?
Table 1 (key, name, salary)
1, Brad, 1000000
2, Ray, 1000001
3, George, 123456
Table 2 (key, name, salary)
1, Brad, 2000000
2, Andrew, 1000001
3, George, 123456
Based on these tables, what do you want in your output? Both sides?
In terms of generic processes, at a minimum you can parameterize table names and connection parms. If you do Change Capture, you'll also need to pass in the key(s) to sort/hash on.
All, can you use parameters for the change key columns in the Change Capture stage? I hv never tried that before, but I am guessing you can. If so, then yes, I think you can parameterize the whole thing. You just need an intelligent script that calls the DataStage program. The script will gather all the necessary parms and pass them into the program.
You may need to have multiple DataStage programs to handle varying number of keys. Unfortunately DataStage is not yet able to parameterize the NUMBER of keys, only the names.
Brad.
Table 1 (key, name, salary)
1, Brad, 1000000
2, Ray, 1000001
3, George, 123456
Table 2 (key, name, salary)
1, Brad, 2000000
2, Andrew, 1000001
3, George, 123456
Based on these tables, what do you want in your output? Both sides?
In terms of generic processes, at a minimum you can parameterize table names and connection parms. If you do Change Capture, you'll also need to pass in the key(s) to sort/hash on.
All, can you use parameters for the change key columns in the Change Capture stage? I hv never tried that before, but I am guessing you can. If so, then yes, I think you can parameterize the whole thing. You just need an intelligent script that calls the DataStage program. The script will gather all the necessary parms and pass them into the program.
You may need to have multiple DataStage programs to handle varying number of keys. Unfortunately DataStage is not yet able to parameterize the NUMBER of keys, only the names.
Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
I would go with a script (Unless its a different database and you don't have a dblink)
you can use full outer join if required.
If Keys vary then
Its sufficient to check null for one key
------------------------------------------------------------------------
If you want to use datastage... then how about using CRC to check....
Code: Select all
select t1.*, t2.* from
$TableA t1
left join $TableB t2 on
t1.$PK1=t2.$PK2
where t2.$PK2 is null
If Keys vary then
Code: Select all
select t1.*, t2.* from
$TableA t1
left join $TableB t2 on
$JoinCondition
where t2.$PK2 is null
------------------------------------------------------------------------
If you want to use datastage... then how about using CRC to check....
Wouldn't that script require that the tables be on the same database? The original posting implies that the tables are on separate databases.
If they are on the same database then it is a no brainer. The previous script or this:
Query 1 gives records that are different (or missing) from table 2, and the query 2 does the opposite. If you order your output in both queries, then the records are sorted the same in both result sets.
This is great for comparison, but if you need it all on the same record, DataStage may be your best bet.
Brad.
If they are on the same database then it is a no brainer. The previous script or this:
Code: Select all
select *
from table1
except
select *
from table2
;
select *
from table2
except
select *
from table1
;
This is great for comparison, but if you need it all on the same record, DataStage may be your best bet.
Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
You are right, I didnt read that first post properly. But highlighted stating same database or dblink to write the query.
EXCEPT, MINUS statemetns for huge tables takes potentially long time. left outer join works pretty well in that case.
Coming back to the question posted
Do you have a DBLINK (I presume no)? if yes, then sql would be good option?
EXCEPT, MINUS statemetns for huge tables takes potentially long time. left outer join works pretty well in that case.
Coming back to the question posted
Do you have a DBLINK (I presume no)? if yes, then sql would be good option?