Page 1 of 1

Database comparison

Posted: Fri Aug 01, 2008 9:32 am
by dodda
Hello

I have a requirement where we have to compare two tables of same structure from different databases. we have to compare almost 200 tables. The tables we need to compare have the same structure but each table will be of different structure. Do we have create 200 jobs or is there a best approach?

Posted: Fri Aug 01, 2008 9:36 am
by keshav0307
elaborate more....

how you want to comapre?
what you want to do after compare?

Posted: Fri Aug 01, 2008 9:42 am
by chulett
Compare what - the tables themselves? The content? And as noted, please be more specific what 'compare' means.

Posted: Fri Aug 01, 2008 9:45 am
by DSguru2B
The comparison is more suited for a database compare utility. You could bulk load the data into one database and do you compare using local comparison utilities. Get you DBA's involved and see what they have to say.

Posted: Fri Aug 01, 2008 9:50 am
by dodda
Hello

i need to compare the two tables row by row and i need to output the difference. i need to capture what is changed between those 2 tables.

Thanks

Posted: Fri Aug 01, 2008 3:10 pm
by bcarlson
The Change Capture stage comes to mind. What do you need as output? Do you declare one table the 'master' and you need to output what is different on the other database only, or do you need to output both values?

Table 1 (key, name, salary)
1, Brad, 1000000
2, Ray, 1000001
3, George, 123456

Table 2 (key, name, salary)
1, Brad, 2000000
2, Andrew, 1000001
3, George, 123456

Based on these tables, what do you want in your output? Both sides?

In terms of generic processes, at a minimum you can parameterize table names and connection parms. If you do Change Capture, you'll also need to pass in the key(s) to sort/hash on.

All, can you use parameters for the change key columns in the Change Capture stage? I hv never tried that before, but I am guessing you can. If so, then yes, I think you can parameterize the whole thing. You just need an intelligent script that calls the DataStage program. The script will gather all the necessary parms and pass them into the program.

You may need to have multiple DataStage programs to handle varying number of keys. Unfortunately DataStage is not yet able to parameterize the NUMBER of keys, only the names.

Brad.

Posted: Fri Aug 01, 2008 3:37 pm
by mystuff
I would go with a script (Unless its a different database and you don't have a dblink)

Code: Select all

select t1.*, t2.* from 
$TableA t1 
left join $TableB t2 on
t1.$PK1=t2.$PK2
where t2.$PK2 is null
you can use full outer join if required.

If Keys vary then

Code: Select all

select t1.*, t2.* from 
$TableA t1 
left join $TableB t2 on
$JoinCondition
where t2.$PK2 is null
Its sufficient to check null for one key
------------------------------------------------------------------------

If you want to use datastage... then how about using CRC to check....

Posted: Fri Aug 01, 2008 3:47 pm
by bcarlson
Wouldn't that script require that the tables be on the same database? The original posting implies that the tables are on separate databases.

If they are on the same database then it is a no brainer. The previous script or this:

Code: Select all

select *
from table1

except 

select * 
from table2
;

select * 
from table2

except

select * 
from table1
;
Query 1 gives records that are different (or missing) from table 2, and the query 2 does the opposite. If you order your output in both queries, then the records are sorted the same in both result sets.

This is great for comparison, but if you need it all on the same record, DataStage may be your best bet.

Brad.

Posted: Fri Aug 01, 2008 3:55 pm
by mystuff
You are right, I didnt read that first post properly. But highlighted stating same database or dblink to write the query.

EXCEPT, MINUS statemetns for huge tables takes potentially long time. left outer join works pretty well in that case.

Coming back to the question posted

Do you have a DBLINK (I presume no)? if yes, then sql would be good option?