Database comparison

dodda · Post by **dodda** » Fri Aug 01, 2008 9:32 am

Hello

I have a requirement where we have to compare two tables of same structure from different databases. we have to compare almost 200 tables. The tables we need to compare have the same structure but each table will be of different structure. Do we have create 200 jobs or is there a best approach?

keshav0307 · Post by **keshav0307** » Fri Aug 01, 2008 9:36 am

elaborate more....

how you want to comapre?
what you want to do after compare?

chulett · Post by **chulett** » Fri Aug 01, 2008 9:42 am

Compare what - the tables themselves? The content? And as noted, please be more specific what 'compare' means.

DSguru2B · Post by **DSguru2B** » Fri Aug 01, 2008 9:45 am

The comparison is more suited for a database compare utility. You could bulk load the data into one database and do you compare using local comparison utilities. Get you DBA's involved and see what they have to say.

dodda · Post by **dodda** » Fri Aug 01, 2008 9:50 am

Hello

i need to compare the two tables row by row and i need to output the difference. i need to capture what is changed between those 2 tables.

Thanks

bcarlson · Post by **bcarlson** » Fri Aug 01, 2008 3:10 pm

The Change Capture stage comes to mind. What do you need as output? Do you declare one table the 'master' and you need to output what is different on the other database only, or do you need to output both values?

Table 1 (key, name, salary)
1, Brad, 1000000
2, Ray, 1000001
3, George, 123456

Table 2 (key, name, salary)
1, Brad, 2000000
2, Andrew, 1000001
3, George, 123456

Based on these tables, what do you want in your output? Both sides?

In terms of generic processes, at a minimum you can parameterize table names and connection parms. If you do Change Capture, you'll also need to pass in the key(s) to sort/hash on.

All, can you use parameters for the change key columns in the Change Capture stage? I hv never tried that before, but I am guessing you can. If so, then yes, I think you can parameterize the whole thing. You just need an intelligent script that calls the DataStage program. The script will gather all the necessary parms and pass them into the program.

You may need to have multiple DataStage programs to handle varying number of keys. Unfortunately DataStage is not yet able to parameterize the NUMBER of keys, only the names.

Brad.

mystuff · Post by **mystuff** » Fri Aug 01, 2008 3:37 pm

I would go with a script (Unless its a different database and you don't have a dblink)

Code: Select all

select t1.*, t2.* from 
$TableA t1 
left join $TableB t2 on
t1.$PK1=t2.$PK2
where t2.$PK2 is null

you can use full outer join if required.

If Keys vary then

Code: Select all

select t1.*, t2.* from 
$TableA t1 
left join $TableB t2 on
$JoinCondition
where t2.$PK2 is null

Its sufficient to check null for one key
------------------------------------------------------------------------

If you want to use datastage... then how about using CRC to check....

bcarlson · Post by **bcarlson** » Fri Aug 01, 2008 3:47 pm

Wouldn't that script require that the tables be on the same database? The original posting implies that the tables are on separate databases.

If they are on the same database then it is a no brainer. The previous script or this:

Code: Select all

select *
from table1

except 

select * 
from table2
;

select * 
from table2

except

select * 
from table1
;

Query 1 gives records that are different (or missing) from table 2, and the query 2 does the opposite. If you order your output in both queries, then the records are sorted the same in both result sets.

This is great for comparison, but if you need it all on the same record, DataStage may be your best bet.

Brad.

mystuff · Post by **mystuff** » Fri Aug 01, 2008 3:55 pm

You are right, I didnt read that first post properly. But highlighted stating same database or dblink to write the query.

EXCEPT, MINUS statemetns for huge tables takes potentially long time. left outer join works pretty well in that case.

Coming back to the question posted

Do you have a DBLINK (I presume no)? if yes, then sql would be good option?