Database comparison

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dodda
Premium Member
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Database comparison

Post by dodda »

Hello

I have a requirement where we have to compare two tables of same structure from different databases. we have to compare almost 200 tables. The tables we need to compare have the same structure but each table will be of different structure. Do we have create 200 jobs or is there a best approach?
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

elaborate more....

how you want to comapre?
what you want to do after compare?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Compare what - the tables themselves? The content? And as noted, please be more specific what 'compare' means.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The comparison is more suited for a database compare utility. You could bulk load the data into one database and do you compare using local comparison utilities. Get you DBA's involved and see what they have to say.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
dodda
Premium Member
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

Hello

i need to compare the two tables row by row and i need to output the difference. i need to capture what is changed between those 2 tables.

Thanks
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

The Change Capture stage comes to mind. What do you need as output? Do you declare one table the 'master' and you need to output what is different on the other database only, or do you need to output both values?

Table 1 (key, name, salary)
1, Brad, 1000000
2, Ray, 1000001
3, George, 123456

Table 2 (key, name, salary)
1, Brad, 2000000
2, Andrew, 1000001
3, George, 123456

Based on these tables, what do you want in your output? Both sides?

In terms of generic processes, at a minimum you can parameterize table names and connection parms. If you do Change Capture, you'll also need to pass in the key(s) to sort/hash on.

All, can you use parameters for the change key columns in the Change Capture stage? I hv never tried that before, but I am guessing you can. If so, then yes, I think you can parameterize the whole thing. You just need an intelligent script that calls the DataStage program. The script will gather all the necessary parms and pass them into the program.

You may need to have multiple DataStage programs to handle varying number of keys. Unfortunately DataStage is not yet able to parameterize the NUMBER of keys, only the names.

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
mystuff
Premium Member
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

I would go with a script (Unless its a different database and you don't have a dblink)

Code: Select all

select t1.*, t2.* from 
$TableA t1 
left join $TableB t2 on
t1.$PK1=t2.$PK2
where t2.$PK2 is null
you can use full outer join if required.

If Keys vary then

Code: Select all

select t1.*, t2.* from 
$TableA t1 
left join $TableB t2 on
$JoinCondition
where t2.$PK2 is null
Its sufficient to check null for one key
------------------------------------------------------------------------

If you want to use datastage... then how about using CRC to check....
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Wouldn't that script require that the tables be on the same database? The original posting implies that the tables are on separate databases.

If they are on the same database then it is a no brainer. The previous script or this:

Code: Select all

select *
from table1

except 

select * 
from table2
;

select * 
from table2

except

select * 
from table1
;
Query 1 gives records that are different (or missing) from table 2, and the query 2 does the opposite. If you order your output in both queries, then the records are sorted the same in both result sets.

This is great for comparison, but if you need it all on the same record, DataStage may be your best bet.

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
mystuff
Premium Member
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

You are right, I didnt read that first post properly. But highlighted stating same database or dblink to write the query.

EXCEPT, MINUS statemetns for huge tables takes potentially long time. left outer join works pretty well in that case.

Coming back to the question posted

Do you have a DBLINK (I presume no)? if yes, then sql would be good option?
Post Reply