reconciliation of source data and target data

yuiduan · Post by **yuiduan** » Wed Oct 27, 2004 8:23 pm

Hi
We use datastage to load data from DB2 to Oracle incrementally. From time to time, we compare db2 tables with their oracle equevalence to make sure they are identical. My approach is convert a db2 table to a flat file, convert its equivalant oracle table to another falt file, then use some compare tool (e.g. TOAD compare function) to compare two flat files. There are hundreds of tables to be compared. I found the comparing process I used is too manual and painful.
I am thinking of another way: use the db2 and oracle tables as inputs to a Universe stage to create/populate two universe tables, and then use universe sql to compare the two tables, store the comparison reault to a third table. I don't know if the universe database has set operation (such as minus, union etc).
What do you think about the above idea?
If it is dorable, where can I find Universe database syntax documents?
Any other options you can suggest?

Thanks a lot.
Yui

rasi · Post by **rasi** » Thu Oct 28, 2004 12:38 am

I am not sure whether you can do this in Universe or not. But I used datastage to do this reconcillation. I created a simple multi instance job which receives two source files and the first column being key column in both the file and with the same metadata. I use one file as hash lookup and another one as main file and do comparision for all the columns. If it doesn't match then it puts to a reject file. Also if records not found in either of the source can be reported to the reject file.

What you need to do is to feed two source file from your two tables to job and run. The query used in those two tables must have all the columns same in the right order.

I am using this component and finding it very useful for me in doing those sort of reconcillation.

Cheers
Siva

ecclesr · Post by **ecclesr** » Thu Oct 28, 2004 1:39 am

The following Universe example will produce the same equivalent result as Oracle using MINUS by using the NOT EXISTS keywords with a sub query eg to request billing address for all customers who orders have been satisfied

>SELECT BILL.TO FROM CUSTOMERS
SQL+WHERE NOT EXISTS (SELECT * FROM ORDERS
SQL+WHERE CUST.NO = CUSTOMERS.CUST.NO);

The Oracle equivalent using MINUS

SELECT BILL.TO FROM CUSTOMERS
MINUS
(SELECT BILL.TO FROM ORDERS
WHERE CUST.NO = CUSTOMERS.CUST.NO);

Ross

ray.wurlod · Post by **ray.wurlod** » Thu Oct 28, 2004 2:28 am

UniVerse has standard set operations. These are UNION, INTERSECT and DIFFERENCE.

Sorry to disappoint the Oraclers, but MINUS is Oracle only.

There are obviously two difference DIFFERENCE results, but the question ought to return 0 rows if the tables are equivalent.

siddharthkaushik · Post by **siddharthkaushik** » Wed Jan 18, 2006 7:06 am

Can UniVerse evaluate such a query:

SELECT A.BU_KEY,A.PERIOD_KEY,2 as UOM_KEY,CASE WHEN B.SUBS_VAL=0 THEN 0 ELSE (A.REV_VAL/B.SUBS_VAL*1000000) END AS ARPU_VAL FROM (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUM(REV_VAL) as REV_VAL FROM EISDWH.REVENUE_FACT WHERE REV_KEY in(167,170,172) AND IS_LATEST='1' AND SCENARIO_KEY=1 group by BU_KEY,PERIOD_KEY,UOM_KEY) A, (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUBS_VAL FROM EISDWH.SUBS_FACT WHERE SUBS_KEY=24 AND RANGE_KEY=0 AND PROD_KEY=3 and NETWORK_OPTR_KEY=9 AND IS_LATEST='1' AND SCENARIO_KEY=1) B WHERE A.BU_KEY=B.BU_KEY AND A.PERIOD_KEY=B.PERIOD_KEY;

I have been trying to make this query work for sometime now with few changes, but no success as yet....

I have unloaded two tables (SUBS_FACT and REVENUE_FACT) in respective Hashed files. Then I am trying to evaluate revenue per subscriber based on conditions. The query works in ODBC Stage, but the performance is not good. In order to boost the performance, I am trying this route, but still stuck with query not working in UniVerse.

Thanks,
Siddharth

ray.wurlod · Post by **ray.wurlod** » Wed Jan 18, 2006 3:25 pm

You would use EVAL rather than WHEN construct but otherwise UniVerse SQL can do that kind of query. Indexing the constraining columns would help, as it would in any database.

It would be more efficient in UniVerse to use Select Lists on which you can perform in-memory set operations (UNION, INTERSECTION, DIFFERENCE) using the MERGE.LIST command.

You can not use a Hashed File stage for any kind of lookup but "=". Specifically in this case a Hashed File stage does not support an IN operator in a lookup.

kduke · Post by **kduke** » Wed Jan 18, 2006 8:04 pm

I would use crc32. Do a search on how to use this function. I would think this would be the fastest way to compare all fields.

djm · Post by **djm** » Thu Jan 19, 2006 4:14 am

Yui,

can you just confirm for me that, as per your message description, your DataStage server is a Windows server and not a UNIX server. If it is a Windows server I don't have an alternative to offer but if it is a UNIX server, there is another option that I can suggest.

David

ray.wurlod · Post by **ray.wurlod** » Thu Jan 19, 2006 5:31 am

Suggest away, David. You get MKS Toolkit with DataStage 7.5 on Windows. So you can do UNIX stuff.

djm · Post by **djm** » Thu Jan 19, 2006 3:24 pm

I was trying not to presume about any flavour of UNIX tools on a Windows server since that is not necessarily a standard scenario. But suggest away I will ...

There could be a fairly straight forward UNIX solution on the premise that an export from both Oracle and DB2 of the respective tables can be achieved on the command line (e.g. commands piped to PL/SQL).

The solution outline would be:

Code: Select all

  for each table
    export oracle table to flat file
    export DB2 table to flat file
    compare the files producing two files;
      one that holds data that is in one file but not in the other
      one that holds data in the other file but not in the first

So a pseudo-UNIX ksh script to do this for three tables would be:

Code: Select all

  for tabname in table1 table2 table3
  do
    echo $tabname
    command line to invoke Oracle tool to export table $tabname.* to file1
    command line to invoke DB2 tool to export table $tabname.* to file2
    sort -o $tabname.file1 file1
    sort -o $tabname.file2 file2
    comm -13 $tabname.file1 $tabname.file2 > $tabname.only_in_file2
    comm -23 $tabname.file1 $tabname.file2 > $tabname.only_in_file1
  done

I am ignorant of the command required for the respective tools, so the approach does depends on the existence of such tools. The sort step is included as using an ORDER BY clause on an export may require knowledge of all the columns in a particular table whereas using the UNIX sort command circumvents the need for this. You pay your money and you take your choice.

The UNIX comm command ("man comm") allows a comparison to be made between two files and records common to both, only in one file or only in the other file to be written to standard output.

Hopefully this is enough of an overview for you to consider whether or not it is an option that would work for you.

David

jzparad · Post by **jzparad** » Thu Jan 19, 2006 5:04 pm

Ray,

Should the MKS toolkit be automatically installed with the DataStage server?

I'm running 7.5.1 on Windows and I can't seem to find an MKS directory.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 19, 2006 6:21 pm

Aarrgh, it's DataStage EE. Sorry about that.

Anyway, you can get MKS Toolkit or CYGWIN or any of those UNIX emulators without having to get them with DataStage.

vmcburney · Post by **vmcburney** » Thu Jan 19, 2006 8:45 pm

Maybe you should get a price on a WebSphere Information Integrator license, this will let you write SQL to compare the two tables across database platforms. It also gives you a lot more options for your DataStage jobs such as unstructured data sources and cross database query sources.

kduke · Post by **kduke** » Fri Jan 20, 2006 6:53 am

CRC32

DSXchange

reconciliation of source data and target data

reconciliation of source data and target data

Sub Query execution in UniVerse