reconciliation of source data and target data
Moderators: chulett, rschirm, roy
reconciliation of source data and target data
Hi
We use datastage to load data from DB2 to Oracle incrementally. From time to time, we compare db2 tables with their oracle equevalence to make sure they are identical. My approach is convert a db2 table to a flat file, convert its equivalant oracle table to another falt file, then use some compare tool (e.g. TOAD compare function) to compare two flat files. There are hundreds of tables to be compared. I found the comparing process I used is too manual and painful.
I am thinking of another way: use the db2 and oracle tables as inputs to a Universe stage to create/populate two universe tables, and then use universe sql to compare the two tables, store the comparison reault to a third table. I don't know if the universe database has set operation (such as minus, union etc).
What do you think about the above idea?
If it is dorable, where can I find Universe database syntax documents?
Any other options you can suggest?
Thanks a lot.
Yui
We use datastage to load data from DB2 to Oracle incrementally. From time to time, we compare db2 tables with their oracle equevalence to make sure they are identical. My approach is convert a db2 table to a flat file, convert its equivalant oracle table to another falt file, then use some compare tool (e.g. TOAD compare function) to compare two flat files. There are hundreds of tables to be compared. I found the comparing process I used is too manual and painful.
I am thinking of another way: use the db2 and oracle tables as inputs to a Universe stage to create/populate two universe tables, and then use universe sql to compare the two tables, store the comparison reault to a third table. I don't know if the universe database has set operation (such as minus, union etc).
What do you think about the above idea?
If it is dorable, where can I find Universe database syntax documents?
Any other options you can suggest?
Thanks a lot.
Yui
I am not sure whether you can do this in Universe or not. But I used datastage to do this reconcillation. I created a simple multi instance job which receives two source files and the first column being key column in both the file and with the same metadata. I use one file as hash lookup and another one as main file and do comparision for all the columns. If it doesn't match then it puts to a reject file. Also if records not found in either of the source can be reported to the reject file.
What you need to do is to feed two source file from your two tables to job and run. The query used in those two tables must have all the columns same in the right order.
I am using this component and finding it very useful for me in doing those sort of reconcillation.
Cheers
Siva
What you need to do is to feed two source file from your two tables to job and run. The query used in those two tables must have all the columns same in the right order.
I am using this component and finding it very useful for me in doing those sort of reconcillation.
Cheers
Siva
The following Universe example will produce the same equivalent result as Oracle using MINUS by using the NOT EXISTS keywords with a sub query eg to request billing address for all customers who orders have been satisfied
>SELECT BILL.TO FROM CUSTOMERS
SQL+WHERE NOT EXISTS (SELECT * FROM ORDERS
SQL+WHERE CUST.NO = CUSTOMERS.CUST.NO);
The Oracle equivalent using MINUS
SELECT BILL.TO FROM CUSTOMERS
MINUS
(SELECT BILL.TO FROM ORDERS
WHERE CUST.NO = CUSTOMERS.CUST.NO);
Ross
>SELECT BILL.TO FROM CUSTOMERS
SQL+WHERE NOT EXISTS (SELECT * FROM ORDERS
SQL+WHERE CUST.NO = CUSTOMERS.CUST.NO);
The Oracle equivalent using MINUS
SELECT BILL.TO FROM CUSTOMERS
MINUS
(SELECT BILL.TO FROM ORDERS
WHERE CUST.NO = CUSTOMERS.CUST.NO);
Ross
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
UniVerse has standard set operations. These are UNION, INTERSECT and DIFFERENCE.
Sorry to disappoint the Oraclers, but MINUS is Oracle only.
There are obviously two difference DIFFERENCE results, but the question ought to return 0 rows if the tables are equivalent.
Sorry to disappoint the Oraclers, but MINUS is Oracle only.
There are obviously two difference DIFFERENCE results, but the question ought to return 0 rows if the tables are equivalent.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 20
- Joined: Thu Nov 27, 2003 3:45 am
- Location: New Delhi, India
Sub Query execution in UniVerse
Can UniVerse evaluate such a query:
SELECT A.BU_KEY,A.PERIOD_KEY,2 as UOM_KEY,CASE WHEN B.SUBS_VAL=0 THEN 0 ELSE (A.REV_VAL/B.SUBS_VAL*1000000) END AS ARPU_VAL FROM (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUM(REV_VAL) as REV_VAL FROM EISDWH.REVENUE_FACT WHERE REV_KEY in(167,170,172) AND IS_LATEST='1' AND SCENARIO_KEY=1 group by BU_KEY,PERIOD_KEY,UOM_KEY) A, (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUBS_VAL FROM EISDWH.SUBS_FACT WHERE SUBS_KEY=24 AND RANGE_KEY=0 AND PROD_KEY=3 and NETWORK_OPTR_KEY=9 AND IS_LATEST='1' AND SCENARIO_KEY=1) B WHERE A.BU_KEY=B.BU_KEY AND A.PERIOD_KEY=B.PERIOD_KEY;
I have been trying to make this query work for sometime now with few changes, but no success as yet....
I have unloaded two tables (SUBS_FACT and REVENUE_FACT) in respective Hashed files. Then I am trying to evaluate revenue per subscriber based on conditions. The query works in ODBC Stage, but the performance is not good. In order to boost the performance, I am trying this route, but still stuck with query not working in UniVerse.
Thanks,
Siddharth
SELECT A.BU_KEY,A.PERIOD_KEY,2 as UOM_KEY,CASE WHEN B.SUBS_VAL=0 THEN 0 ELSE (A.REV_VAL/B.SUBS_VAL*1000000) END AS ARPU_VAL FROM (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUM(REV_VAL) as REV_VAL FROM EISDWH.REVENUE_FACT WHERE REV_KEY in(167,170,172) AND IS_LATEST='1' AND SCENARIO_KEY=1 group by BU_KEY,PERIOD_KEY,UOM_KEY) A, (SELECT BU_KEY,PERIOD_KEY,2 as UOM_KEY,SUBS_VAL FROM EISDWH.SUBS_FACT WHERE SUBS_KEY=24 AND RANGE_KEY=0 AND PROD_KEY=3 and NETWORK_OPTR_KEY=9 AND IS_LATEST='1' AND SCENARIO_KEY=1) B WHERE A.BU_KEY=B.BU_KEY AND A.PERIOD_KEY=B.PERIOD_KEY;
I have been trying to make this query work for sometime now with few changes, but no success as yet....
I have unloaded two tables (SUBS_FACT and REVENUE_FACT) in respective Hashed files. Then I am trying to evaluate revenue per subscriber based on conditions. The query works in ODBC Stage, but the performance is not good. In order to boost the performance, I am trying this route, but still stuck with query not working in UniVerse.
Thanks,
Siddharth
I would more readily fail completely in something that I truley beleive in, rather than to succeed overwhelmingly in something that goes against the ideals I have set for myself.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You would use EVAL rather than WHEN construct but otherwise UniVerse SQL can do that kind of query. Indexing the constraining columns would help, as it would in any database.
It would be more efficient in UniVerse to use Select Lists on which you can perform in-memory set operations (UNION, INTERSECTION, DIFFERENCE) using the MERGE.LIST command.
You can not use a Hashed File stage for any kind of lookup but "=". Specifically in this case a Hashed File stage does not support an IN operator in a lookup.
It would be more efficient in UniVerse to use Select Lists on which you can perform in-memory set operations (UNION, INTERSECTION, DIFFERENCE) using the MERGE.LIST command.
You can not use a Hashed File stage for any kind of lookup but "=". Specifically in this case a Hashed File stage does not support an IN operator in a lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I was trying not to presume about any flavour of UNIX tools on a Windows server since that is not necessarily a standard scenario. But suggest away I will ...
There could be a fairly straight forward UNIX solution on the premise that an export from both Oracle and DB2 of the respective tables can be achieved on the command line (e.g. commands piped to PL/SQL).
The solution outline would be:
So a pseudo-UNIX ksh script to do this for three tables would be:
I am ignorant of the command required for the respective tools, so the approach does depends on the existence of such tools. The sort step is included as using an ORDER BY clause on an export may require knowledge of all the columns in a particular table whereas using the UNIX sort command circumvents the need for this. You pay your money and you take your choice.
The UNIX comm command ("man comm") allows a comparison to be made between two files and records common to both, only in one file or only in the other file to be written to standard output.
Hopefully this is enough of an overview for you to consider whether or not it is an option that would work for you.
David
There could be a fairly straight forward UNIX solution on the premise that an export from both Oracle and DB2 of the respective tables can be achieved on the command line (e.g. commands piped to PL/SQL).
The solution outline would be:
Code: Select all
for each table
export oracle table to flat file
export DB2 table to flat file
compare the files producing two files;
one that holds data that is in one file but not in the other
one that holds data in the other file but not in the first
Code: Select all
for tabname in table1 table2 table3
do
echo $tabname
command line to invoke Oracle tool to export table $tabname.* to file1
command line to invoke DB2 tool to export table $tabname.* to file2
sort -o $tabname.file1 file1
sort -o $tabname.file2 file2
comm -13 $tabname.file1 $tabname.file2 > $tabname.only_in_file2
comm -23 $tabname.file1 $tabname.file2 > $tabname.only_in_file1
done
The UNIX comm command ("man comm") allows a comparison to be made between two files and records common to both, only in one file or only in the other file to be written to standard output.
Hopefully this is enough of an overview for you to consider whether or not it is an option that would work for you.
David
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Aarrgh, it's DataStage EE. Sorry about that.
Anyway, you can get MKS Toolkit or CYGWIN or any of those UNIX emulators without having to get them with DataStage.
Anyway, you can get MKS Toolkit or CYGWIN or any of those UNIX emulators without having to get them with DataStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Maybe you should get a price on a WebSphere Information Integrator license, this will let you write SQL to compare the two tables across database platforms. It also gives you a lot more options for your DataStage jobs such as unstructured data sources and cross database query sources.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn