Page 1 of 1

Comparing two tables with same metadata

Posted: Wed Dec 28, 2011 4:37 pm
by ukyrvd
Hi Everyone

We are trying to compare two tables with same structure from different schemas. We have about 300 tables in both schemas. Instead of developing separate jobs, we want to build a single common job with run time column propagation on and loop the table names through the sequencer.

I have built a parallel job,

Oracle connector stage from both the schemas as input - Difference stage - Target table (differences are loaded)

The challenge I am facing here is configuring the difference stage, Difference stage is expecting key column and if create a single job how can I update the key column dynamically in difference stage (as key column varies from table to table)?

Or do I need to use any other stage so that I can reuse the same common job? Please suggest any best possible solution to achieve this.

Thank you all in advance

Posted: Wed Dec 28, 2011 5:10 pm
by ray.wurlod
You're going to need to sort the data (a requirement for the Difference stage), so you will need to name at least the sort key columns.

Posted: Wed Dec 28, 2011 11:41 pm
by ukyrvd
Thanks for your reply Ray.

Is there any other stage that i can use to escape the key column ?

Thanks again

Posted: Thu Dec 29, 2011 4:47 pm
by ray.wurlod
What do you mean by "escape" the key column?

The only non-custom stage of which I'm aware that can use a parameter as an input column name is the Modify stage.

Posted: Sat Dec 31, 2011 8:17 am
by qt_ky
Would it work if you defined a dummy key column with value=1 for every row?

Posted: Sat Dec 31, 2011 3:49 pm
by jwiles
You could use modify stages which simply rename a parameterized key column to a common name:

DIFF_KEY_COLUMN = #MY_KEY_COLUMN#

Then sort on DIFF_KEY_COLUMN and compare

Posted: Mon Jan 02, 2012 3:53 am
by BI-RMA
Use a Diff-operator in a generic-stage.

In addition, You will have to use preceding hash- and tsort-operators to sort data correctly and handle partitioning for multiple nodes.

Use the same (parameterized) key-settings for hash, tsort and diff-operators. Find additional parameterization options for the operators in the Parallel Job Advanced Developer's Guide.

Posted: Tue Jan 03, 2012 11:50 am
by ukyrvd
Thank you all for your suggestions. It worked as you all suggested, in the Difference stage i have parametrized the key column and i am passing the value from a file through a perl script.

Thanks all again

Posted: Thu Jan 05, 2012 2:32 pm
by ukyrvd
Hi All

Sorry for coming back again! The solution was working fine if we have one primary key. But we have some tables with composite key. In that case how can I pass multiple keys to Difference stage ?

Please advice.

Thank you all