delta between tables
hi all,
I am trying to find the delta between two tables that contain nulls. I wrote one big constraint checking that each pair of columns is equal and taking the rejects, but all the rows are going to the reject link.
Can anyone help me? What is my mistake?
Re: delta between tables
Please explain a bit more! What is it that you are trying to achieve, and what are you doing now?
I have more than 5,000,000,000 records in one table and almost half as many in the other. I have to build a table of records that are new or where at least one column has changed. But in my first table not all columns have values; some are blank, so 1.col1 = 2.col1 doesn't work (I think because of the blanks in those columns).
Thanks.
raju_chvr wrote: Do you know whether that number is fictional or true? Because it is 5 billion rows. Are you really sure you have that many rows in it? I haven't seen a table that big until now, or am I reading it wrong?

These numbers are perfectly possible in countries like India (where Nag is) or China. I was recently involved in a DW for a new telco that is taking on one million new customers per week for the next eighteen months, and even then will have only 4% market share! Multiply this by the number of phone calls, growing exponentially as more and more customers are taken on, and you're easily into this row-count range, and more!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
One very fast form of delta capture, or changed data capture, is the CRC32 function. On the volume of data you are handling it might be faster as you will be replacing a very large constraint with a very small one. Go to the Ascential developer net web site and have a look at the uploaded CRC32 example provided by Michael Hester.
http://developernet.ascential.com/
It sounds like in one table an empty field has a blank space and on the other table it is a null. You can convert a null to a blank space in a DataStage routine or in a SQL select statement.
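To see why the original constraint rejects everything, here is a minimal sketch (plain Python, with None standing in for SQL NULL; the column names and values are made up): a NULL never compares equal to a blank, so an equality constraint fails even when the rows look the same. Normalizing both sides first fixes it.

```python
# A NULL (None) never compares equal to a blank string, so rows that
# "look" the same still mismatch on a strict equality constraint.
row_a = {"col1": "x", "col2": None}   # NULL in one table
row_b = {"col1": "x", "col2": ""}     # blank in the other table

print(row_a == row_b)  # False: None != ""

def normalize(row):
    # Convert every NULL to a blank before comparing.
    return {k: ("" if v is None else v) for k, v in row.items()}

print(normalize(row_a) == normalize(row_b))  # True
```

The same normalization can be pushed into the SELECT statement on each side so both extracts agree before the comparison ever runs.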
If you want to use CRC32 I would recommend writing a job or a database script to dump the tables to a pipe delimited flat file with nulls converted to blanks in the select statement. Then follow the steps in the CRC example where one table becomes a hash file with a primary key and CRC code and the other table is compared to this lookup.
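As a rough illustration of that flow (a Python sketch, not the actual DataStage job; the dict stands in for the hash file, and the keys and values are invented sample data), the reference table becomes a key-to-CRC lookup and each incoming row is classified as an insert, an update, or unchanged:

```python
import zlib

def crc_of(values):
    # Pipe-delimit the non-key columns, converting NULLs to blanks,
    # then take the CRC32 of the resulting string.
    return zlib.crc32("|".join("" if v is None else str(v) for v in values).encode())

# Reference table -> lookup of primary key to CRC (stand-in for the hash file).
reference = {
    "1001": ("Smith", "NY"),
    "1002": ("Jones", None),
}
lookup = {key: crc_of(vals) for key, vals in reference.items()}

# Incoming table: compare each row's CRC against the lookup.
incoming = {
    "1001": ("Smith", "NY"),   # same CRC  -> unchanged, skip
    "1002": ("Jones", "CA"),   # CRC differs -> update
    "1003": ("Brown", "TX"),   # key missing -> insert
}
delta = {}
for key, vals in incoming.items():
    if key not in lookup:
        delta[key] = "insert"
    elif lookup[key] != crc_of(vals):
        delta[key] = "update"

print(delta)  # {'1002': 'update', '1003': 'insert'}
```

The win is that the hundreds of per-column comparisons collapse into one integer comparison per row, which matters at this volume.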
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
"It sounds like in one table an empty field has a blank space and on the other table it is a null. You can convert a null to a blank space in a DataStage routine or in a SQL select statement."
I did try writing a routine converting those blanks to some unrelated value and removing it later, but it still didn't work.
And for the count, yes, the exact count is 553,179,120.
CRC32 will only guarantee that two records with different CRC values are different, not that two records with the same CRC are identical. So if you use this for a delta, some records that should be processed will not be.
I found this out the hard way in production, dealing with record sets smaller than the 500M ones described here.
Consider: there are only 2^32 possible CRC values. It's like the birthday paradox, where the odds of two people in a group sharing a birthday reach about 50% with a group size of just 23.
Now, if your compare can be slightly 'lossy' (ie, a few false matches are OK), then CRC32 is a good way to go.
The odds of any single false match are very small; in my real-world case, one or two false matches per week were occurring, causing records not to be processed. But if you are talking about financial transactions, that's still a no-no.
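The birthday-paradox intuition above can be made concrete with the standard approximation P ≈ 1 − exp(−n(n−1) / (2·2^32)). A short Python check (the row counts are just illustrative) shows that a collision between two distinct records becomes likely at only around a hundred thousand rows:

```python
import math

def collision_probability(n, space=2**32):
    # Birthday bound: P(at least one collision among n values drawn
    # uniformly from `space` possibilities).
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} rows: {collision_probability(n):.4f}")
```

Note that for delta capture the damaging event is narrower: a changed record keeping its old CRC, roughly a 2^-32 chance per changed row. But across hundreds of millions of rows per run, that still produces the occasional false match described above.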
Carter