delta between tables
hi all,
I am trying to find the delta between two tables that contain nulls. I wrote one big constraint checking that each pair of columns is equal and taking the rejects, but all the rows are going to the reject link.
Can anyone help me? What is my mistake?
Re: delta between tables
Please explain a bit more! What is it that you are trying to achieve, and what are you doing now?
I have more than 5,000,000,000 records in one table and almost half as many in the other. I have to build a table of records that are new or where at least one column has changed. But in my first table not all columns have values; some are blank, so 1.col1 = 2.col1 doesn't work (I think because of the blanks in those columns).
Thanks.
raju_chvr wrote: Do you know whether that number is fictional or true? Because it is 5 billion rows. Are you really sure you have that many rows in it? I haven't seen a table that big until now, or am I reading it wrong?

These numbers are perfectly possible in countries like India (where Nag is) or China. I was recently involved in a DW for a new telco that is taking on one million new customers per week for the next eighteen months, and even then will have only 4% market share! Multiply this by the number of phone calls, growing exponentially as more and more customers are taken on, and you're easily into this row-count range, and more!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
One very fast form of delta capture, or changed data capture, is the CRC32 function. On the volume of data you are handling it might be faster as you will be replacing a very large constraint with a very small one. Go to the Ascential developer net web site and have a look at the uploaded CRC32 example provided by Michael Hester.
http://developernet.ascential.com/
It sounds like in one table an empty field has a blank space and on the other table it is a null. You can convert a null to a blank space in a DataStage routine or in a SQL select statement.
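To see why the original constraint rejects everything, here is a minimal sketch (plain Python, with None standing in for SQL NULL; the column names and values are made up): a NULL never compares equal to a blank, so an equality constraint fails even when the rows look the same. Normalizing both sides first fixes it.

```python
# A NULL (None) never compares equal to a blank string, so rows that
# "look" the same still mismatch on a strict equality constraint.
row_a = {"col1": "x", "col2": None}   # NULL in one table
row_b = {"col1": "x", "col2": ""}     # blank in the other table

print(row_a == row_b)  # False: None != ""

def normalize(row):
    # Convert every NULL to a blank before comparing.
    return {k: ("" if v is None else v) for k, v in row.items()}

print(normalize(row_a) == normalize(row_b))  # True
```

The same normalization can be pushed into the SELECT statement on each side so both extracts agree before the comparison ever runs.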
If you want to use CRC32 I would recommend writing a job or a database script to dump the tables to a pipe delimited flat file with nulls converted to blanks in the select statement. Then follow the steps in the CRC example where one table becomes a hash file with a primary key and CRC code and the other table is compared to this lookup.
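As a rough illustration of that flow (a Python sketch, not the actual DataStage job; the dict stands in for the hash file, and the keys and values are invented sample data), the reference table becomes a key-to-CRC lookup and each incoming row is classified as an insert, an update, or unchanged:

```python
import zlib

def crc_of(values):
    # Pipe-delimit the non-key columns, converting NULLs to blanks,
    # then take the CRC32 of the resulting string.
    return zlib.crc32("|".join("" if v is None else str(v) for v in values).encode())

# Reference table -> lookup of primary key to CRC (stand-in for the hash file).
reference = {
    "1001": ("Smith", "NY"),
    "1002": ("Jones", None),
}
lookup = {key: crc_of(vals) for key, vals in reference.items()}

# Incoming table: compare each row's CRC against the lookup.
incoming = {
    "1001": ("Smith", "NY"),   # same CRC  -> unchanged, skip
    "1002": ("Jones", "CA"),   # CRC differs -> update
    "1003": ("Brown", "TX"),   # key missing -> insert
}
delta = {}
for key, vals in incoming.items():
    if key not in lookup:
        delta[key] = "insert"
    elif lookup[key] != crc_of(vals):
        delta[key] = "update"

print(delta)  # {'1002': 'update', '1003': 'insert'}
```

The win is that the hundreds of per-column comparisons collapse into one integer comparison per row, which matters at this volume.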
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney
"It sounds like in one table an empty field has a blank space and on the other table it is a null. You can convert a null to a blank space in a DataStage routine or in a SQL select statement."
I did try writing a routine converting those blanks to some unrelated value and removing it later, but it still didn't work.
And for the count, yes, the exact count is 553,179,120.
CRC32 will only guarantee that two records with different CRC values are different, not that two records with the same CRC are identical. So if you use this for a delta, some records that should be processed will not be.
I found this out the hard way in production, dealing with record sets smaller than the 500M ones described here.
Consider: there are only 2^32 possible CRC values. It's like the birthday paradox, where the odds of two people in a group sharing a birthday reach about 50% with a group size of just 23.
Now, if your compare can be slightly 'lossy' (ie, a few false matches are OK), then CRC32 is a good way to go.
The odds of any single false match are very small; in my real-world case, one or two false matches per week were occurring, causing records not to be processed. But if you are talking about financial transactions, that's still a no-no.
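The birthday-paradox intuition above can be made concrete with the standard approximation P ≈ 1 − exp(−n(n−1) / (2·2^32)). A short Python check (the row counts are just illustrative) shows that a collision between two distinct records becomes likely at only around a hundred thousand rows:

```python
import math

def collision_probability(n, space=2**32):
    # Birthday bound: P(at least one collision among n values drawn
    # uniformly from `space` possibilities).
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} rows: {collision_probability(n):.4f}")
```

Note that for delta capture the damaging event is narrower: a changed record keeping its old CRC, roughly a 2^-32 chance per changed row. But across hundreds of millions of rows per run, that still produces the occasional false match described above.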
Carter