delta between tables

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

nag0143
Premium Member
Posts: 159
Joined: Fri Nov 14, 2003 1:05 am

delta between tables

Post by nag0143 »

hi all,

I am trying to find the delta between 2 tables which have nulls. I wrote a big constraint checking that each pair of columns is equal and taking the reject, but all the rows are going into the reject link.

Can anyone help me? What is my mistake?
raju_chvr
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Re: delta between tables

Post by raju_chvr »

Please explain a bit more!

What are you trying to achieve, and what are you doing now?
nag0143
Premium Member
Posts: 159
Joined: Fri Nov 14, 2003 1:05 am

Post by nag0143 »

I have more than 5000000000 records in one table and almost half as many in another. I have to build a table of rows where at least one column has changed, plus new records. But in my first table not all columns will have a value; some may be blank, so a 1.col1 = 2.col1 comparison doesn't work (I think because of the blanks in those columns).


thanks..
raju_chvr
Premium Member
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Post by raju_chvr »

Is that number fictional, or is it real? Because that is 5 billion rows. Are you really sure you have that many rows in it?

I haven't seen a table that big until now, or am I reading it wrong?
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

How can you do a delta when 1.col1 <> 2.col1 is not part of the delta? What are you measuring? Post your SQL.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

raju_chvr wrote:Is that number fictional, or is it real? Because that is 5 billion rows. Are you really sure you have that many rows in it?

I haven't seen a table that big until now, or am I reading it wrong?
These numbers are perfectly possible in countries like India (where Nag is) or China. I was recently involved in a DW for a new telco who are taking on one million new customers per week for the next eighteen months, and even then will only have 4% market share! Multiply this by the number of phone calls, growing exponentially as more and more customers are taken on, and you're easily into this row count range, and more!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

One very fast form of delta capture, or changed data capture, is the CRC32 function. On the volume of data you are handling it might be faster as you will be replacing a very large constraint with a very small one. Go to the Ascential developer net web site and have a look at the uploaded CRC32 example provided by Michael Hester.

http://developernet.ascential.com/

It sounds like in one table an empty field has a blank space and on the other table it is a null. You can convert a null to a blank space in a DataStage routine or in a SQL select statement.

If you want to use CRC32 I would recommend writing a job or a database script to dump the tables to a pipe delimited flat file with nulls converted to blanks in the select statement. Then follow the steps in the CRC example where one table becomes a hash file with a primary key and CRC code and the other table is compared to this lookup.
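The steps above can be sketched in Python (a minimal illustration of the technique, not Michael Hester's actual DataStage example; the function names and data shapes here are assumptions). One table's rows are reduced to a key-to-CRC lookup, the other table is compared against it, and nulls are normalized to blanks before hashing so a NULL and an empty field hash the same:

```python
import zlib

def row_crc(fields):
    # Assumption: normalize NULLs (None) and whitespace-only fields to "",
    # so a null in one table equals a blank space in the other, then join
    # with a pipe delimiter as suggested in the thread.
    normalized = ["" if f is None else str(f).strip() for f in fields]
    return zlib.crc32("|".join(normalized).encode("utf-8"))

def delta(reference, incoming):
    # reference: {key: crc} built from the first table (the "hash file" role)
    # incoming:  {key: [fields]} rows streamed from the second table
    inserts, updates = [], []
    for key, fields in incoming.items():
        crc = row_crc(fields)
        if key not in reference:
            inserts.append(key)       # new record
        elif reference[key] != crc:
            updates.append(key)       # at least one column changed
    return inserts, updates

reference = {1: row_crc(["a", None]), 2: row_crc(["b", "c"])}
incoming = {1: ["a", " "], 2: ["b", "x"], 3: ["new", ""]}
print(delta(reference, incoming))  # key 1 unchanged, key 2 updated, key 3 new
```

Only one small CRC per row is stored and compared, which is why this replaces a wide multi-column constraint with a very small one.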
nag0143
Premium Member
Posts: 159
Joined: Fri Nov 14, 2003 1:05 am

Post by nag0143 »

"It sounds like in one table an empty field has a blank space and on the other table it is a null. You can convert a null to a blank space in a DataStage routine or in a SQL select statement."
I did try writing a routine that converts those blanks to some unrelated value and later removes it... it still didn't work.

And for the count, yes, the exact count is: 553179120
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I think you need to change nulls into blanks; that way a blank will equal a blank. What is the database? You could use NVL(), and then CRC32 would work. Do a search on this site for CRC.
Mamu Kim
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

CRC32 only guarantees that two records with different CRC values are different, not that two records with the same CRC are identical. So if you use it for a delta, some records that should be processed will not be.
I found this out the hard way in production, dealing with record sets smaller than the 500M ones described here.
Consider: there are only 2^32 possible unique values. It's kind of like the birthday paradox, where the odds of two people in a group sharing a birthday reach about 50% with a group size of only 23.
Now, if your compare can be slightly 'lossy' (i.e., a few false matches are OK), then CRC32 is a good way to go.
The actual odds are very small; in my real-world case, one or two false matches per week were occurring, causing records to not be processed. But if you are talking about financial transactions, then that's still a no-no.
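The risk described above can be quantified with a quick back-of-envelope calculation (my own sketch, not from the thread). With a 32-bit CRC, a changed record keeps its old CRC, and so is silently skipped, with probability about 1 in 2^32; at the row count quoted earlier in the thread that works out to roughly one miss per ten full comparison runs, which is consistent with "one or two false matches per week" on repeated loads:

```python
N_CRC = 2 ** 32          # number of distinct CRC32 values
ROWS = 553_179_120       # row count quoted in the thread

# Upper-bound expectation of false matches in one run, assuming every
# row changed: each changed row keeps its old CRC with probability ~1/2^32.
expected_false_matches = ROWS / N_CRC
print(f"expected false matches per full load: {expected_false_matches:.3f}")

def birthday_collision_prob(n, days=365):
    # Probability that at least two of n people share a birthday;
    # it first exceeds 50% at n = 23.
    p_unique = 1.0
    for i in range(n):
        p_unique *= (days - i) / days
    return 1.0 - p_unique

print(f"23 people sharing a birthday: {birthday_collision_prob(23):.3f}")
```

So per run the miss rate is tiny, exactly as clshore says, but over many weekly loads misses become near-certain, which is why a lossless compare matters for financial data.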

Carter