Change Data Capture

kumar_s · Post by **kumar_s** » Tue Jan 23, 2007 4:45 am

In that case the log might not help.
But you would love to use WebSphere Replication Server for your very purpose.

nvalia · Post by **nvalia** » Thu Jan 25, 2007 1:50 pm

It has 150 columns, each averaging about 25 characters (some of these 150 are numeric columns),
so around 3500 characters per row.

nvalia · Post by **nvalia** » Thu Jan 25, 2007 1:51 pm

Well, we are actually trying to replace the current Sybase Replication process with ETL. That is the challenge.

nvalia · Post by **nvalia** » Fri Jan 26, 2007 3:52 pm

The record size could be around 500-600 characters (15 cols).
We have 293 million rows of this size and we need to find the changed data using Change Capture.
Maybe around 450 GB of data.

So there will be 2 datasets with this many rows. We have a maximum of 8 nodes at our disposal.

So approximately how much time will we need for this exercise?
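
For readers unfamiliar with the operation being asked about, here is a minimal sketch of what a change capture between two keyed datasets amounts to. It is an illustration only, not how the DataStage stage is implemented; the function name `capture_changes`, the dict-of-tuples layout, and the change labels are all assumptions made for the example.

```python
from typing import Dict, Iterator, Tuple

Row = Tuple[str, ...]


def capture_changes(
    before: Dict[str, Row], after: Dict[str, Row]
) -> Iterator[Tuple[str, str]]:
    """Yield (key, change) pairs; change is 'insert', 'edit' or 'delete'.

    Rows that are identical in both datasets are skipped.
    """
    for key, new_row in after.items():
        if key not in before:
            yield key, "insert"   # key exists only in the new dataset
        elif before[key] != new_row:
            yield key, "edit"     # same key, different column values
    for key in before:
        if key not in after:
            yield key, "delete"   # key disappeared from the new dataset


if __name__ == "__main__":
    before = {"1": ("a",), "2": ("b",), "3": ("c",)}
    after = {"1": ("a",), "2": ("B",), "4": ("d",)}
    for key, change in sorted(capture_changes(before, after)):
        print(key, change)
```

At hundreds of millions of rows the two datasets will not fit in an in-memory dictionary like this; a parallel job would typically partition and sort both inputs on the key so that each node compares only its own slice.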

DSguru2B · Post by **DSguru2B** » Fri Jan 26, 2007 4:00 pm

Take a million rows and perform your benchmark. Increase the volume by factors of 10 until you reach maybe 1/3 of your data, and keep noting how the performance changes. This will give you a fairly accurate approximation of how much time the full run will take. It's hard to guess the time frame without knowing details about your environment.
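
A rough sketch of that procedure, assuming you can wrap your job in a callable that processes a given number of rows. The names `benchmark` and `extrapolate` are made up for this example, and the estimate assumes run time keeps scaling linearly beyond the largest sample, which may not hold once sorts spill to scratch disk.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[int, float]  # (row count, elapsed seconds)


def benchmark(run: Callable[[int], None], sizes: List[int]) -> List[Sample]:
    """Time run(n) for each sample size and return (n, seconds) pairs."""
    samples = []
    for n in sizes:
        start = time.perf_counter()
        run(n)  # hypothetical hook: process the first n rows
        samples.append((n, time.perf_counter() - start))
    return samples


def extrapolate(samples: List[Sample], target_rows: int) -> float:
    """Scale the elapsed time of the largest sample linearly to target_rows."""
    n, seconds = max(samples)  # tuples compare on row count first
    return seconds * target_rows / n


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs the real job.
    samples = benchmark(lambda n: sum(range(n)), [10_000, 100_000, 1_000_000])
    for n, seconds in samples:
        print(f"{n:>10} rows: {seconds:.4f} s, {n / max(seconds, 1e-9):,.0f} rows/s")
    print(f"estimated full run: {extrapolate(samples, 293_000_000):.1f} s")
```

If the rows-per-second figure drops as the samples grow, the linear estimate is optimistic, and that drop is the "performance change" worth noting.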

ray.wurlod · Post by **ray.wurlod** » Fri Jan 26, 2007 6:23 pm

Forty Two

DSguru2B · Post by **DSguru2B** » Fri Jan 26, 2007 7:33 pm

There you go: you wanted a number, and now you have one from Ray.

vmcburney · Post by **vmcburney** » Sun Jan 28, 2007 5:52 pm

I would be surprised if anything in an ETL tool could match the performance and efficiency of a native Sybase replication tool for the straight replication of data. ETL really comes into its own when you are using the "T" in ETL. If you are doing straight CDC with no transformation, consolidation or cleansing, then DataStage and the parallel CDC is a rather kludgy way to do it. There are a lot of overheads in the ETL development that you wouldn't get in a straight replication tool.

So if you are going from Sybase to a data warehouse, I would say Sybase replication combined with DataStage is a good option for keeping data volumes down. DataStage with the CDC stage is a good option for keeping it all in one tool. If you are doing straight table copies, then Sybase Replication is hard to beat.