Change Data Capture

kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

In that case the log might not help.
But you would love to use WebSphere Replication Server for your very purpose.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

It has 150 columns, each averaging 25 characters; some of these 150 columns are numeric.
So around 3500 characters per row.
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

Well, we are actually trying to replace the current Sybase Replication process with ETL. That is the challenge.
nvalia
Premium Member
Posts: 180
Joined: Thu May 26, 2005 6:44 am

Post by nvalia »

The record size could be around 500-600 characters (15 cols).
We have 293 million rows of this size and we need to find the changed data using Change Capture.
Maybe around 450 GB of data.

So there will be 2 datasets with this many rows. We have a maximum of 8 nodes at our disposal.

So approximately how much time will we need for this exercise?
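
For reference, the comparison behind a change capture step can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how DataStage implements it: the function name `change_capture`, the tuple row layout, and the string labels are all assumptions made for the example, and it holds the whole "before" dataset in memory, which would not scale to hundreds of millions of rows without partitioning on the key.

```python
def change_capture(before, after, key_index=0):
    """Compare two datasets by key and label each key.

    Rows are plain tuples (hypothetical layout); the key is the
    column at key_index. Returns {key: label} where label is one of
    "insert", "delete", "edit", or "copy" (unchanged).
    """
    # Index the "before" rows by key for constant-time lookups.
    before_by_key = {row[key_index]: row for row in before}
    changes = {}
    for row in after:
        key = row[key_index]
        old = before_by_key.pop(key, None)
        if old is None:
            changes[key] = "insert"   # key only exists in "after"
        elif old != row:
            changes[key] = "edit"     # same key, different values
        else:
            changes[key] = "copy"     # row is identical
    # Anything left in the index never appeared in "after".
    for key in before_by_key:
        changes[key] = "delete"
    return changes
```

On a parallel engine the same comparison would typically run per partition, with both datasets partitioned on the key so that matching rows land on the same node.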
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Take a million rows and run your benchmark. Increase the volume by factors of 10 until you reach maybe 1/3 of your data. Keep noting how the performance changes. This will give you a fairly accurate approximation of how long the full run will take. It's hard to guess the time frame without knowing details about your environment.
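
One way to turn those benchmark runs into an estimate is to fit a simple power law (seconds = a * rows^b) to the measurements and extrapolate to the full volume. The sketch below assumes that model holds for your job, which it may not once memory, sort, or scratch-disk limits kick in; the function name `extrapolate_runtime` is made up for the example.

```python
import math


def extrapolate_runtime(samples, target_rows):
    """Estimate run time at target_rows from (rows, seconds) samples.

    Fits seconds = a * rows**b by ordinary least squares on the
    logarithms of both values, then evaluates the fit at target_rows.
    Needs at least two samples with different row counts.
    """
    xs = [math.log(rows) for rows, _ in samples]
    ys = [math.log(secs) for _, secs in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # scaling exponent (1.0 = linear)
    log_a = mean_y - b * mean_x    # log of the constant factor
    return math.exp(log_a + b * math.log(target_rows))


# Placeholder timings only, NOT real measurements: replace them with
# the numbers from your own 1M / 10M / 100M row runs.
samples = [(1_000_000, 10.0), (10_000_000, 100.0), (100_000_000, 1000.0)]
estimate_seconds = extrapolate_runtime(samples, 293_000_000)
```

If the fitted exponent b comes out well above 1, the job is scaling worse than linearly and the extrapolated figure deserves extra caution.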
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Forty Two

:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

There you go, you wanted a number, you have a number now by Ray. :lol:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I would be surprised if anything in an ETL tool could match the performance and efficiency of a native Sybase replication tool for the straight replication of data. ETL really comes into its own when you are using the "T" in ETL. If you are doing straight CDC with no transformation/consolidation/cleansing, then DataStage with the parallel CDC is a kind of kludgy way to do it. There is a lot of overhead in ETL development that you wouldn't get with a straight replication tool.

So if you are going from Sybase to a data warehouse, I would say Sybase replication combined with DataStage is a good option for keeping data volumes down. DataStage with the CDC stage is a good option for keeping it all in one tool. If you are doing straight table copies, then Sybase Replication is hard to beat.