Column Length Definition and Performance
Posted: Tue Nov 08, 2011 4:24 pm
Hello,
We are in the build phase of a large data warehouse project and unfortunately expect a number of model changes. One type of change is a column length differing between the sources and the target once the source systems start producing data. Since we have not been able to do data profiling, this is one of the challenges we have to live with.
To avoid reworking each job when lengths change, I am leaning towards not defining column lengths in any of the jobs (those that use datasets or delimited files). The jobs obviously run fine without the length definitions, but I am a little concerned about performance in production with large volumes.
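For context, by "not defining lengths" I mean leaving string columns unbounded in the table definitions. In OSH schema terms (a sketch; column names are made up for illustration), the difference would look something like this:

```text
record
(
  -- bounded: length declared up front
  cust_name: string[max=100];

  -- unbounded: no length, engine decides how to size it
  cust_addr: string;
)
```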
Is this approach advisable? Does DataStage internally define the max length of each data type for its processing and does that have an impact on performance?
Thanks!