Column Length Definition and Performance

Posted: Tue Nov 08, 2011 4:24 pm
by sbuk
Hello,

We are in the build phase of a large data warehouse project and unfortunately expect a number of model changes. One type of change is that column lengths may differ between the sources and the target once the source systems start producing data. Since we have not been able to do any data profiling, this is one of the challenges we have to live with.

To avoid having to change every job, I am leaning towards not defining lengths at all in the jobs (those that use Data Sets or delimited files). The jobs obviously run fine without length definitions, but I am a little concerned about performance in production with large volumes.
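To illustrate what I mean (the column names here are made up), a bounded definition in a schema file would look like

    record (
        CUST_ID: int32;
        CUST_NAME: varchar[50];
    )

whereas the unbounded approach I am considering would simply leave the length off:

    record (
        CUST_ID: int32;
        CUST_NAME: varchar;
    )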

Is this approach advisable? Does DataStage internally assume the maximum length of each data type during processing, and does that affect performance?

Thanks!

Posted: Mon Jun 18, 2012 3:11 am
by Karthik.M
Defining columns without a length is generally bad practice in job development, and it affects job performance.
A column with no defined length is allocated the maximum length for its data type, so processing time grows long in data warehouse projects where the loaded volumes are huge.
One solution is to insert a Transformer stage in between to adjust the lengths without producing any warnings, though this itself has a slight performance cost.
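As a rough sketch (the link and column names are hypothetical), the output derivation in the Transformer could explicitly cut the value down to the target length, for example:

    Left(Trim(lnk_src.CUST_NAME), 50)

Defining the output column as VarChar(50) to match then keeps the downstream metadata bounded, and because the truncation is explicit the job does not raise implicit-truncation warnings.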

Posted: Mon Jun 18, 2012 5:13 am
by ray.wurlod
It's rather more complex than that. There are some situations where unbounded VarChar is recommended and others where it is not. That is rather more information than I want to re-research at the moment, but there have been discussions on DSXchange and on developerWorks covering this very topic.