Column Length Definition and Performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sbuk
Participant
Posts: 4
Joined: Fri Mar 27, 2009 1:22 pm

Column Length Definition and Performance

Post by sbuk »

Hello,

We are in the build phase of the large data warehouse project and unfortunately expect a number of model changes. One type of change is the column length being different between sources and the target once source systems start producing data. Since we have not been able to do data profiling, this is one of the challenges we have to live with.

To mitigate changes in each of the jobs, I am leaning towards not defining lengths in any of the jobs (that use datasets or delimiters). The jobs obviously run fine without the length definition but I am a little concerned about performance in production with large volumes.

Is this approach advisable? Does DataStage internally define the max length of each data type for its processing and does that have an impact on performance?

Thanks!
Karthik.M
Participant
Posts: 24
Joined: Thu Apr 26, 2012 1:18 am
Location: Chennai,TamilNadu

Post by Karthik.M »

Defining columns without length is a bad practice in developing jobs and it affects the job performance.
Columns with undefined lengths take up the maximum length of that particular data-type defined. So the processing time will be long in data warehousing projects where data loaded is huge.
One solution is to use a transformer stage in between for adjusting the lengths without producing any warnings but will slightly affect the performance.
Karthik.M
Analyst
Infotrellis.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's rather more complex than that. There are some situations where unbounded VarChar is recommended, there are others where unbounded VarChar is not recommended. Rather more information than I want to re-research at the moment, but there have been discussions on DSXchange and in developerWorks on this very topic.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply