
Posted: Wed Dec 11, 2002 12:02 am
by ray.wurlod
It's very hard to provide generic answers without some knowledge of what you are trying to do. One thing to consider is using Quality Manager to perform an initial audit of data quality so that, even though you can't actually change the database, you can at least get a "scientific" measure of how bad the data quality is.
That said, for the rest, the answer is to do things as efficiently as possible. Make as much use as possible of in-line expressions (in Transformer stages, or in Transforms), and use optimally efficient coding techniques when you are forced to create Routines. In the main, this means not doing anything you don't have to do (such as extraneous file opens), using more efficient rather than less efficient BASIC statements, and keeping as much as possible in memory for as long as possible (see COMMON in the BASIC manual, for example).
Let me pre-empt your next question. There is no published list of "more efficient rather than less efficient BASIC statements", mainly because what is most efficient will depend to some extent on the context in which it is used.
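
As a purely illustrative sketch of the COMMON technique mentioned above (the file, routine and variable names here are invented, not from the thread): a transform routine can open a hashed file once, keep the handle in named common, and reuse it on every subsequent call instead of paying for an extraneous open on every row.

* Hypothetical lookup routine: Arg1 is the key to look up.
* Named common keeps the file handle (and an initialised flag) alive between calls.
      COMMON /HashLookup/ Initialised, FileHandle

      If NOT(Initialised) Then
         Open "MY_HASHED_FILE" To FileHandle Else
            Call DSLogFatal("Cannot open MY_HASHED_FILE", "HashLookup")
         End
         Initialised = @TRUE
      End

* After the first call, the Open above is skipped entirely.
      Read Rec From FileHandle, Arg1 Then
         Ans = Rec<1>
      End Else
         Ans = ""
      End

The point is simply the one made above: the expensive work (the Open) happens once per job run rather than once per row.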

Posted: Thu Dec 12, 2002 4:20 pm
by vmcburney
Have you looked at the Ascential Integrity product? While Quality Manager is good at locating and reporting on data quality problems, the Integrity tool can be used to both locate and clean them.

Since you are using text files as a source, you will also benefit from the Integrity product's ability to process text fields such as addresses and phone numbers.

If you process your files sequentially, you could save time by running an Integrity cleanse in parallel with DataStage processing, e.g. cleanse the second file while the first file is being loaded.
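
As a sketch only of how that overlap might be driven from DataStage job-control BASIC (the job name, parameter name and cleanse command below are invented for illustration): DSRunJob starts the load and returns immediately, so the next file's cleanse can run while the load is still in flight.

* Sketch: overlap the load of file 1 with the cleanse of file 2 (hypothetical names).
      hJob = DSAttachJob("LoadCleansedFile", DSJ.ERRFATAL)
      ErrCode = DSSetParam(hJob, "InputFile", "file1.cleansed")
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)   ;* returns at once, load runs in the background

* Cleanse the next file while the load job is running
      Call DSExecute("UNIX", "run_integrity_cleanse.sh file2", CmdOutput, SysRC)

* Wait for the load to finish before starting the next cycle
      ErrCode = DSWaitForJob(hJob)
      JobStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
      ErrCode = DSDetachJob(hJob)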

Posted: Mon Dec 30, 2002 3:48 am
by WoMaWil
One way to keep your routines and still make the process a bit faster is to write a dedicated job, or part of a job, that takes a flat file as input and writes a flat file as output.

And don't forget to re-engineer your routines the way Ray Wurlod suggests.