
Implementation of QualityStage

Posted: Wed May 25, 2005 7:39 am
by MukundShastri
Hi,

We have not used QualityStage before, hence the following basic questions about it.
We want to do data cleansing, using different business rules, on around 460 sequential files. For each input file we should have one output file after cleansing.
Does QualityStage have the template job facility that is available for Enterprise Edition parallel jobs? All the files have the same metadata. Can one template QualityStage job handle different input files and different cleansing rules dynamically?

How difficult will it be to create such jobs, and roughly how much effort do you estimate it would take?


Thanks
Mukund

Posted: Fri May 27, 2005 12:37 pm
by PilotBaha
Besides hiring a consultant who knows what to do and how to do it, I'd recommend making things as homogeneous as possible <b> before </b> you feed the data to QS. Gathering data from different layouts and bringing it into QS is an easy task for DataStage; use that to the best of your ability.
(Try to insert a field into a QS data structure .. that's what the interns are for.)
Beyond all this, I can recommend engaging a good consultant who understands both the products and your business needs.

Posted: Sun May 29, 2005 5:57 pm
by vmcburney
Since all your files have the same columns, you can write a parallel job that processes files using a file mask, or pass the different file names in as a job parameter, and process them all with the one DataStage job and the one plug-in QualityStage job. Dynamic rules depend entirely on what rules you need. It is possible to set rules at run time via DataStage job parameters that prevent certain changes from occurring, or to carry out certain string substitutions within DataStage prior to cleansing.
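To make the job-parameter approach concrete, here is a minimal sketch of a driver script that runs one parameterised job once per input file using the standard `dsjob` command-line client. The project name `MyProject`, job name `CleanseFile`, and parameters `InputFile`/`OutputFile` are all hypothetical placeholders for whatever your job actually defines; `echo` keeps this a dry run so you can inspect the commands before executing them.

```shell
#!/bin/sh
# Dry-run sketch: drive one parameterised DataStage/QS job over many files.
# MyProject, CleanseFile, InputFile and OutputFile are assumed names --
# substitute your real project, job and job-parameter names.
for f in cust_001.txt cust_002.txt; do
    # Each file gets its own run; remove 'echo' to actually submit the job.
    echo dsjob -run \
        -param InputFile="/data/in/$f" \
        -param OutputFile="/data/out/$f" \
        MyProject CleanseFile
done
```

With 460 files you would generate the file list with `ls` or a file mask rather than hard-coding it, and check `dsjob`'s exit status after each run.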

If your sequential files are small and can be processed quickly, I would consider making the parallel and QS jobs multiple-instance and running several copies of them. Run each instance on a single node and allocate the jobs to different nodes to spread the load around. If they are large sequential files, then stick to multiple parallel nodes per job.
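The multiple-instance approach above can be sketched the same way: a job marked "Allow multiple instance" is started as `jobname.invocationID`, so each copy gets its own invocation and log and can run concurrently. Again the project, job, and parameter names are hypothetical, and `echo` keeps this a dry run.

```shell
#!/bin/sh
# Dry-run sketch: launch several invocations of a multiple-instance job.
# CleanseFile must be a multiple-instance job; names are assumed examples.
i=0
for f in cust_001.txt cust_002.txt cust_003.txt; do
    i=$((i + 1))
    # The .inv$i suffix is the invocation ID distinguishing each copy.
    echo dsjob -run -param InputFile="/data/in/$f" \
        MyProject "CleanseFile.inv$i"
done
```

In a real script you would background the runs (and cap how many run at once) to get the concurrency this post describes, and pin each invocation to a single-node configuration file to spread work across nodes.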