Quality job running forever for large chunks of data

Posted: Fri Aug 21, 2009 4:41 pm
by RAJEEV KATTA
I created a QualityStage job that uses an Unduplicate Match. If the job is run on a small chunk of data, such as 1000 records, it works well, but if I run the same job on a larger chunk, such as 10000 records, it keeps running for hours. Is there any workaround or solution to make it run faster? Does IBM provide a patch for this?

Posted: Mon Aug 24, 2009 10:30 am
by JRodriguez
There are good design practices for making a QS job run faster.

Would you mind telling us more about your job, so posters can provide better suggestions?

Below are the most common factors that can make a QS match job (of any type) run slower as the data grows:

- System resources ....
- Standardizing the data in the same job
- Generating data frequency in the same job
- Having a blocking strategy that produces big blocks
- Having a match spec with passes that are not meaningful (redundant)
- Using a config file with a lot of nodes
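
To illustrate the point about big blocks: assuming a match pass compares every pair of records that share a block key, the number of comparisons grows roughly with the square of the block size. The sketch below is plain Python, not QualityStage code; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def pair_comparisons(block_keys):
    """Count candidate pairs when only records sharing a block key are compared."""
    sizes = Counter(block_keys)
    # A block of n records yields n * (n - 1) / 2 distinct pairs.
    return sum(n * (n - 1) // 2 for n in sizes.values())


# Hypothetical data: 10,000 records under two blocking strategies.
coarse_keys = [i % 10 for i in range(10_000)]     # 10 blocks of 1,000 records
fine_keys = [i % 1_000 for i in range(10_000)]    # 1,000 blocks of 10 records

coarse_pairs = pair_comparisons(coarse_keys)
fine_pairs = pair_comparisons(fine_keys)
print("coarse blocking:", coarse_pairs, "pairs")
print("fine blocking:  ", fine_pairs, "pairs")
```

Under that assumption, the same 10,000 records cost far fewer comparisons when the blocking columns split them into many small blocks, which is why the blocking strategy is usually worth checking first.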

Posted: Mon Aug 24, 2009 4:02 pm
by RAJEEV KATTA
I am blocking on two columns that are in a good format. I am also using the Match Frequency stage, because its output has to be passed as an input to the Unduplicate stage to find duplicates; the match specification uses the Unduplicate Independent option. The number of nodes is also one. The same job runs perfectly for 1000 records, but for 2000 records it doesn't work. If I split the data into two sets of 1000 and process them separately it works, but with all 2000 records together it doesn't.

Posted: Tue Aug 25, 2009 9:31 am
by JRodriguez
Rajeev,

Are you generating the frequency info in the same job? If yes, then generating the frequency in a previous job will do the trick.

If not, please post your job design

Posted: Tue Aug 25, 2009 11:47 am
by RAJEEV KATTA
It works if I write the frequency to a text file in one job and then use that file as an input to the Unduplicate Match job. That's a good thought.

But I am running into a strange problem. If I copy all the stages from the Unduplicate Match job, remove the stages that are not required, and capture the frequency, it works. But if I create a new job with just the frequency stage, it reads all the records and writes zero records, which is very weird behaviour.

Posted: Tue Aug 25, 2009 12:36 pm
by JRodriguez
Great!

If the new job does not generate any frequency data, either the Maximum Frequency Entry value is empty in the Match Frequency stage or the input columns are not propagated to the output columns (see the Mapping page in the output link's properties).

Or maybe two different file names?

Please post your job design; that will save a lot of back and forth.

Posted: Tue Aug 25, 2009 5:13 pm
by RAJEEV KATTA
All the options you mentioned are correct; I checked them. I set the max frequency to 1000, mapped the input fields to the output, and the file names are correct.


Seq----> Transformer--->Frequency -----> Seq File.

When you say post your job design, do you mean the high-level design or the dsx? If it is the high-level design, it is the diagram above; if it is the dsx, I am not sure how to post that here.

Posted: Wed Aug 26, 2009 6:24 am
by JRodriguez
Rajeev,

Are you passing the columns from the Transformer stage? If yes, then set up a quick test to find out whether the Match Frequency stage is the cause: just remove the Match Frequency stage and see if the job writes to the target sequential file.

Posted: Wed Aug 26, 2009 10:18 am
by RAJEEV KATTA
I removed the frequency stage and the job writes to the file. In the Match Frequency stage, when I check "Don't use match specification" it works, but when I specify the match specification it doesn't work. In the match specification I just blocked on one column. I tested that specification with Unduplicate Match and it works well.

Posted: Wed Aug 26, 2009 11:07 am
by JRodriguez
Rajeev,

Well now I can explain why :P :

If you check "Don't use match specification", the stage generates frequency data for all columns. If you uncheck the option, a match specification must be provided, and the Match Frequency stage generates frequency data only for the columns used in the match commands. In your case you have a match spec without match commands (you are using only blocking columns, as you mentioned earlier), so the stage did not generate any output data.
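
The behaviour described above can be modelled with a short Python sketch. This is only a conceptual model under the assumption that frequency data is a count of values per selected column; it is not the Match Frequency stage's actual implementation or output format, and the function name, column names, and records are hypothetical.

```python
from collections import Counter


def frequency_rows(records, columns):
    """Return (column, value, count) rows for the selected columns only."""
    rows = []
    for col in columns:
        counts = Counter(rec[col] for rec in records)
        rows.extend((col, value, n) for value, n in counts.items())
    return rows


# Hypothetical input records.
records = [
    {"name": "A", "city": "X"},
    {"name": "A", "city": "Y"},
    {"name": "B", "city": "X"},
]

# "Don't use match specification": every column is counted.
all_columns = frequency_rows(records, ["name", "city"])

# A spec with blocking columns but no match commands: nothing is selected,
# so no frequency rows come out even though all input records were read.
no_match_commands = frequency_rows(records, [])

print(len(all_columns), len(no_match_commands))
```

In this model the empty column list stands in for a match specification that has no match commands, which is consistent with the job reading every record and writing none.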

Posted: Wed Aug 26, 2009 11:08 am
by RAJEEV KATTA
Here is more info from the log:

Using Match specification
=========================

S_Customer_Intput,0: Import complete; 27167 records imported successfully, 0 rejected.

Match_Frequency_85,0: Field export complete. 1 records converted successfully, 0 rejected.

Match_Frequency_85,0: Field import complete; 1 records converted successfully, 0 rejected.

Match_Frequency_85,0: 1 input records read; 1 kept

Sequential_File_68,0: Export complete; 0 records exported successfully, 0 rejected.

Using Don't use Match Specification
===================================

S_Customer_Intput,0: Import complete; 27167 records imported successfully, 0 rejected.

Match_Frequency_85,0: Field export complete. 341058 records converted successfully, 0 rejected.

Match_Frequency_85,0: Field import complete; 341058 records converted successfully, 0 rejected.

Match_Frequency_85,0: 341058 input records read; 3713 kept

Sequential_File_68,0: Export complete; 3675 records exported successfully, 0 rejected.

I am not sure why, when I use the match specification, it reports only 1 record instead of multiple records.

Posted: Wed Aug 26, 2009 11:18 am
by RAJEEV KATTA
I got it. But going back to our original question: the idea is to calculate the match frequency in a first job, write it to a file, and then use that file in the next job as one of the inputs to the Unduplicate Match stage. Since the Match Frequency stage generates zero records, the file won't get created, so how do we do that? Do you think we need to use a Row Generator with zero records, with columns copied from the Match Frequency stage, as one of the inputs to the Unduplicate Match stage to make the job faster?

Posted: Wed Aug 26, 2009 11:55 am
by JRodriguez
Just generate frequency info for all columns ...

Posted: Wed Aug 26, 2009 12:09 pm
by RAJEEV KATTA
Cool.

Thanks a lot Julio for all the help and your time.

Appreciate it very much.