Quality job running forever for large chunks of data

Infosphere's Quality Product

Moderators: chulett, rschirm

RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Quality job running forever for large chunks of data

Post by RAJEEV KATTA »

I created a QualityStage job that uses an Unduplicate Match stage. When the job runs on a small amount of data, such as 1,000 records, it works well, but when I run the same job on a larger amount, such as 10,000 records, it keeps running for hours. Is there a workaround or solution that would make it run faster? Does IBM provide a patch for this?
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

There are good design practices that make a QS job run faster.

Would you mind telling us more about your job, so posters can provide better suggestions?

Below are the most common factors that can make a QS match job (of any type) run slower as the data grows:

- System resources ....
- Standardizing the data in the same job
- Generating data frequency in the same job
- Having a blocking strategy that produces big blocks
- Having a match spec with passes that are not meaningful (redundant)
- Using a config file with a lot of nodes
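
To illustrate the point about big blocks: a match pass compares records against each other within a block, so a block of n records can produce roughly n(n-1)/2 candidate pairs. The sketch below is plain Python, not QualityStage code. The `candidate_pairs` helper, the column names, and the data are all hypothetical, and it only counts pairs; it does not model how the match stage actually schedules its comparisons.

```python
from collections import Counter

def candidate_pairs(records, block_cols):
    """Count within-block record pairs for a given choice of blocking columns."""
    block_sizes = Counter(tuple(rec[c] for c in block_cols) for rec in records)
    # A block of n records yields n * (n - 1) / 2 candidate pairs.
    return sum(n * (n - 1) // 2 for n in block_sizes.values())

# Hypothetical data: 2000 records, 20 distinct states, 500 distinct zip codes.
records = [{"state": f"S{i % 20}", "zip": f"Z{i % 500}"} for i in range(2000)]

coarse = candidate_pairs(records, ["state"])        # 20 blocks of 100 records
fine = candidate_pairs(records, ["zip"])            # 500 blocks of 4 records
half = candidate_pairs(records[:1000], ["state"])   # 20 blocks of 50 records

print(coarse, fine, half)  # 99000 3000 24500
```

With the coarse blocking column, doubling the input from 1000 to 2000 records roughly quadruples the number of pairs, while the finer blocking column keeps the count small. Whether this is what is happening in a particular job depends on the real block sizes, which this toy data does not represent.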
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses"
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

I am blocking on two columns, both of which are in a good format. I am also using a data frequency stage, because its output has to be passed as an input to the Unduplicate stage to find duplicates; the match specification uses the Unduplicate Independent option. The number of nodes is one. The same job runs perfectly for 1,000 records, but it doesn't work for 2,000. If I split those into two sets of 1,000 and process them separately it works, but with all 2,000 records together it doesn't.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Rajeev,

Are you generating the frequency info in the same job? If so, generating the frequency in a previous job will do the trick.

If not, please post your job design.
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

It works if I write the frequency to a text file in one job and then use that file as input to the Unduplicate Match job. That's a good thought.

But I am running into a strange problem. If I copy all the stages from the Unduplicate Match job, remove the stages that are not required, and capture the frequency, it works. But if I create a new job with just the frequency stage, it reads all the records but writes zero records, which is very weird behaviour.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Great!

If the new job does not generate any frequency data, either the Maximum Frequency Entry value is empty in the Match Frequency stage, or the input columns are not propagated to the output columns (see the Mapping page in the output link's properties).

Or maybe two different file names?

Please post your job design; that will save a lot of back and forth.
Last edited by JRodriguez on Sun Aug 30, 2009 6:48 pm, edited 1 time in total.
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

All of those options are correct; I checked them. I set the max frequency to 1000, mapped the input fields to the output, and the file names are correct.


Seq----> Transformer--->Frequency -----> Seq File.

When you say post your job design, do you mean the high-level design or the DSX? If you mean the high-level design, it is the flow shown above; if you mean the DSX, I am not sure how to post that here.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Rajeev,

Are you passing the columns from the Transformer stage? If so, set up a quick test to find out whether the Match Frequency stage is the cause: just remove the Match Frequency stage and see if the job writes to the target sequential file.
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

I removed the frequency stage and it writes to the file. In the Match Frequency stage, when I check "Don't use match specification" it works, but when I specify the match specification it doesn't. In the match specification I just blocked on one column. I tested it with Unduplicate Match and it works well.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Rajeev,

Well now I can explain why :P :

If you check "Don't use match specification", the stage generates frequency data for all columns. If you uncheck the option, a match specification must be provided, and the Match Frequency stage generates frequency data only for the columns used in the match commands. In your case you have a match spec without match commands; you are using only blocking columns, as you mentioned in your initial post, so the stage did not generate any output data.
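
The behaviour described above can be pictured with a small Python sketch. This is only an analogy, not the Match Frequency stage's implementation: the `value_frequencies` helper, the (column, value, count) output shape, and the sample records are assumptions made for illustration. The point is that the output depends on which columns are selected, and an empty selection gives no rows.

```python
from collections import Counter

def value_frequencies(records, columns):
    """Return (column, value, count) rows for the requested columns only."""
    rows = []
    for col in columns:
        counts = Counter(rec[col] for rec in records)
        rows.extend((col, value, n) for value, n in counts.most_common())
    return rows

# Hypothetical input records.
records = [
    {"name": "SMITH", "city": "NYC"},
    {"name": "SMITH", "city": "BOSTON"},
    {"name": "JONES", "city": "NYC"},
]

# Analogous to "Don't use match specification": every column is counted.
all_columns = value_frequencies(records, ["name", "city"])

# Analogous to a match spec with no match commands: no columns are selected.
no_match_columns = value_frequencies(records, [])

print(len(all_columns), len(no_match_columns))  # 4 0
```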
Last edited by JRodriguez on Wed Aug 26, 2009 11:11 am, edited 3 times in total.
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

Here is more info from the log:

Using Match Specification
=========================

S_Customer_Intput,0: Import complete; 27167 records imported successfully, 0 rejected.

Match_Frequency_85,0: Field export complete. 1 records converted successfully, 0 rejected.

Match_Frequency_85,0: Field import complete; 1 records converted successfully, 0 rejected.

Match_Frequency_85,0: 1 input records read; 1 kept

Sequential_File_68,0: Export complete; 0 records exported successfully, 0 rejected.

Using Don't Use Match Specification
===================================

S_Customer_Intput,0: Import complete; 27167 records imported successfully, 0 rejected.

Match_Frequency_85,0: Field export complete. 341058 records converted successfully, 0 rejected.

Match_Frequency_85,0: Field import complete; 341058 records converted successfully, 0 rejected.

Match_Frequency_85,0: 341058 input records read; 3713 kept

Sequential_File_68,0: Export complete; 3675 records exported successfully, 0 rejected.

I am not sure why, when I use the match specification, it reports only 1 record instead of multiple records.
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

I got it. But going back to our original question: the match frequency is calculated in a first job and written to a file, and that file is then used in the next job as one of the inputs to the Unduplicate Match stage. Since the file won't be created when the Match Frequency stage generates zero records, how do we do that? Do you think we need to use a Row Generator with zero records, with the columns copied from the Match Frequency stage, as one of the inputs to the Unduplicate Match stage, which would make the job faster?
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Just generate frequency info for all columns ...
RAJEEV KATTA
Participant
Posts: 103
Joined: Wed Jul 06, 2005 12:29 am

Post by RAJEEV KATTA »

Cool.

Thanks a lot Julio for all the help and your time.

Appreciate it very much.