High volume of reference data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

synsog
Premium Member
Posts: 232
Joined: Sun Aug 01, 2010 11:01 pm
Location: Pune

High volume of reference data

Post by synsog »

Hi All,

We are getting a high volume of reference data in one of my jobs, and this job is taking 40 minutes to complete.

Reference data is added on a daily basis, and at this point we are getting more than 50 million records.

We have tried a few options, but the issue remains the same:

1. Tried Join and Merge instead of Lookup (see the sketch after this list)
2. Tried a dataset as the reference data instead of referring to the table directly
3. Tried creating indexes on the table
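
The usual reason Join or Merge scales better than Lookup at this volume is that a Lookup holds the entire reference set in memory, while a sorted join only streams both inputs. Below is a minimal sketch of that principle in plain Python; it is illustrative only, nothing DataStage-specific, and all names are made up.

# Sketch only: contrast between a Lookup-style in-memory reference table
# and a Join/Merge-style sorted merge. Assumes iterables of (key, value) tuples.

def hash_lookup(stream, reference):
    # Lookup-style: load ALL reference rows into memory first
    # (with 50M+ rows this is where the memory pressure comes from).
    ref_table = dict(reference)
    for key, payload in stream:
        if key in ref_table:
            yield key, payload, ref_table[key]

def merge_join(stream_sorted, reference_sorted):
    # Join/Merge-style: both inputs already sorted on the key,
    # so only one reference row needs to be held at a time.
    ref_iter = iter(reference_sorted)
    ref_key, ref_value = next(ref_iter, (None, None))
    for key, payload in stream_sorted:
        while ref_key is not None and ref_key < key:
            ref_key, ref_value = next(ref_iter, (None, None))
        if ref_key == key:
            yield key, payload, ref_value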


There is also no upper limit on this reference data; the volume is likely to keep growing in the future.


Please suggest if you have any ideas on this.


Thanks
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Questions:

1) What's the length of the reference data records (sum of length of columns)?
2) How many jobs use this reference data each day? Just this one job or multiple jobs?
3) How long does it take to extract the data from the table and write it to a dataset? (See the sketch after these questions.)
4) What database stage do you use to extract the reference data and do you extract the data in sequential or parallel mode?
5) For Join or Merge, do you include a Sort stage for the reference data or do you just use the input link sort option on the Join/Merge stage?
6) When is the reference table updated in relation to when you use the data? Hours/Minutes, Overnight update versus daytime usage?
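
Behind questions 2 and 3 is the idea that, if several jobs reference the same 50 million rows, you pay the database extract once per day and let every job reuse the saved copy. A rough sketch of that pattern in plain Python follows; the paths and the fetch_rows callable are hypothetical, and a flat file stands in for a parallel dataset.

import csv, os, time

REF_COPY = "/tmp/reference_data.csv"   # stand-in for a parallel dataset
MAX_AGE_SECONDS = 24 * 60 * 60         # refresh at most once a day

def refresh_reference_copy(fetch_rows):
    # Re-extract only if the saved copy is missing or older than a day.
    # fetch_rows is a hypothetical callable that pulls rows from the table.
    stale = (not os.path.exists(REF_COPY)
             or time.time() - os.path.getmtime(REF_COPY) > MAX_AGE_SECONDS)
    if stale:
        with open(REF_COPY, "w", newline="") as out:
            csv.writer(out).writerows(fetch_rows())
    return REF_COPY

def load_reference_copy():
    # Every downstream job reads the local copy instead of the table.
    with open(REF_COPY, newline="") as src:
        return list(csv.reader(src))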
- james wiles


All generalizations are false, including this one - Mark Twain.