Facts loading strategy

algfr
Participant
Posts: 106
Joined: Fri Sep 09, 2005 7:42 am

Facts loading strategy

Post by algfr »

Hey guys,

I'm loading a table of 3,300,000 invoice rows.

I have two modes:

1) Full load
2) Daily load

For the daily load, the source data has two fields holding the creation date and the update date, so I can select only the records created or updated since the last load.
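To make the daily filter concrete, here is a minimal sketch in Python with SQLite rather than DataStage. The table and column names (`invoices`, `created_at`, `updated_at`) and the sample rows are assumptions for illustration, not the real schema.

```python
import sqlite3

# Hypothetical source table; names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, amount REAL, "
    "created_at TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, "2009-01-01 08:00:00", "2009-01-01 08:00:00"),  # untouched since last load
        (2, 250.0, "2009-01-01 09:00:00", "2009-01-03 10:00:00"),  # updated since last load
        (3, 75.0, "2009-01-03 11:00:00", "2009-01-03 11:00:00"),   # created since last load
    ],
)


def rows_since(conn, last_load):
    """Return rows created or updated strictly after the last successful load."""
    cur = conn.execute(
        "SELECT invoice_id, amount FROM invoices "
        "WHERE created_at > ? OR updated_at > ? ORDER BY invoice_id",
        (last_load, last_load),
    )
    return cur.fetchall()


# ISO-formatted timestamps compare correctly as text.
delta = rows_since(conn, "2009-01-02 00:00:00")
print(delta)
```

With the sample data, only invoices 2 and 3 are picked up for the daily run.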

However, if the job crashes and I want to resume, what is the best approach?

1) Delete all the rows loaded before the crash and restart?
2) Check against the existing records to see which ones are new? I like this option, but I'm worried about having to do a lookup against a 3-million-row table.

What do you suggest?
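For comparison, a possible middle ground between the two options is to make the load itself safe to rerun: write each row as an upsert on the invoice key and move the "last load" date forward only after the whole batch has succeeded. The sketch below is Python with SQLite, not DataStage, and assumes the target has a unique key on the invoice id. The names `fact_invoice`, `load_control`, and `load_batch` are hypothetical.

```python
import sqlite3

# Hypothetical target and control tables; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_invoice (invoice_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE load_control (job TEXT PRIMARY KEY, last_load TEXT)")
conn.execute("INSERT INTO load_control VALUES ('invoices', '2009-01-02 00:00:00')")
conn.commit()


def load_batch(conn, rows, new_watermark, fail_after=None):
    """Upsert rows, then advance the watermark only if every row was written."""
    for i, (invoice_id, amount) in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated crash")
        # The primary key makes this an insert-or-overwrite, so reruns do not duplicate rows.
        conn.execute(
            "INSERT OR REPLACE INTO fact_invoice VALUES (?, ?)", (invoice_id, amount)
        )
        conn.commit()  # commit per row to mimic a partially loaded target
    conn.execute(
        "UPDATE load_control SET last_load = ? WHERE job = 'invoices'", (new_watermark,)
    )
    conn.commit()


batch = [(2, 250.0), (3, 75.0), (4, 10.0)]
try:
    load_batch(conn, batch, "2009-01-03 23:59:59", fail_after=2)
except RuntimeError:
    pass  # after the crash, two rows are loaded and the watermark is unchanged

load_batch(conn, batch, "2009-01-03 23:59:59")  # rerun the same extract from the old watermark
final_rows = conn.execute(
    "SELECT invoice_id, amount FROM fact_invoice ORDER BY invoice_id"
).fetchall()
watermark = conn.execute("SELECT last_load FROM load_control").fetchone()[0]
print(final_rows, watermark)
```

In this sketch the rerun needs neither a delete nor a separate lookup step, because the key on the target absorbs the rows that were already written before the crash.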

Thanks
algfr
Participant
Posts: 106
Joined: Fri Sep 09, 2005 7:42 am

Re: Facts loading strategy

Post by algfr »

I will try it with CDC (change data capture).
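As a rough illustration of the change-capture idea (not of how any DataStage stage is implemented), the sketch below compares a "before" snapshot of the target with an "after" snapshot from the source and labels the differences. The function name `capture_changes`, the labels, and the sample rows are assumptions for illustration.

```python
def capture_changes(before, after):
    """Compare two {key: row} snapshots and label each difference."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append((key, "insert"))  # present only in the source
        elif key not in after:
            changes.append((key, "delete"))  # present only in the target
        elif before[key] != after[key]:
            changes.append((key, "edit"))    # same key, different values
        # identical rows are skipped
    return changes


# Hypothetical snapshots keyed by invoice id.
target = {1: (100.0,), 2: (200.0,)}
source = {1: (100.0,), 2: (250.0,), 3: (75.0,)}
changes = capture_changes(target, source)
print(changes)
```

Here invoice 2 comes out as an edit and invoice 3 as an insert, so only those two rows would need to be written after a failed run.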