Starting of Data Stage Job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Pavan_Yelugula
Premium Member
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Starting of Data Stage Job

Post by Pavan_Yelugula »

Hi All
We are scheduling a Data Stage Job to be started every day at 7PM which is getting me the incremental load from a database.
My select query is as follows
select * from XX where datetime >= (sysdate-1)

Records are being pumped in to this table every second.

Is there a chance i will miss out some records because the data stage server takes a few seconds based on the server utilization for starting my job or will the job start for sure every day at 7PM and i will not miss any records

Thanks
Pavan
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Re: Starting of Data Stage Job

Post by Ultramundane »

Pavan_Yelugula wrote:Hi All
We are scheduling a Data Stage Job to be started every day at 7PM which is getting me the incremental load from a database.
My select query is as follows
select * from XX where datetime >= (sysdate-1)

Records are being pumped in to this table every second.

Is there a chance i will miss out some records because the data stage server takes a few seconds based on the server utilization for starting my job or will the job start for sure every day at 7PM and i will not miss any records

Thanks
Pavan
Yes
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

We have a similar process that pulls changes from a source database every morning at 1am. To guarantee that we don't miss anything, our where clause intentionally overlaps with the last period:

Code: Select all

select <whatever>
from <source table>
where update_date >= CURRENT DATE - 2 days
;
We do it by date, but the same applies to using a timestamp. If you are running at 7pm and want everything from yesterday at 7pm to today, then look at yesterday from 6pm to today at 7pm instead. Overlap by an hour.

You will need to dedup your input data against your target table, or eliminate any matches from the input to your target, but that should eliminate the possibility of missing anything.

Hope this helps.

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
Post Reply