incremental extract

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

incremental extract

Post by dsscholar »

Hi all,

Present logic :

Source : Sql server

Source has date column through which incremental extraction happens. First time start_dt = 1-1-1900 and end_dt = sysdate -1. Next time end_dt + 1 will become start_date and sysdate - 1 will be end_dt. This s just to ensure that no record of today should be missed during anytime of run. We introduce another column called difference where start_dt > end_dt then 0 else 1. In sequencer, if 1 start the jobs else dont.. Just to avoid the second run on the same day.


Start_dt,end_dt and difference logic is done in a separate job and passed to sequencer through routine. Is there any possible way to avoid this.

Please verify the logic below and give suggestions :

What if i convert the date column from the source to timestamp and give start_dt as 1-1-1900 00:00:00 and end_dt as current_timestamp. This s in sql server. Please help for syntax if possible to convert to timestamp from date. And in the incremental extraction job itself by adding another column for current_timestamp and replace the file which has 1-1-1900 00:00:00 with current_timestamp,just before loading the staging table. One job will be avoided. Because i want to store the exact timestamp, thats why taking the same job. But for all the records current_timestamp will be there. I can use remove_duplicate stage. Suggest someother way if possible. This way i can get pass records till current_timestamp and am not using another job and storing just start_dt. Correct me if this logic doesnt work.

And as this current_timestamp, i can eliminate the difference logic also because if in the same day the job runs twice means, some records will be picked. but the source will be having records will date value only right. OMG i typed all the way and now found the mistake. So no use in current timestamp. But anyways tell me whether we can use timestamp. Please advise.

How to pass the text file to sequencer other than routine option.

Is it anyway to eliminate the difference option. Is it possible to do it in sequencer itself. If the sequencer ran on that day, it shouldnt run again.But i should be able to override manually also. Because if someone want to run it again for some other reason, we should be able to do that. And if the job aborts in the middle and if the sequencer fails, this logic should not cause any problem.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: incremental extract

Post by SURA »

For one of your qestions:

TimestampFromDateTime() should work.

DS User
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Re: incremental extract

Post by dsscholar »

Please advise.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: incremental extract

Post by chulett »

dsscholar wrote:Source has date column through which incremental extraction happens. First time start_dt = 1-1-1900 and end_dt = sysdate -1. Next time end_dt + 1 will become start_date and sysdate - 1 will be end_dt. This s just to ensure that no record of today should be missed during anytime of run. We introduce another column called difference where start_dt > end_dt then 0 else 1. In sequencer, if 1 start the jobs else dont.. Just to avoid the second run on the same day.
Not how I would handle something like this but I don't know your data or source systems. I would set a "high water mark" that defaults to your "1-1-1900" date and then select any records from the source where your source date column is greater than your HWM. During the staging process, capture the max(date) in the source and store that as your new HWM. Next run you'll pick up any records loaded after that date and if the jobs are run when no new source records have been added, the jobs will run fine but extract zero records.

Now... is this "date column" an actual date or is it a timestamp? Are you absolutely certain you'll be sourcing from the system while the data will be in a static state? If there's any possibility that your source is still "live" when you are extracting, other precautions need to be made and it can't be as simple as either you or I have posted.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Re: incremental extract

Post by dsscholar »

It wont be static. Thats why we take previous day's records(end_dt = sysdate - 1). And source is a date. Any other logic other than doing lookup with the file containing incremental_date and routine logic. Regarding parameter file logic. In unix, i can copy text file content to parameter file like this.

cat text_file > directory\parameter_file. But how to do this in windows.

Any other logic u have come across for incremental extraction means, please share. My query is, i would like to pass the date value to the source query as parameter. I dont want to do it using routine. i want some other logic..
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: incremental extract

Post by chulett »

dsscholar wrote:cat text_file > directory\parameter_file. But how to do this in windows.
The exact same way as you have the MKS Toolkit installed, but if you want the "Windows way" then just replace cat with type.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Re: incremental extract

Post by dsscholar »

Thanks chulett.

Anyway in which i can avoid the second run of the sequencer by putting a condition in the start of the sequencer itself. I am asking with the capability of the sequencer stages and built in parameters. Do we have any built in variable which will store the last run date of a particular job or sequencer, so that i can check that..

Thanks in advance.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, there are no built-in variables in a Sequence job. You can certainly add such a check, but you'll need to bring the value in to check as a parameter or encapsulate the check in a routine. In either case, based on the result you can then branch to the actually run steps or bypass the run entirely.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply