Run job based upon date found in target

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Run job based upon date found in target

Post by kris007 »

Hi All,

I have a Unix file coming once in every two weeks with a header line which contains the process date in it followed by detail lines and trailer line. Now, I have created a table in my target ( Oracle) which stores the date found in the header line of the unix file( through a server job) as a processed date. Now, my requirement is, whenever a unix file comes, I need to check the date in the header line and see if that file has already been processed( by checking the processed date inthe Oracle table). If the date is already present in the table, it implies that the file has been processed and I need to skip the file or stop the job there, If not, i.e. if the date is not found in the Oracle table, it means that the file has not been processed and I need to continue with my job ( i.e. pulling the data and transforming and loading). If anyone could throw some light on how I could achieve this, it would be great.

One way I figured out was, run a job which reads the date from the unix file and lookup against the dates in the oracle table with two output links coming out from Transformer stage.

1. First Outputlink---->Input.Link.NOTFOUND <> @TRUE
2. Second Outputlink --> Input.Link.NOTFOUND=@TRUE.

Here, I would declare my second output link as Reject link. Now, if this job finishes with warnings, it means that a new row has been inserted which implies that the file has not been processed and I can continue with my job sequence based on this Job's JOBSTATUS. If, it finished OK, it means that the date in the unix file is already found in the target and I can just skip the job. Now, to achieve this, I believe I have to use a Routine Activity stage in the Job Sequence after this JOB Activity.

Am I in the right direction? It would be great if any one could provide me with any inputs. Also, If I am right, Which Routine activity would be a good one to use here. Do I need to write a routine and use in the sequence. Please let me know.

If I am not clear, I am ready to explain it again. Please ask me again.

Thanks
Kris
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
One way to do it might be utilizing a user defined routine (using DSGetLinkInfo routine to see how many rows passed a link in the job) in a User variable stage after your job to determine did you find such a date in your DB lookup or not.
Afer that use a Nested Condition stage to determine if you process the file or not.
This action may need to be wraped in a loop built on a file-list attained via an OS command-line's output or any similar solution.

another way, to pop the processed/not processed date value, would be poping the value in a user status in a after routine of your job performing the lookup.

There are probably other ways to do this as well,
bare in mind you need to explore them and choose the one most suitable to you.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Roy,

I got the gist of your explanation but still not able to get the whole clear picture. It would be great if you could elaborate a little more.

Thanks
Kris
sb_akarmarkar
Participant
Posts: 232
Joined: Fri Sep 30, 2005 4:52 am
Contact:

Post by sb_akarmarkar »

Hey Kris,

I think you can try to get solution using @INROWNUM
In stage variable use @INROWNUM=1 and Lookup comparsion is successful which will verify your file header column value then process other rows else dont process row by providing condition in constraint.


Thanks,
Anupam
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Thanks Anupam for the reply. I tried doing that initially but was not able to achieve what I wanted. If the date is found in the target table..the only way to skip this unix file from being processed is to abort the job( as far as I think). I am getting confused within that loop. Any guidance would be of great help.
My job looks like this

Code: Select all


Unixfile-------@INROWNUM=1-----------headerline=processdate
         |
         ---------@INROWNUM>1---------detaillines= data
         |
          ------@INROWNUM>1----AGGREGATOR------trailer line 
Now, I need to look up that process date against the target table and then decide if I need to run the job or not. So, one way is to split the above job into two where I read only the first row in a seperate job and then lookup against the process dates in the target table. After that, based upon the result I need to run my next job i.e processing detail lines and trailer lines.

Thanks
Kris.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kris007 wrote:Now, I need to look up that process date against the target table and then decide if I need to run the job or not. So, one way is to split the above job into two where I read only the first row in a seperate job and then lookup against the process dates in the target table. After that, based upon the result I need to run my next job i.e processing detail lines and trailer lines.
This is something very similar to what I've done in the past and how I'd suggest you approach it. As you've noted, 'validate' the file by first getting the process date in a separate job and seeing if it exists in your control table. Your job control can then decide, based on the results of that test, how exactly to process that file - load it or reject it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sb_akarmarkar
Participant
Posts: 232
Joined: Fri Sep 30, 2005 4:52 am
Contact:

Post by sb_akarmarkar »

I think better to go with 2 jobs as you dont want run main job
or else you can go like :

In stage variable :

VarStage : If @INROWNUM=1 AND headerline=processdate THEN 1 ELSE 0


Unixfile----- Constraint :VarSatage=0 and @INROWNUM<>1 -----> Target
(to process rest of row if headerline not found in processdate)


Thanks,
Anupam
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Craig,

If I understand you correctly, I need to run a job to look for the date in the control table. Now, as I mentioned earlier in the post I would have two output links

Code: Select all

1. First Outputlink---->Input.Link.NOTFOUND <> @TRUE 
2. Second Outputlink --> Input.Link.NOTFOUND=@TRUE. 
Now, if this Job finishes with status 2 then run the job sequence and if the Job finishes with status 1 then send a mail saying that file has been processed already. Am I in right direction?

Thanks
Kris
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That would work... as long as you explicitly posted a warning to the job's log when the lookup succeeded.

Another method would be to check the link row counts post job via DSGetLinkInfo (which is how I've done it in the past) and make your decision based on that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Yeah, Even one of the suggestions posted by Roy earlier in the post talked about using the DSGetLinkInfo. But, I never got a chance so far to use that and am kind of new to it. Hence, those hesitations. I will give that a shot too. If possible, it would be great if you could provide any guidance on how to use DSGetLinkinfo function post job.

Thanks
Kris.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Here's one approach - an example of a custom routine that takes in the three values you would need to call the function. This can be used in a Routine Activity stage and the output from this can drive a downstream decision:

Code: Select all

$IFNDEF JOBCONTROL.H 
   $INCLUDE DSINCLUDE JOBCONTROL.H 
$ENDIF

rJobHandle = DSAttachJob(JobName, DSJ.ERRNONE)

If NOT(rJobHandle) Then
   Ans = -99
End Else
   Ans = DSGetLinkInfo(rJobHandle, StageName, LinkName, DSJ.LINKROWCOUNT)
End

DetachResult = DSDetachJob(rJobHandle)
Pass in the Job Name, Stage Name and Link Name to check. Negative results are errors, while a positive or zero return code will be a link row count.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply