Run job based upon date found in target
Moderators: chulett, rschirm, roy
Run job based upon date found in target
Hi All,
I have a Unix file coming once in every two weeks with a header line which contains the process date in it followed by detail lines and trailer line. Now, I have created a table in my target ( Oracle) which stores the date found in the header line of the unix file( through a server job) as a processed date. Now, my requirement is, whenever a unix file comes, I need to check the date in the header line and see if that file has already been processed( by checking the processed date inthe Oracle table). If the date is already present in the table, it implies that the file has been processed and I need to skip the file or stop the job there, If not, i.e. if the date is not found in the Oracle table, it means that the file has not been processed and I need to continue with my job ( i.e. pulling the data and transforming and loading). If anyone could throw some light on how I could achieve this, it would be great.
One way I figured out was, run a job which reads the date from the unix file and lookup against the dates in the oracle table with two output links coming out from Transformer stage.
1. First Outputlink---->Input.Link.NOTFOUND <> @TRUE
2. Second Outputlink --> Input.Link.NOTFOUND=@TRUE.
Here, I would declare my second output link as Reject link. Now, if this job finishes with warnings, it means that a new row has been inserted which implies that the file has not been processed and I can continue with my job sequence based on this Job's JOBSTATUS. If, it finished OK, it means that the date in the unix file is already found in the target and I can just skip the job. Now, to achieve this, I believe I have to use a Routine Activity stage in the Job Sequence after this JOB Activity.
Am I in the right direction? It would be great if any one could provide me with any inputs. Also, If I am right, Which Routine activity would be a good one to use here. Do I need to write a routine and use in the sequence. Please let me know.
If I am not clear, I am ready to explain it again. Please ask me again.
Thanks
Kris
I have a Unix file coming once in every two weeks with a header line which contains the process date in it followed by detail lines and trailer line. Now, I have created a table in my target ( Oracle) which stores the date found in the header line of the unix file( through a server job) as a processed date. Now, my requirement is, whenever a unix file comes, I need to check the date in the header line and see if that file has already been processed( by checking the processed date inthe Oracle table). If the date is already present in the table, it implies that the file has been processed and I need to skip the file or stop the job there, If not, i.e. if the date is not found in the Oracle table, it means that the file has not been processed and I need to continue with my job ( i.e. pulling the data and transforming and loading). If anyone could throw some light on how I could achieve this, it would be great.
One way I figured out was, run a job which reads the date from the unix file and lookup against the dates in the oracle table with two output links coming out from Transformer stage.
1. First Outputlink---->Input.Link.NOTFOUND <> @TRUE
2. Second Outputlink --> Input.Link.NOTFOUND=@TRUE.
Here, I would declare my second output link as Reject link. Now, if this job finishes with warnings, it means that a new row has been inserted which implies that the file has not been processed and I can continue with my job sequence based on this Job's JOBSTATUS. If, it finished OK, it means that the date in the unix file is already found in the target and I can just skip the job. Now, to achieve this, I believe I have to use a Routine Activity stage in the Job Sequence after this JOB Activity.
Am I in the right direction? It would be great if any one could provide me with any inputs. Also, If I am right, Which Routine activity would be a good one to use here. Do I need to write a routine and use in the sequence. Please let me know.
If I am not clear, I am ready to explain it again. Please ask me again.
Thanks
Kris
Hi,
One way to do it might be utilizing a user defined routine (using DSGetLinkInfo routine to see how many rows passed a link in the job) in a User variable stage after your job to determine did you find such a date in your DB lookup or not.
Afer that use a Nested Condition stage to determine if you process the file or not.
This action may need to be wraped in a loop built on a file-list attained via an OS command-line's output or any similar solution.
another way, to pop the processed/not processed date value, would be poping the value in a user status in a after routine of your job performing the lookup.
There are probably other ways to do this as well,
bare in mind you need to explore them and choose the one most suitable to you.
IHTH,
One way to do it might be utilizing a user defined routine (using DSGetLinkInfo routine to see how many rows passed a link in the job) in a User variable stage after your job to determine did you find such a date in your DB lookup or not.
Afer that use a Nested Condition stage to determine if you process the file or not.
This action may need to be wraped in a loop built on a file-list attained via an OS command-line's output or any similar solution.
another way, to pop the processed/not processed date value, would be poping the value in a user status in a after routine of your job performing the lookup.
There are probably other ways to do this as well,
bare in mind you need to explore them and choose the one most suitable to you.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
-
- Participant
- Posts: 232
- Joined: Fri Sep 30, 2005 4:52 am
- Contact:
Thanks Anupam for the reply. I tried doing that initially but was not able to achieve what I wanted. If the date is found in the target table..the only way to skip this unix file from being processed is to abort the job( as far as I think). I am getting confused within that loop. Any guidance would be of great help.
My job looks like this
Now, I need to look up that process date against the target table and then decide if I need to run the job or not. So, one way is to split the above job into two where I read only the first row in a seperate job and then lookup against the process dates in the target table. After that, based upon the result I need to run my next job i.e processing detail lines and trailer lines.
Thanks
Kris.
My job looks like this
Code: Select all
Unixfile-------@INROWNUM=1-----------headerline=processdate
|
---------@INROWNUM>1---------detaillines= data
|
------@INROWNUM>1----AGGREGATOR------trailer line
Thanks
Kris.
This is something very similar to what I've done in the past and how I'd suggest you approach it. As you've noted, 'validate' the file by first getting the process date in a separate job and seeing if it exists in your control table. Your job control can then decide, based on the results of that test, how exactly to process that file - load it or reject it.kris007 wrote:Now, I need to look up that process date against the target table and then decide if I need to run the job or not. So, one way is to split the above job into two where I read only the first row in a seperate job and then lookup against the process dates in the target table. After that, based upon the result I need to run my next job i.e processing detail lines and trailer lines.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 232
- Joined: Fri Sep 30, 2005 4:52 am
- Contact:
I think better to go with 2 jobs as you dont want run main job
or else you can go like :
In stage variable :
VarStage : If @INROWNUM=1 AND headerline=processdate THEN 1 ELSE 0
Unixfile----- Constraint :VarSatage=0 and @INROWNUM<>1 -----> Target
(to process rest of row if headerline not found in processdate)
Thanks,
Anupam
or else you can go like :
In stage variable :
VarStage : If @INROWNUM=1 AND headerline=processdate THEN 1 ELSE 0
Unixfile----- Constraint :VarSatage=0 and @INROWNUM<>1 -----> Target
(to process rest of row if headerline not found in processdate)
Thanks,
Anupam
Craig,
If I understand you correctly, I need to run a job to look for the date in the control table. Now, as I mentioned earlier in the post I would have two output links
Now, if this Job finishes with status 2 then run the job sequence and if the Job finishes with status 1 then send a mail saying that file has been processed already. Am I in right direction?
Thanks
Kris
If I understand you correctly, I need to run a job to look for the date in the control table. Now, as I mentioned earlier in the post I would have two output links
Code: Select all
1. First Outputlink---->Input.Link.NOTFOUND <> @TRUE
2. Second Outputlink --> Input.Link.NOTFOUND=@TRUE.
Thanks
Kris
That would work... as long as you explicitly posted a warning to the job's log when the lookup succeeded.
Another method would be to check the link row counts post job via DSGetLinkInfo (which is how I've done it in the past) and make your decision based on that.
Another method would be to check the link row counts post job via DSGetLinkInfo (which is how I've done it in the past) and make your decision based on that.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Yeah, Even one of the suggestions posted by Roy earlier in the post talked about using the DSGetLinkInfo. But, I never got a chance so far to use that and am kind of new to it. Hence, those hesitations. I will give that a shot too. If possible, it would be great if you could provide any guidance on how to use DSGetLinkinfo function post job.
Thanks
Kris.
Thanks
Kris.
Here's one approach - an example of a custom routine that takes in the three values you would need to call the function. This can be used in a Routine Activity stage and the output from this can drive a downstream decision:
Pass in the Job Name, Stage Name and Link Name to check. Negative results are errors, while a positive or zero return code will be a link row count.
Code: Select all
$IFNDEF JOBCONTROL.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
rJobHandle = DSAttachJob(JobName, DSJ.ERRNONE)
If NOT(rJobHandle) Then
Ans = -99
End Else
Ans = DSGetLinkInfo(rJobHandle, StageName, LinkName, DSJ.LINKROWCOUNT)
End
DetachResult = DSDetachJob(rJobHandle)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers