DS job load validation

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
vskr72
Premium Member
Premium Member
Posts: 128
Joined: Wed Apr 28, 2004 9:36 pm

DS job load validation

Post by vskr72 »

We have a lot of jobs that load data from flat file to an oracle table. Each src file also has a control file that contain a count of the records. After the data is loaded to the table, I need to validate if the control file count and records that are loaded is same. I know this is a pretty common scenario. But, I am trying to get some inputs on what is the best approach for this. Should I create a DS job or Unix commands or is there any other better approach. Thank you.
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

You can have the script in after job subroutine for validating this.
pandeeswaran
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Suggestion: If you are going to post a solution, you should explain the solution that you are proposing. For example, what exactly would this script do to validate the load?

Question back to vskr72: Will you be logging this audit information anywhere or just simply need to check it? What should happen should this chek fail?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Download ETLstats from Kim Duke's website and adapt it to your specific needs. Indeed, ETLstats may suit your needs "out of the box".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vskr72
Premium Member
Premium Member
Posts: 128
Joined: Wed Apr 28, 2004 9:36 pm

Post by vskr72 »

If at any point the validation fails, then we just get notified by email. We will take it up to the source people
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Fine.

a)Have the load job.

b)Then, Have the validation job in which check whether

Code: Select all

wc -l 
is in equal to Link Count(This can be done easily DSGetLinkInfo).
Have this as a constraint in transformer and abort the job in transformer accordingly.

c)Getting number of records loaded in table is trickier part.
We can't say the all the records passed through that link are got loaded.
The records can be rejected because of the constraints in table also. So have the logic for fetching count from table.

Thanks
pandeeswaran
kmsekhar
Premium Member
Premium Member
Posts: 58
Joined: Fri Apr 16, 2010 12:58 pm
Location: Chn

Post by kmsekhar »

I think you can perform the validation before loading into Target Oracle:

1.Get the count of source file

Code: Select all

$ awk '{n++} END {print n}' SrcFile.txt 
2.Get The count of Control file

3.Is there any mismatch in count then pass the exit code as non-zero in the script. so that you can abort the job and send an e-mail notification.

4.If the count matches across the Source & Control then you can check the count of Target.
To perform this you can query the count in script itself
Or
Create a job which has to contain only one column which can hold the count of the Target Table and store into an temp file.
Now can compare the source count and Target count.

P.S. Make sure to capture the Target count before executing the above job so that you can deduct the accumulated count.
Ex: Before Execution Record Count 1000
After Execution 1500
So In the current run you processed ie., 1500-100=500
kmsekhar
Premium Member
Premium Member
Posts: 58
Joined: Fri Apr 16, 2010 12:58 pm
Location: Chn

Post by kmsekhar »

If you want to store the target oracle count in unix variable:

Code: Select all

#!/bin/sh
REC_COUNT=`sqlplus -silent user/password@instance <<END
set pagesize 0 feedback off verify off heading off echo off
select count(*) from tgttable;
exit;
END`
if [ -z "$REC_COUNT" ]; then
  echo "No rows returned from tgttable"
  exit 0
else
  echo $REC_COUNT
fi
Post Reply