Page 1 of 1

DS job load validation

Posted: Mon Jan 09, 2012 10:54 am
by vskr72
We have a lot of jobs that load data from flat file to an oracle table. Each src file also has a control file that contain a count of the records. After the data is loaded to the table, I need to validate if the control file count and records that are loaded is same. I know this is a pretty common scenario. But, I am trying to get some inputs on what is the best approach for this. Should I create a DS job or Unix commands or is there any other better approach. Thank you.

Posted: Mon Jan 09, 2012 12:05 pm
by pandeesh
You can have the script in after job subroutine for validating this.

Posted: Mon Jan 09, 2012 1:58 pm
by chulett
Suggestion: If you are going to post a solution, you should explain the solution that you are proposing. For example, what exactly would this script do to validate the load?

Question back to vskr72: Will you be logging this audit information anywhere or just simply need to check it? What should happen should this chek fail?

Posted: Mon Jan 09, 2012 3:36 pm
by ray.wurlod
Download ETLstats from Kim Duke's website and adapt it to your specific needs. Indeed, ETLstats may suit your needs "out of the box".

Posted: Tue Jan 10, 2012 11:07 pm
by vskr72
If at any point the validation fails, then we just get notified by email. We will take it up to the source people

Posted: Tue Jan 10, 2012 11:44 pm
by pandeesh
Fine.

a)Have the load job.

b)Then, Have the validation job in which check whether

Code: Select all

wc -l 
is in equal to Link Count(This can be done easily DSGetLinkInfo).
Have this as a constraint in transformer and abort the job in transformer accordingly.

c)Getting number of records loaded in table is trickier part.
We can't say the all the records passed through that link are got loaded.
The records can be rejected because of the constraints in table also. So have the logic for fetching count from table.

Thanks

Posted: Wed Jan 11, 2012 1:16 am
by kmsekhar
I think you can perform the validation before loading into Target Oracle:

1.Get the count of source file

Code: Select all

$ awk '{n++} END {print n}' SrcFile.txt 
2.Get The count of Control file

3.Is there any mismatch in count then pass the exit code as non-zero in the script. so that you can abort the job and send an e-mail notification.

4.If the count matches across the Source & Control then you can check the count of Target.
To perform this you can query the count in script itself
Or
Create a job which has to contain only one column which can hold the count of the Target Table and store into an temp file.
Now can compare the source count and Target count.

P.S. Make sure to capture the Target count before executing the above job so that you can deduct the accumulated count.
Ex: Before Execution Record Count 1000
After Execution 1500
So In the current run you processed ie., 1500-100=500

Posted: Wed Jan 11, 2012 4:24 am
by kmsekhar
If you want to store the target oracle count in unix variable:

Code: Select all

#!/bin/sh
REC_COUNT=`sqlplus -silent user/password@instance <<END
set pagesize 0 feedback off verify off heading off echo off
select count(*) from tgttable;
exit;
END`
if [ -z "$REC_COUNT" ]; then
  echo "No rows returned from tgttable"
  exit 0
else
  echo $REC_COUNT
fi