Total Records processed

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

We have a requirement to capture, for each job, the total number of records pulled from the source, the number of records loaded, and the number of records rejected. We can do this from a script with dsjob: list the stages with "dsjob -lstages", then loop over the stage names and link names to get the row count on every link.

But our job designs are not uniform, and the number of stages differs from job to job. So we have to come up with a generic method that works for all jobs and returns the number of records pulled from the source, loaded to the target, and rejected.
Can anyone please give me a hint on how this can be done?
Thanks and Regards!!
dspxlearn
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

So modify your method.
Have consistent stage and link names at least,
so that you can loop through the available links, search for "Input", "Output", "Reject" (for example), and fetch the required data.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Thanks Kumar-
Right. As you said, I have to use names such as 'SRC_RECS' for input links, 'TGT_RECS' for output links, and 'REJ_RECS' for reject links.

That way, once this naming convention is used across jobs, I only need the link names to capture the record count at these three kinds of link.
For now, I will implement this. But let me know if there are any other methods.
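
A minimal sketch of the naming-convention idea, for anyone following along. It assumes the per-link counts have already been collected into lines ending in "link_name row_count" (a hypothetical intermediate format; an optional leading stage name is ignored), and that link names start with SRC_RECS, TGT_RECS or REJ_RECS. The function name summarize_by_prefix is made up for illustration. It is plain POSIX shell, so it should also run under ksh.

```shell
#!/bin/sh
# Sketch: total the row counts per category, keyed on the link-name prefix.
# Input on stdin: lines whose last two fields are "link_name row_count"
# (hypothetical format; any leading field such as a stage name is ignored).
summarize_by_prefix() {
    awk '
        NF >= 2 {
            name  = $(NF - 1)
            count = $NF
            # A link can be listed under both stages it connects: count it once.
            if (seen[name]++) next
            if      (name ~ /^SRC_RECS/) src += count
            else if (name ~ /^TGT_RECS/) tgt += count
            else if (name ~ /^REJ_RECS/) rej += count
        }
        END { printf "source=%d loaded=%d rejected=%d\n", src, tgt, rej }
    '
}
```

Links whose names do not match any of the three prefixes are simply skipped, so intermediate links inside a job do not need renaming.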
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you are not even sure about the number of stages in each job, this is the only solution I can think of as of now.
AmeyJoshi14
Participant
Posts: 334
Joined: Fri Dec 01, 2006 5:17 am
Location: Texas

Post by AmeyJoshi14 »

For the same issue, we opted for a shell script.
I know you said you do not want to use a shell script, but I would still like to share the script, which might help you out.

Code:

#!/bin/ksh
# This script takes two parameters: project name and job name
project=$1
jobname=$2
CURRDIR=pathname    # placeholder: set this to a writable working directory
. "$DSHOME/dsenv"
# All the stage names used in the job are stored in the stage.txt file
"$DSHOME/bin/dsjob" -lstages "$project" "$jobname" 2>/dev/null > "$CURRDIR/stage.txt"
while read stagename
do
      # Extract the "In Row Number" value that dsjob reports for this stage
      rowcount=`"$DSHOME/bin/dsjob" -stageinfo "$project" "$jobname" "$stagename" 2>/dev/null | grep "In Row Number" | cut -d':' -f2 | awk '{printf "%s", $1}'`
      echo "The rowcount for stage $stagename=$rowcount"
done < "$CURRDIR/stage.txt"

Since this script takes two parameters, you can run it for any job in a given project.
Hope this helps!
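
The script above reports one figure per stage. For source, target and reject counts, the per-link figures may be more useful, so here is a sketch along the same lines. It assumes that "dsjob -llinks" prints one link name per line for a stage and that the "dsjob -linkinfo" output contains a line like "Link Row Count : N"; check both against the output on your own installation. The function names and the DSJOB variable are invented for this example.

```shell
#!/bin/sh
# Sketch: print "stage link rowcount" for every link in a job.
# Assumption: dsjob is on PATH (after sourcing dsenv), or DSJOB points to it.
DSJOB=${DSJOB:-dsjob}

# Pull the number out of a "Link Row Count : N" line read from stdin.
# The label text is an assumption; adjust the pattern to your dsjob output.
parse_link_count() {
    awk -F':' '/Row Count/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# List every stage, then every link of each stage, then that link's row count.
job_link_counts() {
    project=$1
    jobname=$2
    "$DSJOB" -lstages "$project" "$jobname" 2>/dev/null </dev/null |
    while read -r stage
    do
        [ -n "$stage" ] || continue
        # </dev/null keeps dsjob from consuming the loop's own input
        "$DSJOB" -llinks "$project" "$jobname" "$stage" 2>/dev/null </dev/null |
        while read -r link
        do
            [ -n "$link" ] || continue
            count=$("$DSJOB" -linkinfo "$project" "$jobname" "$stage" "$link" 2>/dev/null </dev/null | parse_link_count)
            # Print 0 when no count line was found
            echo "$stage $link ${count:-0}"
        done
    done
}
```

After sourcing dsenv, it would be called as: job_link_counts myproject myjob (both names are placeholders). A link that connects two stages may be listed under each of them, so collapse duplicates by link name before adding anything up.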
http://findingjobsindatastage.blogspot.com/
Theory is when you know all and nothing works. Practice is when all works and nobody knows why. In this case we have put together theory and practice: nothing works. and nobody knows why! (Albert Einstein)
dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Thanks AmeyJoshi14-

I never said I didn't want to use shell. In fact, I am writing a script similar to the one you posted. Shell scripts are more convenient than DS routines for this kind of requirement.

Thanks for your time.