Page 1 of 2

get row count

Posted: Mon Jul 16, 2012 5:06 pm
by bobbysridhar
How to get the row count when the link has 0 input records,
I am using @outrownum to get the row count but when there is no records I am not able to get anything. I want to have a row which says the number of records for that particular link is zero. I need to populate it in audit table at the end, whether there is records or not.
if number of records are 0 then I need to get a row with totalrecords=0 and start time,Jobname etc.

Posted: Mon Jul 16, 2012 5:24 pm
by chulett
You can get all link rows counts after the job completes to populate your audit table and that would include any zero counts as well. Trying to do all that "in job" isn't really possible if you have no input.

Posted: Mon Jul 16, 2012 6:46 pm
by bobbysridhar
In tfm for total records I am using @outrownum fn. then sorting total records in desc and then taking the row @inrownum=1 from output of sort stage.this is working fine when data is there.when there is no rows coming from tfm output then I an not able get total records as 0. I tried @outrownun>=0 as constrain
In tfm but still not getting total records are 0.

Posted: Mon Jul 16, 2012 7:00 pm
by vmcburney
I hate any attempts to track row counts inside a job while it is running. I think this approach is fundamentally flawed as it does not work for links with 0 rows, jobs that abort or are stopped and it can end up with the wrong result if the developer doesn't understand parallel partitioning.

Under the 8.7 release the best way to get link row counts is to turn on the Operations Console job logging and use the JobRunLink table to get row counts for links or the DataLocator table for writes to files and databases.

Under 8.1 and 8.5 you can get job row counts from the job reports that can be generated after each job run. The ETLStats package from Kim Duke is an example of how to generate job reports and read the results. This is trickier than using the Operations Console database because it means reading the results from XML files rather than a database table.

Job audit and job row counting presents a very strong business case for upgrading to 8.7.

Posted: Mon Jul 16, 2012 7:02 pm
by chulett
bobbysridhar wrote:In tfm but still not getting total records are 0.
... and you won't. Zero records in = zero records out. That's why you can't do this in job but should (as I already noted) do it as a separate process after job.

Posted: Mon Jul 16, 2012 7:12 pm
by kkmreddy
I Tried same once with following approach , My gola was to get the count of rows during tht particular run and get this count value in one of the column . So i did as follow ,
1.From Transformer defined a variable COUNTER with values as '1'
2.Aggregate grouping on this COUNTER and using Count function. Since all the rows have counter=1, count function resulted no.of rows processed .

I hope this will help you ...

Posted: Mon Jul 16, 2012 7:18 pm
by bobbysridhar
So,etl stats from Kim duke is the only option left?
Please let me know

Posted: Mon Jul 16, 2012 7:21 pm
by vmcburney
You can use ETLStats or write your own. Under 8.1 and 8.5 you should be able to turn on job reports at the project level - this will create operational metadata in XML files to a reporting directory on your server. You can then use a DataStage job to get the row counts out of those reports. ETLStats provides an example of how to do that.

Posted: Mon Jul 16, 2012 8:22 pm
by bobbysridhar
I am using dsjobrepot after-job-subroutine and getting all job reports to specific directory in txt format. But creating jobname_timestamp.
But not able to club individual job report files into one file,because every time they are getting created with different timestamps in unix.as I am not an expert in unix, i am not able to get the files specific to that sequence and club them to one file.there I got struck.
Your help is really appreciated with regards to this.

Posted: Tue Jul 17, 2012 3:27 am
by ntr
kkmreddy wrote:I Tried same once with following approach , My gola was to get the count of rows during tht particular run and get this count value in one of the column . So i did as follow ,
1.From Transformer defined a variable COUNTER with values as '1'
2.Aggregate grouping on this COUNTER and using Count function. Since all the rows have counter=1, count function resulted no.of rows processed .
can you please explain more?

Posted: Tue Jul 17, 2012 5:57 am
by chulett
bobbysridhar wrote:i am not able to get the files specific to that sequence and club them to one file.there I got struck.
I don't see the need to "club" them together at all. Can you not simply load them all using a File Pattern?

Posted: Tue Jul 17, 2012 7:16 am
by bobbysridhar
can you tell me how to grep them, becuase I am not expert in UNIX and the files are generated with filename_currenttimestamp.
Please help me in grep those files for each sequence run.
I have morethan 10 jobs in the sequence

Posted: Tue Jul 17, 2012 7:31 am
by chulett
Sorry but why do you think you need to 'grep' them? For what purpose? Perhaps if you explained in detail what exactly you are trying to accomplish, we could do a better job of helping you get there.

Posted: Tue Jul 17, 2012 7:49 am
by bobbysridhar
I need to populate audit table.
The columns I need to populate are
start time
end time
elapsed time
reject records
insert record
update records
elapsed time.
For this I am doing the above said.

Posted: Tue Jul 17, 2012 9:32 am
by chulett
Post the full contents of one of the job reports.