Datastage reports

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Datastage reports

Post by pkothana »

I am currently working on DataStage 6.0 with Parallel Extender. The job requires producing a report in a text file with the following format, stating the number of records processed, the number of records rejected (rejection can be based on business rules), and the number of records passed to the next job in the queue. I have also gone through the DataStage Reporting tool and couldn't find anything which meets my criteria. Any pointers as to how I can implement this in a DataStage 6.0 Parallel Extender job will be highly appreciated.

Thanks in advance for your help

Regards
Pinkesh Kothana
Technical Specialist
Infosys Technologies Ltd.
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Pinkesh ,

There is no built-in mechanism for such statistics in DataStage, but there are methods you can use to build it yourself, such as DSGetJobInfo and DSGetLinkInfo. From those you can get information such as the start time, end time, and number of rows processed on each link.
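To illustrate, here is a minimal DataStage BASIC sketch of those calls, run from job control code. The job name ("MyLoadJob"), stage name ("xfmValidate"), and link names are placeholders you would replace with your own:

```
* Sketch: attach a job and read its timings and row counts.
* "MyLoadJob", "xfmValidate", and the link names are placeholders.
hJob = DSAttachJob("MyLoadJob", DSJ.ERRFATAL)

* Job-level information
StartTime = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)
EndTime   = DSGetJobInfo(hJob, DSJ.JOBLASTTIMESTAMP)

* Rows down specific links of a specific stage
RowsOut = DSGetLinkInfo(hJob, "xfmValidate", "lnkOutput", DSJ.LINKROWCOUNT)
RowsRej = DSGetLinkInfo(hJob, "xfmValidate", "lnkReject", DSJ.LINKROWCOUNT)

Call DSLogInfo("Processed ":RowsOut:", rejected ":RowsRej, "RowCounts")
Call DSDetachJob(hJob)
```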

Another option is using the dsjob command to query the logs. But again, the information is buried there so you have to dig :))
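For example, something along these lines from the shell (the project and job names are placeholders, and the exact options available depend on your release):

```
# Sketch: query a job from the command line.
# "dsproject" and "MyLoadJob" are placeholder names.
dsjob -jobinfo dsproject MyLoadJob                        # status, start/end times
dsjob -linkinfo dsproject MyLoadJob xfmValidate lnkReject # row count on one link
dsjob -report dsproject MyLoadJob BASIC                   # summary report, if supported
```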

The only tool I know that can give you such functionality is Metastage.


HTH,
Amos
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

MetaStage would only be useful for this if the Process MetaBroker is installed.

Regards,

Michael
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A DIY approach is to create a server job that consists purely of job control code that interrogates either the job, link and stage properties or the log file in order to produce the requisite report. This is a fairly straightforward DataStage BASIC programming task.
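A sketch of what that job control code might look like, gathering the counts and writing them to the requested text file. All names and the report path are placeholders, and the reject rule (processed minus rejected) is one assumption about how the counts relate:

```
* Sketch of a job-control-only server job: gather counts, write a text report.
* Job, stage, and link names and the report path are placeholders.
hJob = DSAttachJob("MyLoadJob", DSJ.ERRFATAL)
Processed = DSGetLinkInfo(hJob, "xfmValidate", "lnkInput",  DSJ.LINKROWCOUNT)
Rejected  = DSGetLinkInfo(hJob, "xfmValidate", "lnkReject", DSJ.LINKROWCOUNT)
Call DSDetachJob(hJob)

ReportPath = "/tmp/load_report.txt"
OpenSeq ReportPath To fReport Else
   Create fReport Else Call DSLogFatal("Cannot create ":ReportPath, "Report")
End
WriteSeq "Records processed : ":Processed To fReport Else Null
WriteSeq "Records rejected  : ":Rejected To fReport Else Null
WriteSeq "Records passed on : ":(Processed - Rejected) To fReport Else Null
CloseSeq fReport
```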

On Ascential's Programming with DataStage BASIC class (available on demand) you are shown how to read the job log file from the repository.
Amos.Rosmarin
Premium Member
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Another idea is to create an after-job routine that does what Ray suggested... again, some DataStage BASIC programming plus some overhead on the job.
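An after-job subroutine can use DSJ.ME to refer to the job it is attached to, so no attach step is needed. A minimal sketch, with the stage and link names as placeholders:

```
* Sketch of an after-job subroutine; set it as the job's after-job routine.
* DSJ.ME refers to the current job; stage/link names are placeholders.
Subroutine ReportCounts(InputArg, ErrorCode)
   ErrorCode = 0
   Processed = DSGetLinkInfo(DSJ.ME, "xfmValidate", "lnkOutput", DSJ.LINKROWCOUNT)
   Rejected  = DSGetLinkInfo(DSJ.ME, "xfmValidate", "lnkReject", DSJ.LINKROWCOUNT)
   Call DSLogInfo("Processed ":Processed:", rejected ":Rejected, "ReportCounts")
Return
```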

It makes the task much easier if you have a good naming convention, so when you query the links and stages you can identify the different objects.
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Datastage reports

Post by pkothana »

Thanks a lot for your information.

Is there any simple way to get these results, for example to store the counts in some variables (I don't know where) and then, in an after-job subroutine, call a shell script passing these values?
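One way to do exactly that is to fetch the counts inside an after-job subroutine and hand them to a script via DSExecute. A sketch, where the script path and the stage/link names are placeholders:

```
* Sketch: fetch counts in an after-job subroutine and pass them
* as arguments to a shell script. Script path and names are placeholders.
Subroutine CallReportScript(InputArg, ErrorCode)
   ErrorCode = 0
   Processed = DSGetLinkInfo(DSJ.ME, "xfmValidate", "lnkOutput", DSJ.LINKROWCOUNT)
   Rejected  = DSGetLinkInfo(DSJ.ME, "xfmValidate", "lnkReject", DSJ.LINKROWCOUNT)
   Cmd = "/opt/scripts/write_report.sh ":Processed:" ":Rejected
   Call DSExecute("UNIX", Cmd, Output, SystemReturnCode)
   If SystemReturnCode <> 0 Then ErrorCode = SystemReturnCode
Return
```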


Regards
Pinkesh
pkothana
Participant
Posts: 50
Joined: Tue Oct 14, 2003 6:12 am

Datastage reports

Post by pkothana »

Thanks Amos.
I would appreciate it if you could tell me how to use these methods.
Actually, I am new to the DataStage tool.

Again thanks a lot for your time.

Regards
Pinkesh
Amos.Rosmarin wrote:Pinkesh ,

There is no built-in mechanism for such statistics in Datastage but there are methods that you can use to write it yourself such as DSGetJobInfo and DSGetLinkInfo. From there you can get information such as starting time, end time and number of rows processed in each link.

Another option is using the dsjob command to query the logs. But again , the information is buried there so you have to dig :))

The only tool I know that can give you such functionality is Metastage.


HTH,
Amos
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Datastage reports

Post by Teej »

pkothana wrote:Is there any simple way to get these results for ex. to store the counts in some variables (i don't know where) and then in an after job subroutine we can write a shell script providing these values?
Why do that? Just create a Buildop stage, and do all the counting in C++, and spit out the new data to a separate flat file.

Instead of rejecting the records by dropping them, use the reject link, or spit out a record with a dummy value to this buildop stage.

Utilizing the buildop stage opens up a lot of possibilities for you, though it requires some careful design to optimize the flow. And this is the complete Parallel solution you're seeking. (Of course, the buildop stage output will have to be aggregated if you don't care about per-node data.)

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Post Reply