How to count rows extracted from MS Sql table

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
cbres00
Participant
Posts: 34
Joined: Tue Sep 21, 2004 9:20 am

How to count rows extracted from MS Sql table

Post by cbres00 »

I'm doing a full extract of all rows from a MS SQL table and then building a flat file from it. I want to delete all of the rows from the table when I'm confident that the number of rows in the flat file are equal to the number of rows in the table.

What would I do to get a row count of the file I created?

Regards,
CB
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

DataStage keeps a count of the rows sent to the file, which you can subsequently interrogate using DSGetLinkInfo() or DSGetLogEntry() functions. Or you can use DSExecute to invoke the UNIX command wc -l filename

If you prefer to use a job sequence, you can use a Routine Activity for the former and a Command Activity for the latter.

Or you could use a shell script that counts the rows in the flat file, compares that to the result from SELECT COUNT(*) FROM tablename; and proceeds on that basis. The shell script could be invoked from DataStage if that's what you'd like to do.

There are other solutions. These will get you started.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cbres00
Participant
Posts: 34
Joined: Tue Sep 21, 2004 9:20 am

Post by cbres00 »

Thanks for your reply.

Would I put the DSGetLinkInfo() in the transformer step that's connected to my MS SQL (dynamic RDBMS) stage?

Thanks,
CB

ray.wurlod wrote:DataStage keeps a count of the rows sent to the file, which you can subsequently interrogate using DSGetLinkInfo() or DSGetLogEntry() functions. Or you can use DSExecute to invoke the UNIX command wc -l filename

If you prefer to use a job sequence, you can use a Routine Activity for the former and a Command Activity for the latter.

Or you could use a shell script that counts the rows in the flat file, compares that to the result from SELECT COUNT(*) FROM tablename; and proceeds on that basis. The shell script could be invoked from DataStage if that's what you'd like to do.

There are other solutions. These will get you started.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The word "subsequently" in my earlier post was intentional. You can't determine the final number of rows (lines in a file) until it's completely loaded.

This means that you would not use DSGetLinkInfo() in a Transformer stage. Two reasons; the current count increments for every row processed, and it's available more cheaply using the system variable @OUTROWNUM.

What you probably need to do is capture the number of rows (lines) from both places in a job sequence, and make the decision about whether to proceed in that job sequence. Conditionally start the job activity that loads the table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

Just to add on to what Ray had said, you can use post job subroutine in which you can count the number of records in the file. If it is equal to the number of rows got from the table (can be drawn from link info) you can issue a truncate command to the table either from the routine itself (for which you need licensed ODBC database driver) or you can design a job that truncates the table which can be called conditionally from the post job subroutine itself. Ray, did this make sense ?? :roll:

Vignesh.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Re: How to count rows extracted from MS Sql table

Post by prabu »

cbres00 wrote: I want to delete all of the rows from the table when I'm confident that the number of rows in the flat file are equal to the number of rows in the table.
Regards,
CB
count is not a <b>reliable</b> way of checking if the operation is
successful . you have to have something like this to be safe :D


1.set Job Warning to 0
2.NOT (SeqFileLink.REJECTED) AND RaiseDSTransformError_subroutine
shaimil
Charter Member
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

I'm trying to use the DSLinkInfo to obtain a row count. I keep getting a value of -1.

Any ideas why this may be.

I'm calling the command from a sequencer after the job in question has run.

Thanks
Shay
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Post your code if you want it debugged. Check the spelling of your link name (case sensitive).
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

It is a good practise to Do a select coun(*) from loaded_table where "Give a condition which meet the last load".... And then compare it with the unix file.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

shaimil wrote:I'm calling the command from a sequencer after the job in question has run.
As Vince noted, we'd need to see your code. Everything would need to be spelled and 'cased' correctly for it to work. Get any of the parameters wrong and you'll get one of a variety of negative return codes.

I have a generic routine that a sequencer can call to get a link row count, it needs to know everything the function call needs in order to do this - job name, stage name and link name to check. You need to attach to the job in question and then issue the call to DSGetLinkInfo. Don't forget to detach from the job when you are done. Pass back the row count as the Answer from the routine. I'm assuming you are trying to do something similar.

I disagree with the notion that it's a 'good practice' to play the 'select count(*)' game. Perhaps if you are trying to get a count of what you should have extracted (as opposed to what you actually did extract) - but you can't always repeat the same sql and get the same answer. If you can then fine, but if the goal is to find out what you really extracted during a given process, then the method discussed here is the better practice. IMHO.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Look in JOBCONTROL.H (in the dsinclude directory) to learn that -1 means "invalid job handle".

You have not properly specified the first argument (the job handle) for DSGetLinkInfo.

Or you have used DSJ.ME as the first argument for DSGetLinkInfo in a routine but not loaded the contents of the JOBCONTROL.H header file.

You do this with the declaration $INCLUDE DSINCLUDE JOBCONTROL.H, but better (to avoid double declaration of constants) is:

Code: Select all

$IFNDEF JOBCONTROL.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
So, where - exactly - are you trying to use DSGetLinkInfo ?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply