Count of Records

Posted: Wed Dec 05, 2007 11:03 pm
by devidotcom
Hi,

I need the count of records in a parallel job. I split the data using constraints in the Transformer stage and send each output to an Aggregator stage to count the records.
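
To picture that flow outside DataStage, here is a minimal Python sketch. It assumes each constraint is a simple predicate on a record and that a record is counted on every output whose constraint it satisfies, as with separate output links. The names `route_and_count`, `constraints` and the sample fields are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Mapping

Record = Mapping[str, object]

def route_and_count(records: Iterable[Record],
                    constraints: Dict[str, Callable[[Record], bool]]) -> Counter:
    """Count how many records satisfy each named constraint.

    A record is counted once for every constraint it satisfies,
    mirroring one output link per constraint.
    """
    counts = Counter({name: 0 for name in constraints})
    for rec in records:
        for name, predicate in constraints.items():
            if predicate(rec):
                counts[name] += 1
    return counts

rows = [
    {"id": 1, "status": "A", "amount": 50},
    {"id": 2, "status": "B", "amount": 150},
    {"id": 3, "status": "A", "amount": 300},
]
constraints = {
    "active": lambda r: r["status"] == "A",  # hypothetical constraint
    "large": lambda r: r["amount"] > 100,    # hypothetical constraint
}
print(dict(route_and_count(rows, constraints)))  # {'active': 2, 'large': 2}
```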

Quick question: since this is a parallel DataStage job, do I need to sort the data on the key column before sending it to the target, or will the job produce correct results without a Sort stage?

Thanks in advance

Posted: Wed Dec 05, 2007 11:21 pm
by us1aslam1us
It is always good practice to sort the data before performing any aggregation. If you just want the count of records processed or filtered, I suggest you look at DSGetLinkInfo() instead.
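
Why sorting matters for a sort-based aggregation can be illustrated with Python's `itertools.groupby`, which, like such an aggregator, only groups *adjacent* rows that share a key. This is an analogy, not DataStage's own implementation; `count_adjacent_groups` and the `dept` field are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def count_adjacent_groups(rows, key):
    """Count rows per run of equal keys, as a sort-based aggregator would."""
    return [(k, sum(1 for _ in grp)) for k, grp in groupby(rows, key=itemgetter(key))]

rows = [{"dept": "A"}, {"dept": "B"}, {"dept": "A"}]

# Unsorted input: key "A" appears in two separate runs, so it is reported twice.
print(count_adjacent_groups(rows, "dept"))
# [('A', 1), ('B', 1), ('A', 1)]

# Sorted input: each key forms a single run, giving one correct count per key.
print(count_adjacent_groups(sorted(rows, key=itemgetter("dept")), "dept"))
# [('A', 2), ('B', 1)]
```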

Posted: Thu Dec 06, 2007 12:11 am
by devidotcom
Thanks for the reply.
But I don't rely on link counts, because they sometimes behave oddly and report different counts; I don't know why.

Posted: Thu Dec 06, 2007 6:32 pm
by siddesai
devidotcom wrote: Thanks for the reply.
But I don't rely on link counts, because they sometimes behave oddly and report different counts; I don't know why.

Does anyone know why the link count gives weird numbers? Is that a bug?

Posted: Thu Dec 06, 2007 7:55 pm
by ray.wurlod
We have only your assertion that the numbers are "weird". What do you mean by this term? Where's the proof?

Have you tried setting the environment variable that causes player processes to report their row counts?

Posted: Thu Feb 07, 2008 11:13 am
by abc123
For getting a count, you don't need to sort. No matter how many nodes you have, an Aggregator will do; it is probably the simplest of the aggregator functions.
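
A quick way to see why a total count needs no sort: however the rows are spread across nodes, the per-partition counts add up to the same total. This Python sketch assumes round-robin partitioning and a hypothetical `num_nodes` parameter; it only models the arithmetic, not the parallel engine itself.

```python
def round_robin(rows, num_nodes):
    """Distribute rows across partitions in round-robin order."""
    parts = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts

def total_count(rows, num_nodes):
    """Count each partition independently, then add the partial counts."""
    partial_counts = [len(p) for p in round_robin(rows, num_nodes)]
    return sum(partial_counts)

rows = list(range(10))
for n in (1, 2, 4):
    print(n, total_count(rows, n))  # total is 10 for every node count
```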

Posted: Fri Feb 08, 2008 12:17 am
by Maveric
If you want the total record count, you don't have to sort. If you want the count per key value, hash-partition and sort on the key.
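
The per-key case can be sketched in Python as follows, assuming a stable hash (`zlib.crc32`) stands in for the engine's hash partitioner; `hash_partition`, `count_per_key` and `num_nodes` are hypothetical names. Because hashing sends every row with the same key to the same partition, sorting and counting within each partition already gives one final count per key.

```python
import zlib
from itertools import groupby

def hash_partition(keys, num_nodes):
    """Send each key to a partition chosen by a stable hash of its value."""
    parts = [[] for _ in range(num_nodes)]
    for k in keys:
        parts[zlib.crc32(k.encode("utf-8")) % num_nodes].append(k)
    return parts

def count_per_key(parts):
    """Sort within each partition, then count runs of equal keys."""
    result = []
    for part in parts:
        for k, grp in groupby(sorted(part)):
            result.append((k, sum(1 for _ in grp)))
    return sorted(result)

keys = ["A", "B", "A", "C", "B", "A"]
print(count_per_key(hash_partition(keys, 3)))
# [('A', 3), ('B', 2), ('C', 1)]
```

With round-robin partitioning instead, rows for "A" could land on several nodes, and each node would emit its own partial count for "A".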