Sequential file

kittu.raja · Post by **kittu.raja** » Tue Dec 30, 2008 11:11 am

Hi,

I have a flat file with two columns. It looks like this:
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael

I want to find out the unique count of col1.

Can anybody help me do that?

Thanks,

Nagaraj · Post by **Nagaraj** » Tue Dec 30, 2008 11:39 am

You can use a UNIX command something like this (since the file is on UNIX):

nawk -F'|' '!x[$1]++' chck.txt |wc -l

I hope this is what you are looking for.
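Note that `-F'|'` in the command above treats the file as pipe-delimited. The sample in the first post looks whitespace-delimited, so a minimal sketch of the same idea for that layout might look like the following. The file name `sample.txt`, plain `awk` in place of `nawk`, and the `NR > 1` header skip are assumptions for illustration, not something stated in the thread.

```shell
# Recreate the sample from the first post (file name is hypothetical).
cat > sample.txt <<'EOF'
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael
EOF

# NR > 1 skips the header line.
# !seen[$1]++ is true only the first time a given Col1 value appears,
# so awk prints one line per distinct value and wc -l counts those lines.
# tr -d ' ' strips the padding some wc implementations add.
distinct_col1=$(awk 'NR > 1 && !seen[$1]++' sample.txt | wc -l | tr -d ' ')
echo "$distinct_col1"
```

For the sample data this prints 3, since Col1 holds the values 1, 2 and 3.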

metadata1 · Post by **metadata1** » Tue Dec 30, 2008 12:00 pm

Within a job you could use an Aggregator stage to group and count records. Have you thought about using that option?

I'm not sure what your exact requirements are.

nsm · Post by **nsm** » Tue Dec 30, 2008 12:22 pm

Simply use: sort -n -u test | wc -l
Then subtract 1 from the result, since the first line of your file holds the column names.
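If the count is needed for one specific column rather than for whole lines, a variation on the same sort-based approach could select the column first. This is only a sketch: the file name `flatfile.txt` is hypothetical, and it assumes whitespace-separated fields with a single header line.

```shell
# Sample data from the first post (file name is hypothetical).
printf '%s\n' 'Col1 Col2' '1 adam' '1 adam' '2 adam' '3 michael' '3 michael' > flatfile.txt

# tail -n +2 drops the header line, awk picks the second field,
# sort -u keeps one copy of each value, and wc -l counts what is left.
# tr -d ' ' strips the padding some wc implementations add.
distinct_col2=$(tail -n +2 flatfile.txt | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
echo "$distinct_col2"
```

Dropping the header up front avoids the "result minus 1" step. For the sample data this prints 2 (adam and michael).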

kittu.raja · Post by **kittu.raja** » Tue Dec 30, 2008 12:49 pm

metadata1 wrote: Within a job you could use an Aggregator stage to group and count records. Have you thought about using that option?

I'm not sure what your exact requirements are.

I have used it, but I am getting the count for each group. I want the overall distinct count of the second column.

kittu.raja · Post by **kittu.raja** » Tue Dec 30, 2008 12:52 pm

nsm wrote: Simply use: sort -n -u test | wc -l
Then subtract 1 from the result, since the first line of your file holds the column names.

I want only the distinct count of the second column. Where are you specifying the second column name?

kittu.raja · Post by **kittu.raja** » Tue Dec 30, 2008 12:53 pm

Nagaraj wrote: You can use a UNIX command something like this (since the file is on UNIX):

nawk -F'|' '!x[$1]++' chck.txt |wc -l

I hope this is what you are looking for.

Where are you specifying the column name?

Nagaraj · Post by **Nagaraj** » Tue Dec 30, 2008 6:28 pm

$1 is the first field, $2 is the second, and so on.
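To make the field numbering concrete, here is a small sketch that takes the column number as a parameter. The helper name `count_distinct` and the file name `cols.txt` are hypothetical, and the sketch assumes whitespace-separated fields with one header line, as in the sample from the first post.

```shell
# count_distinct FILE COLUMN
# Prints the number of distinct values in the given column, ignoring the
# header line. (Hypothetical helper, not part of any tool in this thread.)
count_distinct() {
    # col holds the field number, so $col is $1, $2, ... inside awk.
    # n counts first occurrences; n + 0 prints 0 when nothing matched.
    awk -v col="$2" 'NR > 1 && !seen[$col]++ { n++ } END { print n + 0 }' "$1"
}

# Sample data from the first post.
printf '%s\n' 'Col1 Col2' '1 adam' '1 adam' '2 adam' '3 michael' '3 michael' > cols.txt

count_distinct cols.txt 1   # distinct values in Col1
count_distinct cols.txt 2   # distinct values in Col2
```

Counting inside awk avoids the extra `wc -l` process. For the sample data the two calls print 3 and 2.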

dr.murthy · Post by **dr.murthy** » Tue Dec 30, 2008 10:01 pm

kittu.raja wrote: I have used it, but I am getting the count for each group. I want the overall distinct count of the second column.

Tell me what your output should look like. Do you need the distinct count of the second column or the first column?

Nagaraj · Post by **Nagaraj** » Tue Dec 30, 2008 10:03 pm

The output is just a number. Why don't you try the UNIX command I gave?

kishore2456 · Post by **kishore2456** » Wed Dec 31, 2008 12:03 am

You can use the Aggregator stage: just group and count on the same column (either the first or the second, whichever you want).

Nagaraj · Post by **Nagaraj** » Wed Dec 31, 2008 6:59 am

kishore2456 wrote: You can use the Aggregator stage: just group and count on the same column (either the first or the second, whichever you want).

I think he is right; you can do it this way as well.