Sequential file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kittu.raja
Premium Member
Posts: 175
Joined: Tue Oct 14, 2008 1:48 pm


Post by kittu.raja »

Hi,

I have a flat file with 2 columns. It looks like this:
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael

I want to find the count of unique values in Col1.

Can anybody help me do that?

Thanks,
Rajesh Kumar
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

You can use a Unix command something like this (since the file is on Unix):

nawk -F'|' '!x[$1]++' chck.txt |wc -l

I hope this is what you are looking for.
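As a minimal sketch of the same idea, the following assumes the file is whitespace-delimited with a header row, as in the sample shown in the question (the command above uses `-F'|'`, which fits a pipe-delimited file instead). The file name `sample.txt` and the variable name are hypothetical.

```shell
# Hypothetical sample file mirroring the data in the question.
printf '%s\n' 'Col1 Col2' '1 adam' '1 adam' '2 adam' '3 michael' '3 michael' > sample.txt

# NR > 1 skips the header row.
# seen[$1]++ evaluates to 0 (false) only the first time a value appears,
# so n is incremented once per distinct value in the first column.
distinct_col1=$(awk 'NR > 1 && !seen[$1]++ { n++ } END { print n + 0 }' sample.txt)
echo "$distinct_col1"
```

Counting inside awk avoids the extra `wc -l` step and keeps the header out of the result.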
metadata1
Participant
Posts: 10
Joined: Thu Dec 04, 2008 5:50 pm

Post by metadata1 »

Within a job you could use the Aggregator stage to group and count records. Have you thought about using that option?

I am not sure what your exact requirements are.
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Simply use: sort -n -u test | wc -l
and subtract 1 from the result, since the first line of your file holds the column names.
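A sort-based variant that targets one column explicitly might look like the sketch below. It assumes a whitespace-delimited file with a single header line; the file name `sample_sort.txt` and the variable name are hypothetical.

```shell
# Hypothetical sample file mirroring the data in the question.
printf '%s\n' 'Col1 Col2' '1 adam' '1 adam' '2 adam' '3 michael' '3 michael' > sample_sort.txt

# tail -n +2 drops the header, awk keeps only the second column,
# sort -u removes duplicates, and wc -l counts what is left.
# tr strips the padding some wc implementations add.
distinct_col2=$(tail -n +2 sample_sort.txt | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')
echo "$distinct_col2"
```

Dropping the header up front means no "result minus 1" correction is needed afterwards.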
kittu.raja
Premium Member
Posts: 175
Joined: Tue Oct 14, 2008 1:48 pm

Post by kittu.raja »

metadata1 wrote: Within a job you could use the Aggregator stage to group and count records. Have you thought about using that option?

I am not sure what your exact requirements are.
I have used it, but I am getting a count for each group. I want the overall distinct count of the second column.
Rajesh Kumar
kittu.raja
Premium Member
Posts: 175
Joined: Tue Oct 14, 2008 1:48 pm

Post by kittu.raja »

nsm wrote: Simply use: sort -n -u test | wc -l
and subtract 1 from the result, since the first line of your file holds the column names.
I want only the distinct count of the second column. Where are you specifying the second column?
Rajesh Kumar
kittu.raja
Premium Member
Posts: 175
Joined: Tue Oct 14, 2008 1:48 pm

Post by kittu.raja »

Nagaraj wrote: You can use a Unix command something like this (since the file is on Unix):

nawk -F'|' '!x[$1]++' chck.txt |wc -l

I hope this is what you are looking for.
Where are you specifying the column name?
Rajesh Kumar
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

$1 is the first field, $2 is the second, and so on.
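To make the column choice explicit, the field number can be passed in as a parameter. This is only a sketch: the helper name `count_distinct` and the file name `sample_fields.txt` are hypothetical, and it assumes a whitespace-delimited file with one header line.

```shell
# Hypothetical sample file mirroring the data in the question.
printf '%s\n' 'Col1 Col2' '1 adam' '1 adam' '2 adam' '3 michael' '3 michael' > sample_fields.txt

# count_distinct COLUMN FILE
# Prints the number of distinct values in the given field, skipping the header.
count_distinct() {
  awk -v col="$1" 'NR > 1 && !seen[$col]++ { n++ } END { print n + 0 }' "$2"
}

count_distinct 1 sample_fields.txt   # distinct values in the first column
count_distinct 2 sample_fields.txt   # distinct values in the second column
```

Inside awk, `$col` refers to the field whose number is stored in `col`, so the same function works for either column.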
dr.murthy
Participant
Posts: 224
Joined: Sun Dec 07, 2008 8:47 am
Location: delhi

Post by dr.murthy »

kittu.raja wrote: I have used it but I am getting counts of each group. I want the distinct count of the second column.

Tell me what your output should look like. Do you need the distinct count of the second column or the first column?
D.N .MURTHY
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

The output is just a number. Why don't you try the Unix command I gave you?
kishore2456
Participant
Posts: 47
Joined: Mon May 07, 2007 10:35 pm

Post by kishore2456 »

You can use the Aggregator stage: just aggregate and count on the same column (either the first or the second, whichever you want).
FD
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

kishore2456 wrote: You can use the Aggregator stage: just aggregate and count on the same column (either the first or the second, whichever you want).
I think he is right; you can do it this way as well.