Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.
Moderators: chulett, rschirm, roy
-
kittu.raja
- Premium Member
- Posts: 175
- Joined: Tue Oct 14, 2008 1:48 pm
Post
by kittu.raja »
Hi,
I have a flat file with 2 columns. It looks like this:
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael
I want to find the distinct count of Col1.
Can anybody help me with that?
Thanks,
Rajesh Kumar
-
Nagaraj
- Premium Member
- Posts: 383
- Joined: Thu Nov 08, 2007 12:32 am
- Location: Bangalore
Post
by Nagaraj »
You can use a Unix command something like this (since the file is on Unix):
nawk -F'|' '!x[$1]++' chck.txt |wc -l
I hope this is what you are looking for.
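A minimal sketch of the same idea, assuming the file really is pipe-delimited (as `-F'|'` implies) and that its first line holds the column names. The file name `chck_sample.txt` and its contents are hypothetical, and POSIX `awk` is used in place of `nawk`.

```shell
# Hypothetical pipe-delimited copy of the sample data, header included.
cat > chck_sample.txt <<'EOF'
Col1|Col2
1|adam
1|adam
2|adam
3|michael
3|michael
EOF

# NR > 1 skips the header line. !x[$1]++ is true only the first time a
# value of field 1 is seen, so awk prints one line per distinct value.
# tr strips the padding that some wc implementations add.
distinct_col1=$(awk -F'|' 'NR > 1 && !x[$1]++' chck_sample.txt | wc -l | tr -d ' ')
echo "$distinct_col1"   # prints 3 for this sample
```

Without the `NR > 1` guard the header line would be counted as one more distinct value.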
-
metadata1
- Participant
- Posts: 10
- Joined: Thu Dec 04, 2008 5:50 pm
Post
by metadata1 »
Within a job you could apply the Aggregator stage to group and count records. Have you thought about using that option?
I'm not sure what your exact requirements are.
-
nsm
- Premium Member
- Posts: 139
- Joined: Mon Feb 09, 2004 8:58 am
Post
by nsm »
Simply use: sort -n -u test | wc -l
Then subtract 1 from the result, as your file's first line contains the column names.
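To count distinct values in one specific column with a sort-based pipeline, the column can be extracted first. This is only a sketch, assuming a whitespace-delimited file with a header row; the file name `sample.txt` is hypothetical.

```shell
# Hypothetical whitespace-delimited copy of the sample data.
cat > sample.txt <<'EOF'
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael
EOF

# tail -n +2 drops the header, awk keeps only the second field,
# sort -u removes duplicates, and wc -l counts what is left.
distinct_col2=$(tail -n +2 sample.txt | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')
echo "$distinct_col2"   # prints 2 for this sample (adam, michael)
```

Because the header is removed up front, no subtraction is needed afterwards.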
-
kittu.raja
- Premium Member
- Posts: 175
- Joined: Tue Oct 14, 2008 1:48 pm
Post
by kittu.raja »
metadata1 wrote:Within a job you could apply the Aggregator stage to group and count records. Have you thought about using that option?
I'm not sure what your exact requirements are.
I have used it, but I am getting counts for each group. I want the distinct count of the second column.
Rajesh Kumar
-
kittu.raja
- Premium Member
- Posts: 175
- Joined: Tue Oct 14, 2008 1:48 pm
Post
by kittu.raja »
nsm wrote:Simply use: sort -n -u test | wc -l
Then subtract 1 from the result, as your file's first line contains the column names.
I want only the distinct count of the second column. Where are you specifying the second column name?
Rajesh Kumar
-
kittu.raja
- Premium Member
- Posts: 175
- Joined: Tue Oct 14, 2008 1:48 pm
Post
by kittu.raja »
Nagaraj wrote:You can use a Unix command something like this (since the file is on Unix):
nawk -F'|' '!x[$1]++' chck.txt |wc -l
I hope this is what you are looking for.
Where are you specifying the column name?
Rajesh Kumar
-
Nagaraj
- Premium Member
- Posts: 383
- Joined: Thu Nov 08, 2007 12:32 am
- Location: Bangalore
Post
by Nagaraj »
$1 is the first field, $2 is the second, and so on.
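As a rough illustration of how the field number selects the column, here is a small sketch. The helper name `distinct_count` and the file `sample.txt` are hypothetical, and the file is assumed to be whitespace-delimited with a header row.

```shell
# Hypothetical whitespace-delimited copy of the sample data.
cat > sample.txt <<'EOF'
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael
EOF

# distinct_count FILE FIELD_NUMBER (hypothetical helper)
# $col resolves to $1, $2, ... depending on the number passed in.
distinct_count() {
  awk -v col="$2" 'NR > 1 && !seen[$col]++ { n++ } END { print n + 0 }' "$1"
}

col1_count=$(distinct_count sample.txt 1)
col2_count=$(distinct_count sample.txt 2)
echo "$col1_count $col2_count"   # prints "3 2" for this sample
```

For a delimited file, `-F` would be added to the `awk` call to set the separator.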
-
dr.murthy
- Participant
- Posts: 224
- Joined: Sun Dec 07, 2008 8:47 am
- Location: delhi
Post
by dr.murthy »
kittu.raja wrote:I have used it, but I am getting counts for each group. I want the distinct count of the second column.
Tell me what your output should look like. Do you need the distinct count of the second column or the first column?
D.N .MURTHY
-
Nagaraj
- Premium Member
- Posts: 383
- Joined: Thu Nov 08, 2007 12:32 am
- Location: Bangalore
Post
by Nagaraj »
The output is just a number. Why don't you try the Unix command I gave you?
-
kishore2456
- Participant
- Posts: 47
- Joined: Mon May 07, 2007 10:35 pm
Post
by kishore2456 »
You can use the Aggregator stage: group and count on the same column (either the first or the second, whichever you want).
FD
-
Nagaraj
- Premium Member
- Posts: 383
- Joined: Thu Nov 08, 2007 12:32 am
- Location: Bangalore
Post
by Nagaraj »
kishore2456 wrote:You can use the Aggregator stage: group and count on the same column (either the first or the second, whichever you want).
I think he is right; you can do it this way as well.
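Grouping and counting on one column yields one row per group, which is why a single aggregation returns per-group counts rather than one number. The sketch below is a shell analogy of that logic, not DataStage configuration: it assumes a second counting step over the grouped rows gives the single distinct count. The file name `sample.txt` is hypothetical.

```shell
# Hypothetical whitespace-delimited copy of the sample data.
cat > sample.txt <<'EOF'
Col1 Col2
1 adam
1 adam
2 adam
3 michael
3 michael
EOF

# Step 1: group on Col2 and count the records in each group
# (one output row per distinct value).
group_counts=$(tail -n +2 sample.txt | awk '{ print $2 }' | sort | uniq -c)
echo "$group_counts"

# Step 2: count the grouped rows to get a single distinct count.
distinct_groups=$(printf '%s\n' "$group_counts" | wc -l | tr -d ' ')
echo "$distinct_groups"   # prints 2 for this sample
```

In a job, the second step would presumably be another count over the first aggregation's output; that mapping is an assumption here.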