Count(*)
Moderators: chulett, rschirm, roy
Count(*)
I have a requirement where i am creating counts in a delimited file, something like below
Source
col1
Yes
Yes
No
No
Target(File)(based on three condiitions YES/NO/NA)
2|2|0
the reason is even if there is no value for any of the 3 conditions i need zero.
i am using TFM splitiing by using constraints and then using Agg stage for each link and then i am using link collector and seperating out in a file.
(using stage variables to generate the format based on some dummy key)
all i am getting is 2|2
how do i get zero in place of nothing?
Source
col1
Yes
Yes
No
No
Target(File)(based on three condiitions YES/NO/NA)
2|2|0
the reason is even if there is no value for any of the 3 conditions i need zero.
i am using TFM splitiing by using constraints and then using Agg stage for each link and then i am using link collector and seperating out in a file.
(using stage variables to generate the format based on some dummy key)
all i am getting is 2|2
how do i get zero in place of nothing?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I could give a parallel job answer, since you posted it in the parallel jobs forum. However your mention of a Link Collector suggests server job, and this is how you marked the job type.
Seems to me that this could be solved with three links each with its own counting Aggregator stage. One link handles Y, one link handles N, and the third link handles "neither Y nor N". Carry the same constant on each link, and join the three links together using a Transformer stage joining on the constant.
Seems to me that this could be solved with three links each with its own counting Aggregator stage. One link handles Y, one link handles N, and the third link handles "neither Y nor N". Carry the same constant on each link, and join the three links together using a Transformer stage joining on the constant.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tried the same thing.
"join the three links together using a Transformer stage joining on the constant"
but looks like there is timing issue.
the three links which are doing counts takes time but before that my trasnformer main stream link then looks up hashfile created or still getting created will not find any records in hash files though there are records which appears late untill then the output will just show 0 for all the counts.
i hope i am clear in explaining the scenario here
"join the three links together using a Transformer stage joining on the constant"
but looks like there is timing issue.
the three links which are doing counts takes time but before that my trasnformer main stream link then looks up hashfile created or still getting created will not find any records in hash files though there are records which appears late untill then the output will just show 0 for all the counts.
i hope i am clear in explaining the scenario here
not sure how to draw a picture...checked FAQ section didnt find any...!
Code: Select all
AGG----->TFM4---->Hash1
^
i
i
i
i
seq--->TFM1-->TFM2--------->TFM3------->Hash2
^
|
|
|
|
Hash1
-
- Participant
- Posts: 246
- Joined: Mon Jun 30, 2008 3:22 am
- Location: New York
- Contact:
If my understanding is correct the below approach will work.
SeqFile ----> Transformer ----> HashedFile ---->SeqFile
In the transformer, create 3 stage variables:
sv1 - If InputLink.Col1 = 'YES' Then sv1+1 Else sv1
sv2 - If InputLink.Col1 = 'NO' Then sv2+1 Else sv2
sv3 - If InputLink.Col1 = 'NA' Then sv3+1 Else sv3
And give default value for the stage variables as zero. Make a dummy key column in the hashed file and write all these stage variables to the hashed file(So as the Hashed file is updated for every record with the counts). Read the same hashed file and write it to your target SeqFile.
SeqFile ----> Transformer ----> HashedFile ---->SeqFile
In the transformer, create 3 stage variables:
sv1 - If InputLink.Col1 = 'YES' Then sv1+1 Else sv1
sv2 - If InputLink.Col1 = 'NO' Then sv2+1 Else sv2
sv3 - If InputLink.Col1 = 'NA' Then sv3+1 Else sv3
And give default value for the stage variables as zero. Make a dummy key column in the hashed file and write all these stage variables to the hashed file(So as the Hashed file is updated for every record with the counts). Read the same hashed file and write it to your target SeqFile.
Arun
Why not set another column thats a 1 for YES/NO and 0 for NA. Then send it down to the aggregator stage and group on the type and sum up this new column.
Then just pivot it.
Am I missing something from the requirement?
Then just pivot it.
Am I missing something from the requirement?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 246
- Joined: Mon Jun 30, 2008 3:22 am
- Location: New York
- Contact:
I did not understand this. Can you give some example? And if you are talking abouut the Key column in the Hashed file, it should be a dummy key, say 1 for all the rows.Nagaraj wrote:Yes this logic works for only one set of data,
it should be able to handle different group types(i.e the key column keeps changing)
Arun