Count(*)

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Count(*)

Post by Nagaraj »

I have a requirement where i am creating counts in a delimited file, something like below

Source

col1
Yes
Yes
No
No


Target(File)(based on three condiitions YES/NO/NA)

2|2|0

the reason is even if there is no value for any of the 3 conditions i need zero.

i am using TFM splitiing by using constraints and then using Agg stage for each link and then i am using link collector and seperating out in a file.
(using stage variables to generate the format based on some dummy key)

all i am getting is 2|2

how do i get zero in place of nothing?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I could give a parallel job answer, since you posted it in the parallel jobs forum. However your mention of a Link Collector suggests server job, and this is how you marked the job type.

Seems to me that this could be solved with three links each with its own counting Aggregator stage. One link handles Y, one link handles N, and the third link handles "neither Y nor N". Carry the same constant on each link, and join the three links together using a Transformer stage joining on the constant.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... and here we are in Server world now.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

oh right...i just didnt notice :) :) :)
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

tried the same thing.

"join the three links together using a Transformer stage joining on the constant"

but looks like there is timing issue.

the three links which are doing counts takes time but before that my trasnformer main stream link then looks up hashfile created or still getting created will not find any records in hash files though there are records which appears late untill then the output will just show 0 for all the counts.

i hope i am clear in explaining the scenario here
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Nope, not clear at all.

Can you draw a 'picture' of the job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

not sure how to draw a picture...checked FAQ section didnt find any...!

Code: Select all

               AGG----->TFM4---->Hash1    
                ^           
                i           
                i           
                i           
                i           
seq--->TFM1-->TFM2--------->TFM3------->Hash2
	  				            ^ 
                             | 
                             | 
                             |
                             |
                            Hash1
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

the upper link will be coming from 2nd TFM, and the lower reference link is towards 3rd TFM.

:)

Moderator: like that. The need to use code tags to preserve whitespace has been repeated here a ton of times. I also numbered your transformers for easier reference.
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

i can split the job into two, which will definately work, but then i want to use this functionality where i can write and read from the same hash file.
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York
Contact:

Post by arunkumarmm »

If my understanding is correct the below approach will work.

SeqFile ----> Transformer ----> HashedFile ---->SeqFile


In the transformer, create 3 stage variables:

sv1 - If InputLink.Col1 = 'YES' Then sv1+1 Else sv1
sv2 - If InputLink.Col1 = 'NO' Then sv2+1 Else sv2
sv3 - If InputLink.Col1 = 'NA' Then sv3+1 Else sv3

And give default value for the stage variables as zero. Make a dummy key column in the hashed file and write all these stage variables to the hashed file(So as the Hashed file is updated for every record with the counts). Read the same hashed file and write it to your target SeqFile.
Arun
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Yes this logic works for only one set of data,
it should be able to handle different group types(i.e the key column keeps changing)
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Why not set another column thats a 1 for YES/NO and 0 for NA. Then send it down to the aggregator stage and group on the type and sum up this new column.
Then just pivot it.
Am I missing something from the requirement?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

sorted by key columns first and then
Tried by adding one more stage variable checking for key column change,
didnt workout.
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Okay folks,

I am splitting the job into two and referencing the hash files in 2nd job which will give me counts and as well as zeros.....thanks for your time and ideas, appreciate it.
arunkumarmm
Participant
Posts: 246
Joined: Mon Jun 30, 2008 3:22 am
Location: New York
Contact:

Post by arunkumarmm »

Nagaraj wrote:Yes this logic works for only one set of data,
it should be able to handle different group types(i.e the key column keeps changing)
I did not understand this. Can you give some example? And if you are talking abouut the Key column in the Hashed file, it should be a dummy key, say 1 for all the rows.
Arun
Post Reply