Reset Counter based on Types

sohasaid
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

I had a requirement to create a counter that resets each time the type changes, as follows:

Type, Counter
A, 1
A, 2
A, 3
B, 1
B, 2
C, 1
C, 2

I ran into some difficulty at first because I need to keep the parallel execution mode, and the job runs on 12 nodes.

But the problem was solved once I simply defined an auto-increment stage variable 'StgVar' with default value '0' and derivation 'StgVar+1'.

What I don't understand, and need your help to explain, is how it worked without any other logic, and how the counter got reset after each type.

Job Design:
DataSet --> transformer --> DataSet
Notes: Input data is sorted by type, and all stages run in 'parallel' execution mode.

I've attached the dataset part of the job score:

main_program: This step has 3 datasets:
ds0: {/tmp/test1.ds
      eAny=>eCollectAny
      op2[12p] (parallel APT_CombinedOperatorController:Data_Set_26)}
ds1: {op0[12p] (parallel delete data files in delete /tmp/teeeta.ds)
      >>eCollectAny
      op1[1p] (sequential delete descriptor file in delete /tmp/teeeta.ds)}
ds2: {op2[12p] (parallel APT_CombinedOperatorController:Data_Set_29)
      [pp] =>
      /tmp/teeeta.ds}
Regards.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Contrary to what you believe, it didn't work. Try a sufficiently large test dataset.

Mike
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Mike,

I think the source (input dataset) is already partitioned and sorted on the key column, so it is generating the sequence number correctly for each type.
Nag
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Try it with 13 types or run it on a 2-node configuration and it'll be quite obvious what the problem is...
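
To make the failure concrete, here is a rough Python simulation rather than DataStage code. It assumes rows are distributed to partitions by their type key; the `hash_partition` helper and its letter-based assignment are hypothetical stand-ins, not DataStage's actual partitioner. A stage variable with derivation `StgVar + 1` keeps one running count per partition and never resets, so it only appears to work while every partition happens to hold a single type.

```python
from collections import defaultdict

def hash_partition(rows, num_nodes):
    """Assign each row to a partition by its type key (illustrative stand-in)."""
    partitions = defaultdict(list)
    for row_type in rows:
        # Letter-based assignment is an assumption made for this sketch only.
        partitions[(ord(row_type) - ord("A")) % num_nodes].append(row_type)
    return partitions

def naive_counter(partition):
    """Mimic a stage variable with derivation StgVar + 1: it never resets."""
    stg_var = 0
    out = []
    for row_type in partition:
        stg_var += 1
        out.append((row_type, stg_var))
    return out

rows = ["A", "A", "A", "B", "B", "C", "C"]

# 12 nodes, 3 types: each type sits alone in its partition, so the output looks right.
many_nodes = {p: naive_counter(part) for p, part in hash_partition(rows, 12).items()}

# 2 nodes: A and C share a partition, and C keeps counting from where A stopped.
two_nodes = {p: naive_counter(part) for p, part in hash_partition(rows, 2).items()}
```

In this sketch the shared partition in the 2-node run numbers the C rows 4 and 5 instead of 1 and 2, which is the same effect as having more types than nodes.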

Mike
sohasaid
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

Mike wrote: Try a sufficiently large enough test dataset.
Thank you, Mike and Nag, for replying.

You're right, Mike. I tried with 1 million records into a database table and it didn't work.

Now, how do you think I could achieve the requirement (i.e. reset a counter at every new type while keeping the parallel execution mode)?
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

This is a very common design pattern and one that has been discussed a whole lot on DSXchange.

One option is to use stage variables to detect a key change and reset a counter.

Another option is to use the sort stage to add a key change indicator to the row and use that to reset a counter.

And of course, since you are performing a key-based operation, you must ensure that your data are partitioned and sorted by that key.
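
The first option (stage variables that detect a key change) can be sketched in Python as the logic applied within one partition. The names `prev_type` and `counter` are hypothetical stage variables, and the sketch assumes the partition is already sorted by type and that every row of a given type lands in the same partition.

```python
def counter_with_prev_key(partition):
    """Reset a counter whenever the key differs from the previous row's key."""
    prev_type = None  # hypothetical stage variable: key of the previous row
    counter = 0       # hypothetical stage variable: running count within a type
    out = []
    for row_type in partition:
        # Compare against the previous key first, then remember the current key.
        counter = counter + 1 if row_type == prev_type else 1
        prev_type = row_type
        out.append((row_type, counter))
    return out

result = counter_with_prev_key(["A", "A", "A", "C", "C"])
```

The order of the two assignments matters: the comparison has to run before the previous-key variable is overwritten, which corresponds to placing the counter stage variable above the previous-key stage variable in the Transformer.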

Mike
sohasaid
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

Mike wrote: Another option is to use the sort stage to add a key change indicator to the row and use that to reset a counter.
Thanks Mike. I used this approach and it worked.

1. Sort and partition the input data on the type column.
2. Generate the keyChange column from the Sort stage.
3. Create an integer stage variable with a default value of '0', using this derivation:
If DSLink35.keyChange = 0 Then StageVar + 1 Else 1
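
For anyone who wants to check the logic outside DataStage, the three steps above can be mimicked in Python for a single partition. This is only a sketch: `add_key_change` is a hypothetical stand-in for the Sort stage's key change column (1 on the first row of each group, 0 otherwise), and the input is assumed to be already sorted and partitioned by type.

```python
def add_key_change(sorted_types):
    """Flag the first row of each group with 1 and all other rows with 0."""
    flagged = []
    for i, row_type in enumerate(sorted_types):
        is_first = i == 0 or row_type != sorted_types[i - 1]
        flagged.append((row_type, 1 if is_first else 0))
    return flagged

def apply_derivation(flagged_rows):
    """Apply: If keyChange = 0 Then StageVar + 1 Else 1."""
    stage_var = 0  # default value, as in step 3
    out = []
    for row_type, key_change in flagged_rows:
        stage_var = stage_var + 1 if key_change == 0 else 1
        out.append((row_type, stage_var))
    return out

numbered = apply_derivation(add_key_change(["A", "A", "A", "B", "B", "C", "C"]))
```

Because the first row of every partition always carries keyChange = 1, the counter starts from 1 no matter how many types share a partition.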

Thanks again. :)