Alternative way

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Alternative way

Post by prasad v »

Hi,

I am generating Sequence Numbers using PartitionNum,InRowNum and NumPartition System Variables. But it is giving gaps in few jobs and it is proper in few jobs. This is Because of unbalanced no. of records in partitions.

If we partition the data based on Hash method prior to the Round Robin method in Transformer. It is giving wrong sequence number.

Is there any other way to get the sequence number without gaps. I dont want to go for Sequence in transformer

Thanks
Prasad
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Let's get this out of the way first. Surrogate values have no derived meaning or intelligence behind them, they just 'take the place' of the business key. Because of that, having 'gaps' in the sequence is meaningless and not something you should be worrying about.
-craig

"You can never have too many knives" -- Logan Nine Fingers
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +max_val_of_table
Hope this thing can work
Soumya
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +max_val_of_table
Hope this thing can work
It doesn't work, as if we have 3 rows adn 3 nodes, two record in 1st node and 1 record in second node. Forget about max_val_of_table. we can add it later once we get the sequence numbers properly.
Output:
1
2
4
Let's get this out of the way first. Surrogate values have no derived meaning or intelligence behind them, they just 'take the place' of the business key. Because of that, having 'gaps' in the sequence is meaningless and not something you should be worrying about.
As per my concern, this should be sequence number as i am maintaing max sequence number in a file. if we have gaps in seq.numbers. We will get duplicates sequene numbers second run
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The only way to get a guaranteed unbroken sequence is to generate it using a stage executing a formula in sequential mode or to use a generator such as the Surrogate Key Generator stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

Thanks Ray,

Do you have any idea, if we can manager it through any routines(C++/Basic), funciton which can be used in column expression?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasad v wrote:As per my concern, this should be sequence number as i am maintaing max sequence number in a file. if we have gaps in seq.numbers. We will get duplicates sequene numbers second run
:? I don't see how you get there from here. Gaps = duplicates? No.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

@OUTROWNUM will do it if, as I already advised, the Transformer stage is executing in sequential mode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +1 +max_val_of_table and do the round robin partitioning.Make the whole process in a transformer stage
Hope this thing can work
Soumya
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

Thanks Everyone
I don't see how you get there from here. Gaps = duplicates? No.
I dont get any duplicates, Eg;

We have 4 records, 4 Nodes and prior to the Transformer(where we generate Sequence number) 2 record in 1 st Node, 1 record in 2nd Node, 1 record in 3rd Node and No record in 4th Node.

When we apply Round Robin in Transformer 3 record in 1st Node and 1 record in 2nd node. no record in 3rd and 4th nodes.

So, the Sequence numbers are : 1,2,5,9.

Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +1 +max_val_of_table and do the round robin partitioning.Make the whole process in a transformer stage
Hope this thing can work
This code works only when all Partitions are balanced
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

In the Transformer where you generate the surrogate key use the round robin partition in that very transformer only.Then the generated surrogate key will be in sequence
Soumya
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

soumya5891 wrote:In the Transformer where you generate the surrogate key use the round robin partition in that very transformer only.Then the generated surrogate key will be in sequence
If you see my example above, Transformer(where we generate Sequence Number) we need to make all partitions balanced.

Not sure whether Modulus works?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

prasad v wrote:
I don't see how you get there from here. Gaps = duplicates? No.
I dont get any duplicates, Eg;
You conviently left out part of what I quoted:
chulett wrote:
prasad v wrote:As per my concern, this should be sequence number as i am maintaing max sequence number in a file. if we have gaps in seq.numbers. We will get duplicates sequene numbers second run
:? I don't see how you get there from here. Gaps = duplicates? No.
I was responding to your concerns that if you had gaps in the sequence that those gaps would somehow lead to duplicates being generated in subsequent runs, which is... wrong.
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasad v
Participant
Posts: 174
Joined: Mon Mar 30, 2009 2:18 am

Post by prasad v »

I was responding to your concerns that if you had gaps in the sequence that those gaps would somehow lead to duplicates being generated in subsequent runs, which is... wrong.
We do update one table with latest Seq Number+no.of records in the present run.

When we generate seqnumber, we add latest Seq Number+Sequence numbers.

As i said earlier post, Eg: This is the first run, sequence number will be updated with 0+4=4, but the generated sequence number are 1,2,5,9.

In the Second run, Sequence number will be updated with 4+4(if we have again 4 records

generate sequence numbers are 5,6,9,13...
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

You have gaps in the example output you give. Whether or not you will have gaps when running in parallel (2 or more partitions) is based upon the mathematics of your sequence assignment, number of rows and the distribution of your data across partitions (in other words, everything has to be just right). None of the partitioning methods can guarantee an absolutely even distribution across multiple partitions. The only way to absolutely guarantee no gaps when using the transformer in this way (your formula) is to run the transformer in a single partition (sequential mode). Two partitions would come close but you could still potentially end up with a gap.

Modulus partitioning against a row number value (@INROWNUM from a transformer, for instance) may give results similar to but likely no better than Round Robin.

Don't worry about gaps unless there's some sort of business rule that requires no gaps. Ensure that your sequence number column is large enough for future needs.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply