Page 1 of 2

Alternative way

Posted: Sun Aug 14, 2011 10:53 am
by prasad v
Hi,

I am generating Sequence Numbers using PartitionNum,InRowNum and NumPartition System Variables. But it is giving gaps in few jobs and it is proper in few jobs. This is Because of unbalanced no. of records in partitions.

If we partition the data based on Hash method prior to the Round Robin method in Transformer. It is giving wrong sequence number.

Is there any other way to get the sequence number without gaps. I dont want to go for Sequence in transformer

Thanks
Prasad

Posted: Sun Aug 14, 2011 2:03 pm
by chulett
Let's get this out of the way first. Surrogate values have no derived meaning or intelligence behind them, they just 'take the place' of the business key. Because of that, having 'gaps' in the sequence is meaningless and not something you should be worrying about.

Posted: Mon Aug 15, 2011 2:48 am
by soumya5891
Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +max_val_of_table
Hope this thing can work

Posted: Mon Aug 15, 2011 3:50 am
by prasad v
Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +max_val_of_table
Hope this thing can work
It doesn't work, as if we have 3 rows adn 3 nodes, two record in 1st node and 1 record in second node. Forget about max_val_of_table. we can add it later once we get the sequence numbers properly.
Output:
1
2
4
Let's get this out of the way first. Surrogate values have no derived meaning or intelligence behind them, they just 'take the place' of the business key. Because of that, having 'gaps' in the sequence is meaningless and not something you should be worrying about.
As per my concern, this should be sequence number as i am maintaing max sequence number in a file. if we have gaps in seq.numbers. We will get duplicates sequene numbers second run

Posted: Mon Aug 15, 2011 5:53 am
by ray.wurlod
The only way to get a guaranteed unbroken sequence is to generate it using a stage executing a formula in sequential mode or to use a generator such as the Surrogate Key Generator stage.

Posted: Mon Aug 15, 2011 6:22 am
by prasad v
Thanks Ray,

Do you have any idea, if we can manager it through any routines(C++/Basic), funciton which can be used in column expression?

Posted: Mon Aug 15, 2011 6:24 am
by chulett
prasad v wrote:As per my concern, this should be sequence number as i am maintaing max sequence number in a file. if we have gaps in seq.numbers. We will get duplicates sequene numbers second run
:? I don't see how you get there from here. Gaps = duplicates? No.

Posted: Mon Aug 15, 2011 3:14 pm
by ray.wurlod
@OUTROWNUM will do it if, as I already advised, the Transformer stage is executing in sequential mode.

Posted: Tue Aug 16, 2011 10:53 am
by soumya5891
Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +1 +max_val_of_table and do the round robin partitioning.Make the whole process in a transformer stage
Hope this thing can work

Posted: Tue Aug 16, 2011 11:32 am
by prasad v
Thanks Everyone
I don't see how you get there from here. Gaps = duplicates? No.
I dont get any duplicates, Eg;

We have 4 records, 4 Nodes and prior to the Transformer(where we generate Sequence number) 2 record in 1 st Node, 1 record in 2nd Node, 1 record in 3rd Node and No record in 4th Node.

When we apply Round Robin in Transformer 3 record in 1st Node and 1 record in 2nd node. no record in 3rd and 4th nodes.

So, the Sequence numbers are : 1,2,5,9.

Use this formula.
@INROWNUM*@NUMPARTITION + @PARTITIONNUM - @NUMPARTITION +1 +max_val_of_table and do the round robin partitioning.Make the whole process in a transformer stage
Hope this thing can work
This code works only when all Partitions are balanced

Posted: Tue Aug 16, 2011 11:56 am
by soumya5891
In the Transformer where you generate the surrogate key use the round robin partition in that very transformer only.Then the generated surrogate key will be in sequence

Posted: Tue Aug 16, 2011 12:10 pm
by prasad v
soumya5891 wrote:In the Transformer where you generate the surrogate key use the round robin partition in that very transformer only.Then the generated surrogate key will be in sequence
If you see my example above, Transformer(where we generate Sequence Number) we need to make all partitions balanced.

Not sure whether Modulus works?

Posted: Tue Aug 16, 2011 12:43 pm
by chulett
prasad v wrote:
I don't see how you get there from here. Gaps = duplicates? No.
I dont get any duplicates, Eg;
You conviently left out part of what I quoted:
chulett wrote:
prasad v wrote:As per my concern, this should be sequence number as i am maintaing max sequence number in a file. if we have gaps in seq.numbers. We will get duplicates sequene numbers second run
:? I don't see how you get there from here. Gaps = duplicates? No.
I was responding to your concerns that if you had gaps in the sequence that those gaps would somehow lead to duplicates being generated in subsequent runs, which is... wrong.

Posted: Tue Aug 16, 2011 1:08 pm
by prasad v
I was responding to your concerns that if you had gaps in the sequence that those gaps would somehow lead to duplicates being generated in subsequent runs, which is... wrong.
We do update one table with latest Seq Number+no.of records in the present run.

When we generate seqnumber, we add latest Seq Number+Sequence numbers.

As i said earlier post, Eg: This is the first run, sequence number will be updated with 0+4=4, but the generated sequence number are 1,2,5,9.

In the Second run, Sequence number will be updated with 4+4(if we have again 4 records

generate sequence numbers are 5,6,9,13...

Posted: Tue Aug 16, 2011 1:13 pm
by jwiles
You have gaps in the example output you give. Whether or not you will have gaps when running in parallel (2 or more partitions) is based upon the mathematics of your sequence assignment, number of rows and the distribution of your data across partitions (in other words, everything has to be just right). None of the partitioning methods can guarantee an absolutely even distribution across multiple partitions. The only way to absolutely guarantee no gaps when using the transformer in this way (your formula) is to run the transformer in a single partition (sequential mode). Two partitions would come close but you could still potentially end up with a gap.

Modulus partitioning against a row number value (@INROWNUM from a transformer, for instance) may give results similar to but likely no better than Round Robin.

Don't worry about gaps unless there's some sort of business rule that requires no gaps. Ensure that your sequence number column is large enough for future needs.

Regards,