I am posting in this thread because it totally relates to the original poster's topic.
I am using auto partitioning. The logic I have is:
InitialValue for stage variable svOne:
@PARTITIONNUM-(@NUMPARTITIONS-1)
svOne derivation: svOne + @NUMPARTITIONS
svOutputRow=svOne
This works perfectly every time no matter how many nodes I have in the config file. I tried with files of over 10000 rows and it works fine as well.
My problem is that I am not sure I understand the logic. I used a Peek stage to output the system variable values. For example, for an svOutputRow of 14, I have these values: NUMPARTITIONS=4, PARTITIONNUM=1, InitialValue of svOne: -2
I would expect the final value to be -2 + 4 = 2, unless the InitialValue is calculated only once per node and only "svOne + @NUMPARTITIONS" is executed for each row, in which case the values would come out right.
Any thoughts?
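To make the arithmetic concrete, here is a rough Python sketch of what one partition would produce. It assumes the InitialValue is evaluated once per partition before any rows arrive and the derivation runs once per row. The function name and parameters are made up for illustration and are not DataStage APIs.

```python
def partition_row_numbers(partition_num, num_partitions, rows_in_partition):
    """Simulate svOne on a single partition (partition_num is 0-based)."""
    # InitialValue: assumed to be evaluated once, before the first row.
    sv_one = partition_num - (num_partitions - 1)
    output = []
    for _ in range(rows_in_partition):
        # Derivation: assumed to run once for every row on this partition.
        sv_one = sv_one + num_partitions
        output.append(sv_one)  # svOutputRow = svOne
    return output


# Values from the Peek output: NUMPARTITIONS=4, PARTITIONNUM=1, InitialValue=-2
print(partition_row_numbers(partition_num=1, num_partitions=4, rows_in_partition=4))
# [2, 6, 10, 14]
```

Under that assumption, an svOutputRow of 14 is simply the fourth row seen on partition 1.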
Understanding Row Numbering algorithm
This works perfectly but according to Ray's post in the linked topic it should not. Here is Ray's excerpt:
"Unless you can guarantee absolutely even distribution you will always see holes in the sequence. The only way that you can guarantee absolutely even distribution is (a) to specify Round Robin as the partitioning algorithm and (b) to have a number of rows that is an exact multiple of the number of partitions."
I am not doing either: (a) I am doing auto partitioning (b) My number of rows is not an exact multiple of number of partitions.
Are you saying it is working perfectly (congratulations) and you are perhaps trying to break it and wondering why it won't break?
"Auto" is not its own separate type of partitioning; it will automatically choose a type of partitioning. Maybe it chose round robin in your job. Have you checked to see what it automatically chose?
If you ran the job on 10000 rows and 4 nodes, that is a multiple of 4. If you ran the job on 9999 rows or 9997 rows, there is still a chance the sequence would be in good order with no gaps.
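One way to check this on paper is a small Python sketch that generates the numbers each partition would produce for a given row distribution and looks for holes. It assumes the numbering logic from the first post and a round robin that deals rows starting at partition 0. All function names here are hypothetical, for illustration only.

```python
def generated_numbers(rows_per_partition):
    """Sorted row numbers produced when partition p receives rows_per_partition[p] rows."""
    num_partitions = len(rows_per_partition)
    numbers = []
    for p, count in enumerate(rows_per_partition):
        # The k-th row (0-based) on partition p gets p + 1 + k * num_partitions.
        numbers.extend(p + 1 + k * num_partitions for k in range(count))
    return sorted(numbers)


def has_gaps(numbers):
    """True if the sorted numbers are not exactly 1..len(numbers)."""
    return numbers != list(range(1, len(numbers) + 1))


def round_robin_counts(total_rows, num_partitions):
    """Rows per partition if rows are dealt out in turn, starting at partition 0."""
    base, extra = divmod(total_rows, num_partitions)
    return [base + (1 if p < extra else 0) for p in range(num_partitions)]


print(has_gaps(generated_numbers(round_robin_counts(9999, 4))))  # False
print(has_gaps(generated_numbers([3, 1, 2, 2])))                 # True (6 is missing)
```

In this model a round robin deal leaves no holes even when the row count is not a multiple of the partition count, because the leftover rows land on the lowest-numbered partitions. Holes only show up when some partition falls behind the others by more than that.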
I know that Auto can pick Round Robin, but it has worked every single time for years. I have had plenty of situations with an odd number of rows, and it has worked every time.
I am not trying to break it. I am just trying to understand why it is working when the number of rows is not an exact multiple of number of partitions.