Generating Sequence Numbers in Parallel Transformer

ray.wurlod · Post by **ray.wurlod** » Tue Nov 20, 2007 3:26 pm

srimitta wrote: And one more thing: the data in the DataSet is not in sequence order. It starts from 1, and after 96 the next row starts from 3704 and ends with 3799, then it starts from 97 and ends with 192.

The sequence number changes approximately after every 95 rows.

Any idea what's going on and how to straighten this out?

Thanks
srimitta

All the numbers are there - what you are seeing is an artifact of how the numbers are blocked, and possibly of how you are sampling in View Data. It looks like you are getting approximately 96 rows per block (the actual number depends, of course, on the row size).
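To make the blocking effect concrete, here is a small Python sketch (not DataStage code) that reads fixed-size blocks alternately from two partitions. The two-partition layout, the key ranges, the block size of 96, and the function name `read_in_blocks` are all assumptions chosen to resemble the numbers quoted above; the real block size depends on row size.

```python
from itertools import zip_longest

def read_in_blocks(partitions, block_size):
    """Return rows by taking one block from each partition in turn."""
    # Split every partition into consecutive blocks of block_size rows.
    chunked = [
        [part[i:i + block_size] for i in range(0, len(part), block_size)]
        for part in partitions
    ]
    rows = []
    # Take the first block of each partition, then the second, and so on.
    for round_of_blocks in zip_longest(*chunked, fillvalue=[]):
        for block in round_of_blocks:
            rows.extend(block)
    return rows

# Hypothetical layout: each partition holds a contiguous range of keys.
node0 = list(range(1, 193))       # keys 1..192
node1 = list(range(3704, 3896))   # keys 3704..3895

rows = read_in_blocks([node0, node1], block_size=96)
print(rows[0], rows[95], rows[96], rows[191], rows[192])
```

Every key is still present exactly once; only the order in which the blocks are displayed makes the sequence look broken.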

The values are not necessarily stored in sorted order in the Data Set - did you sort them on the way in?

Even if they are, as you retrieve rows from the different processing nodes it will appear that there are huge jumps. How your data are partitioned will also affect how big these jumps seem to be. For example, with Round Robin partitioning and two nodes you will tend to get even numbers on one node and odd numbers on the other.
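The Round Robin case can be sketched in a few lines of Python. This is only a simulation of the idea, not of the engine itself; the function name `round_robin` and the ten sample values are assumptions for illustration.

```python
def round_robin(values, num_nodes):
    """Deal values out to nodes one at a time, like dealing cards."""
    nodes = [[] for _ in range(num_nodes)]
    for i, value in enumerate(values):
        nodes[i % num_nodes].append(value)
    return nodes

# Ten sequential keys spread over two hypothetical nodes.
nodes = round_robin(range(1, 11), 2)
print(nodes[0])  # odd keys
print(nodes[1])  # even keys

# Reading one node after the other gives an unsorted but complete set.
combined = nodes[0] + nodes[1]
```

Nothing is lost in `combined`; sorting it gives back 1 through 10.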

I exhort you to experiment further with the variations on this theme, and try to understand what's happening and what's going where.

rwierdsm · Post by **rwierdsm** » Fri Nov 23, 2007 10:42 am

We tried a number of different options in parallel, but none could give us an acceptable result.

In the end, we created a sequential process to generate our surrogate keys.

Rob

dohertys · Post by **dohertys** » Fri Nov 23, 2007 10:55 am

The way I found recommended in the documentation was to use a Surrogate Key Generator stage, with 'Execution Mode' set to sequential and 'Collector Type' set to round robin.

Any use?

boxtoby · Post by **boxtoby** » Fri Nov 23, 2007 11:23 am

I have used this derivation in the past for a surrogate key:

@PARTITIONNUM+1 : @INROWNUM

I wouldn't recommend it for a permanent key value, but it works well for temporary storage in a dataset.
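A rough Python sketch of what that derivation produces, assuming `:` is string concatenation evaluated after the addition, that `@PARTITIONNUM` counts from 0, and that `@INROWNUM` counts from 1 within each partition. The function name `concat_key` is hypothetical.

```python
def concat_key(partition_num, in_row_num):
    """Mimic (@PARTITIONNUM + 1) : @INROWNUM as a concatenated string."""
    # Assumption: the addition happens first, then the two values are joined.
    return f"{partition_num + 1}{in_row_num}"

# Two partitions, three rows each.
keys = [concat_key(p, r) for p in range(2) for r in range(1, 4)]
print(keys)

# Under this reading, two different rows can map to the same key
# once there are eleven or more partitions.
collision = concat_key(0, 11) == concat_key(10, 1)
```

The keys are distinct within this small example but do not form a gap-free sequence, and the possible collision with many partitions is one reason to treat the value as temporary only.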

srimitta · Post by **srimitta** » Fri Dec 07, 2007 10:52 am

We took a workaround approach: we forced the source and lookup DataSets to be created on the same node, and now the surrogate keys are in sequence.

Thanks
srimitta

srimitta · Post by **srimitta** » Fri Dec 07, 2007 10:59 am

We took a workaround approach: we forced the source and lookup DataSets to be created on the same node, and now the surrogate keys are in sequence in Parallel mode.

Thanks
srimitta