Page 1 of 1

Issue with Random() function

Posted: Wed Feb 08, 2012 5:16 am
by mouthou
Hi All,

I have recently used Random() function to get random numbers generated from the Transformer. But it seems to be generating duplicate values, though I defined the Transformer to run in sequential mode. This is quite strange. Please let me know if anyone faced this or any workaround.

The problem with the duplicate values is that those values are going as the primary key into the DB and the job is failing due unique constraint voilation.

Thanks

Posted: Wed Feb 08, 2012 5:23 am
by v2kmadhav
if I were you - for something that contributes to the uniqueness, i would rather use something like a surrogate key generator or an equivalent logic instead.

Posted: Wed Feb 08, 2012 5:36 am
by mouthou
I am aware of that too. But it is a strange customer requirement to have some unpredictable dynamic values generated for that field. So, random was perfect pick for them!

Posted: Wed Feb 08, 2012 6:36 am
by ray.wurlod
There's no reason a random number would not generate duplicates. They're random, after all. You can even work out the probability of it occurring, if there is a finite domain of values from which the random number can be selected.

Posted: Wed Feb 08, 2012 7:36 am
by v2kmadhav
if you can be flexible with length/solution and its all about generating a randow number, try generating a unique value using a combination of something like inrownum||systime||random() or something that makes it unique ..

Posted: Wed Feb 08, 2012 8:10 am
by chulett
As noted, random <> unique.

Posted: Thu Feb 09, 2012 12:02 am
by mouthou
Thanks Craig/Ray/v2kmadhav for the inputs. Working out the recurrence of numbers as Ray says, it will consume time. Since this random function in Datastage was swirling my head , I had to go with alternative implementation of involving the database. I will close this topic.

In general terms, random<>unique. Nevertheless, in my case, random is being exploited for uniqueness of the values.

And also surprisingly it generates the same duplicate random numbers for the consequtive runs. I am not sure why and it is strange. There should be some place in Datastage where it needs be reset so that it doesnt come up with the same number. Can any one try this with more number of rows (probability of duplicate numbers is more)?

Posted: Thu Feb 09, 2012 1:17 am
by ray.wurlod
You can seed the random number generator in DataStage. I'd recommend something based on time (seconds since midnight, for example), possibly multiplied by numeric date components.

Posted: Thu Feb 09, 2012 8:13 am
by chulett
mouthou wrote:And also surprisingly it generates the same duplicate random numbers for the consequtive runs.
Not sure why, that's the way all "random" number generators work given the same seed value.