Random number generation

Posted: Wed Jul 17, 2013 3:24 am
by RPhani
Hi,

How can I generate unique random numbers for each run of a sequence?

Thanks,
Phani

Posted: Wed Jul 17, 2013 6:38 am
by chulett
Details please. A single number each run or several over the course of the run? And does it really need to be 'random', meaning would a surrogate (unique but sequential) not work?

Posted: Wed Jul 17, 2013 6:59 am
by ArndW
Pseudo-random numbers can be generated in both PX and Server versions. But as Craig has already pointed out, these are probably not the best solution to your problem, particularly as they can (and do) repeat values.

Posted: Wed Jul 17, 2013 11:12 pm
by RPhani
The requirement is that we need to generate unique random numbers for two customers (ABC, XYZ).

Input
Cust_Name UpperLimit
ABC 1000000
XYZ 300000


Target1
Cust_Name RandomNum
ABC 100
ABC 463487
ABC 6579
ABC 87456
ABC 6709

Target2
Cust_Name RandomNum
XYZ 3480
XYZ 23090
XYZ 54
XYZ 90045
XYZ 546

Posted: Thu Jul 18, 2013 12:36 am
by ArndW
By their very nature, generated random numbers are not unique.

Your approach will differ depending on whether you need unique numbers per run or across several runs.

One approach is to create a table with just a numeric key and fill it with 1,000,000 records. Then use the random number generator to deliver a number between 1 and 1,000,000 and read that record. If the read is successful, then you have a unique random number to use and you delete the record from the table. If the read is not successful, then you have to repeat the random number generation and read process until you have a successful read. Note that this gets less and less efficient as you remove records from the pool.
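
The pool-and-retry idea can be sketched outside DataStage as follows. This is only a minimal Python illustration under assumptions, not DataStage code: the set `pool` stands in for the keyed table, and the function name `draw_unique` is made up for the example.

```python
import random

def draw_unique(pool, upper_limit, rng):
    """Draw one number from 1..upper_limit that is still in the pool.

    `pool` plays the role of the keyed table: a successful "read" means the
    number has not been handed out yet, and it is then deleted from the pool.
    """
    if not pool:
        raise ValueError("pool is exhausted")
    while True:
        candidate = rng.randint(1, upper_limit)  # inclusive on both ends
        if candidate in pool:          # read succeeded
            pool.remove(candidate)     # delete the record so it cannot repeat
            return candidate
        # read failed: the number was already used, so generate again

upper_limit = 1_000_000
pool = set(range(1, upper_limit + 1))  # the table filled with 1,000,000 keys
rng = random.Random(42)                # fixed seed only to make the demo repeatable
numbers = [draw_unique(pool, upper_limit, rng) for _ in range(1000)]
```

The retry loop is where the inefficiency comes from: the emptier the pool, the more often a candidate misses and has to be generated again.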

In server, where long strings are efficiently processed, you can make a long delimited list and just remove the element each time, making the process much quicker.
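
A rough sketch of the delimited-list variant, again in Python rather than Server BASIC, so treat it as an illustration only. The helper name `take_random_element` and the comma delimiter are assumptions made for the example. Because a random position is picked rather than a random value, every draw succeeds and no retry is needed.

```python
import random

def take_random_element(delimited, rng, sep=","):
    """Remove one randomly chosen element from a delimited string.

    Returns the removed value and the shortened string.
    """
    if not delimited:
        raise ValueError("list is exhausted")
    elements = delimited.split(sep)
    index = rng.randrange(len(elements))  # random position, always valid
    value = elements.pop(index)           # removing it prevents any repeat
    return int(value), sep.join(elements)

rng = random.Random(7)                    # fixed seed only for a repeatable demo
remaining = ",".join(str(n) for n in range(1, 21))  # small pool: 1..20
drawn = []
for _ in range(5):
    value, remaining = take_random_element(remaining, rng)
    drawn.append(value)
```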


Do you know how many elements you need?

Posted: Thu Jul 18, 2013 1:28 am
by RPhani
Every run will generate unique random numbers.

For every run we are generating 1000 random numbers for ABC and 500 random numbers for XYZ.

Job design :

RowgenStage(ABC):

Fields:
Cust_Name(varchar)
RanNum(Integer) --> Type=random
Limit=1000000
Seedval=#seed_Val_ABC#

RowgenStage(XYZ):

Cust_Name(varchar)
RanNum(Integer) --> Type=random
Limit=300000
Seedval=#seed_Val_XYZ#


seed_Val_ABC and seed_Val_XYZ are parameters. I am assigning their values from the Sequence.

seed_Val_ABC-->KeyMgtGetNextValue(1)
seed_Val_XYZ-->KeyMgtGetNextValue(1)


Will the KeyMgtGetNextValue function work for this scenario or not?

Posted: Thu Jul 18, 2013 2:50 am
by ArndW
Your randomly generated numbers in the Row Generator stage will not be unique and the KeyMgtGetNextValue won't work either.

Posted: Thu Jul 18, 2013 3:15 am
by RPhani
The Row Generator stage is producing unique random numbers based on the random algorithm (Type=random).
We are passing the Limit and Seed values as parameters.

If I provide a unique seed value for every run, I get unique random numbers.


seed_Val_ABC-->KeyMgtGetNextValue(1)
seed_Val_XYZ-->KeyMgtGetNextValue(1)

Here we want to constrain the seed value to be less than 9999.

How can I achieve this using the KeyMgtGetNextValue function?

Posted: Thu Jul 18, 2013 4:06 am
by ArndW
RPhani wrote: Row Generator stage, which is producing unique random numbers based on the random algorithm (Type=random)...
No, your generated numbers are not going to be unique. The "seed" just gives a starting point, and the pseudo-random series will always be the same when using the same seed.

I just did a quick test: a random integer column with 1,000,000 rows generated only 999,759 unique IDs (I used a seed of "1", so you can reproduce this as well).
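
Both points, that the same seed gives the same series and that a seeded series still contains repeats, can be illustrated with a small Python sketch. Python's generator is not the one the Row Generator stage uses, so the actual values will differ from DataStage; the function name `generate` and the row counts and limits are assumptions chosen for the demonstration.

```python
import random

def generate(seed, rows, limit):
    """Return `rows` pseudo-random integers in 0..limit-1 for a given seed."""
    rng = random.Random(seed)
    return [rng.randrange(limit) for _ in range(rows)]

first = generate(seed=1, rows=1000, limit=1_000_000)
again = generate(seed=1, rows=1000, limit=1_000_000)
other = generate(seed=2, rows=1000, limit=1_000_000)

print("same seed, same series:     ", first == again)
print("different seed, same series:", first == other)

# With as many rows as possible values, repeats are practically unavoidable.
crowded = generate(seed=1, rows=1000, limit=1000)
print("distinct values in 1000 draws from 1000:", len(set(crowded)))
```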

Posted: Thu Jul 18, 2013 5:05 am
by RPhani
Yes, I agree with your answer.

If the "seed" value is the same for multiple runs, it will produce the same random numbers.

That's why, in our job, we are passing the "seed" from the job Sequence by using the KeyMgtGetNextValue() function.
For every run it will generate a new sequence number.

seed_Val_ABC-->KeyMgtGetNextValue(1)
seed_Val_XYZ-->KeyMgtGetNextValue(1)


For example:
1st run:

seed_Val_ABC-->KeyMgtGetNextValue(1)----OutputVal '1'
seed_Val_XYZ-->KeyMgtGetNextValue(1)----OutputVal '2'

2nd run:

seed_Val_ABC-->KeyMgtGetNextValue(1)----OutputVal '3'
seed_Val_XYZ-->KeyMgtGetNextValue(1)----OutputVal '4'

3rd run:

seed_Val_ABC-->KeyMgtGetNextValue(1)----OutputVal '5'
seed_Val_XYZ-->KeyMgtGetNextValue(1)----OutputVal '6'


Here we want to constrain the seed value (i.e., KeyMgtGetNextValue(1)) to be less than 9999.

How can I achieve this using the KeyMgtGetNextValue function in the job Sequence?
Total random numbers for ABC: 1000
Total random numbers for XYZ: 500

We are not producing 10 lakh / 3 lakh (1,000,000 / 300,000) unique random numbers in each run.

Posted: Thu Jul 18, 2013 7:24 am
by chulett
You are missing the fundamental issue: random does not equal unique.

Yes, using a unique seed each run will ensure that the sequence of random numbers generated is always different but there's no guarantee that they will be unique within a run and certainly not across runs. It seems that the latter is not an issue for you but the former will be... it may look like it is working but in reality it won't be. If that is truly important / critical to your processing, you'll need to find another mechanism.
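
To put a number on "no guarantee", here is a small Python sketch of the standard birthday-problem calculation for the volumes mentioned in this thread (1000 numbers up to 1,000,000 for ABC, 500 numbers up to 300,000 for XYZ). It assumes the draws are independent and uniform over the limit, which is an idealisation and may not match the Row Generator exactly; the function name `prob_all_unique` is made up for the example.

```python
import math

def prob_all_unique(draws, limit):
    """Probability that `draws` independent uniform picks from `limit`
    possible values are all distinct (the birthday-problem product)."""
    log_p = sum(math.log1p(-k / limit) for k in range(draws))
    return math.exp(log_p)

results = {}
for name, draws, limit in [("ABC", 1000, 1_000_000), ("XYZ", 500, 300_000)]:
    results[name] = 1.0 - prob_all_unique(draws, limit)
    print(f"{name}: chance of at least one duplicate in a run = {results[name]:.1%}")
```

Under those assumptions the chance of at least one duplicate within a single run comes out at roughly 39% for ABC and 34% for XYZ, so a run that happens to look unique is not evidence that the design is safe.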