Random number generation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


RPhani
Participant
Posts: 32
Joined: Sun Aug 26, 2012 7:03 am
Location: Hyd


Post by RPhani »

Hi,

How can I generate unique random numbers for each run of a sequence?

Thanks,
Phani
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Details please. A single number each run or several over the course of the run? And does it really need to be 'random', meaning would a surrogate (unique but sequential) not work?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Pseudo-random numbers can be generated in both PX and Server versions. But as Craig has already pointed out, these are probably not the best solution to your problem, particularly as they can (and do) repeat values.
RPhani
Participant
Posts: 32
Joined: Sun Aug 26, 2012 7:03 am
Location: Hyd

Post by RPhani »

The requirement is: we need to generate unique random numbers for two customers (ABC and XYZ).

Input
Cust_Name  UpperLimit
ABC        1000000
XYZ        300000


Target1
Cust_Name  RandomNum
ABC        100
ABC        463487
ABC        6579
ABC        87456
ABC        6709

Target2
Cust_Name  RandomNum
XYZ        3480
XYZ        23090
XYZ        54
XYZ        90045
XYZ        546
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

By their very nature, generated random numbers are not unique.

Your approach will differ depending on whether the numbers must be unique within a single run or across several runs.

One approach is to create a table with just a numeric key and fill it with 1,000,000 records. Then use the random number generator to deliver a number between 1 and 1,000,000 and read that record. If the read is successful, then you have a unique random number to use and you delete the record from the table. If the read is not successful, then you have to repeat the random number generation and read process until you have a successful read. Note that this gets less and less efficient as you remove records from the pool.
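The table-based approach above can be sketched in Python, with a set standing in for the keyed table. This is only an illustration of the logic, not DataStage code; the function name draw_unique_from_pool is hypothetical.

```python
import random

def draw_unique_from_pool(pool: set[int], limit: int, rng: random.Random) -> int:
    """Pick random keys in 1..limit until one is still in the pool ("successful read"),
    then delete it from the pool so it can never be returned again."""
    if not pool:
        raise ValueError("pool is exhausted")
    while True:
        candidate = rng.randint(1, limit)  # inclusive on both ends
        if candidate in pool:              # successful read
            pool.remove(candidate)         # delete the record
            return candidate
        # unsuccessful read: retry; retries become more frequent as the pool shrinks

limit = 1_000_000
pool = set(range(1, limit + 1))  # hypothetical stand-in for the 1,000,000-row table
rng = random.Random()
numbers = [draw_unique_from_pool(pool, limit, rng) for _ in range(1000)]
```

Drawing 1,000 keys barely dents a 1,000,000-key pool, so retries are rare here; they only become costly once most of the pool has been consumed.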

In Server jobs, where long strings are processed efficiently, you can build a long delimited list and remove each element as it is picked, which makes the process much quicker.
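The "remove the element each time" idea avoids retries entirely, because every pick comes from what is left. Here is a minimal Python sketch, using a list in place of the Server delimited string (an assumption for illustration; draw_without_replacement is a hypothetical name). Swapping the chosen element to the end before popping it makes each removal cheap; this is a partial Fisher-Yates shuffle.

```python
import random

def draw_without_replacement(limit: int, count: int, rng: random.Random) -> list[int]:
    """Return `count` distinct values from 1..limit, removing each pick from the pool."""
    if count > limit:
        raise ValueError("cannot draw more unique values than the range holds")
    remaining = list(range(1, limit + 1))
    picked = []
    for _ in range(count):
        i = rng.randrange(len(remaining))  # random position among the values still left
        # move the chosen value to the end, then pop it off (O(1) removal)
        remaining[i], remaining[-1] = remaining[-1], remaining[i]
        picked.append(remaining.pop())
    return picked

rng = random.Random()
abc = draw_without_replacement(1_000_000, 1000, rng)  # ABC: 1000 values up to 1,000,000
xyz = draw_without_replacement(300_000, 500, rng)     # XYZ: 500 values up to 300,000
```

Python's standard library already provides this as random.sample(range(1, limit + 1), count); the explicit loop is shown only to mirror the list-removal approach.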


Do you know how many elements you need?
RPhani
Participant
Posts: 32
Joined: Sun Aug 26, 2012 7:03 am
Location: Hyd

Post by RPhani »

Each run should generate unique random numbers.

In every run we generate 1000 random numbers for ABC and 500 random numbers for XYZ.

Job design:

RowgenStage (ABC):

Fields:
Cust_Name (varchar)
RanNum (Integer) --> Type=random
  Limit=1000000
  Seedval=#seed_Val_ABC#

RowgenStage (XYZ):

Fields:
Cust_Name (varchar)
RanNum (Integer) --> Type=random
  Limit=300000
  Seedval=#seed_Val_XYZ#


seed_Val_ABC and seed_Val_XYZ are job parameters. I am assigning their values from the Sequence:

seed_Val_ABC-->KeyMgtGetNextValue(1)
seed_Val_XYZ-->KeyMgtGetNextValue(1)


Will the KeyMgtGetNextValue function work for this scenario?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Your randomly generated numbers in the Row Generator stage will not be unique, and KeyMgtGetNextValue won't work either.
RPhani
Participant
Posts: 32
Joined: Sun Aug 26, 2012 7:03 am
Location: Hyd

Post by RPhani »

The Row Generator stage produces unique random numbers based on the random algorithm (Type=random).
We are passing the Limit and Seed values as parameters.

If I provide a unique seed value for every run, I get unique random numbers.


seed_Val_ABC-->KeyMgtGetNextValue(1)
seed_Val_XYZ-->KeyMgtGetNextValue(1)

Here we want to constrain the seed value to be less than 9999.

How can I achieve this using the KeyMgtGetNextValue function?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

RPhani wrote:Row Generator stage, which is producing unique random numbers based on the Random algorithm (Type=random)...
No, your generated numbers are not going to be unique. The seed just gives a starting point, and the pseudo-random series will always be the same when using the same seed.
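The seed behaviour can be illustrated with Python's own seeded generator. This is an analogy only: Python's Mersenne Twister is not the algorithm the Row Generator uses, and the 0..limit-1 range is an assumption. The point is the same: a seed fixes the whole series, it does not make the values distinct.

```python
import random

def generate(seed: int, count: int, limit: int) -> list[int]:
    """Draw `count` integers in 0..limit-1 from an independent generator seeded with `seed`."""
    rng = random.Random(seed)  # same seed -> same internal state -> same series
    return [rng.randrange(limit) for _ in range(count)]

run1 = generate(seed=1, count=1000, limit=1_000_000)
run2 = generate(seed=1, count=1000, limit=1_000_000)
run3 = generate(seed=2, count=1000, limit=1_000_000)
print(run1 == run2)  # True: identical series for the same seed
print(run1 == run3)  # False: a different seed gives a different series
```

Nothing in this sketch stops a value from appearing twice within run1; changing the seed only changes which series is produced.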

I just did a quick test: a random integer column over 1,000,000 rows generated only 999,759 unique IDs (I used a seed of "1", so you can reproduce this as well).
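A result like this is what the birthday problem predicts. Drawing n values uniformly from a space of N possible integers yields on average N(1 - (1 - 1/N)^n) distinct values. The range the Row Generator actually draws from is not stated in the thread, so the sketch below assumes roughly 2^31 or 2^32 possible integers; both are illustrative guesses.

```python
import math

def expected_distinct(draws: int, space: int) -> float:
    """Expected number of distinct values after `draws` uniform picks from `space` values."""
    # space * (1 - (1 - 1/space) ** draws), computed stably with log1p/expm1
    return -space * math.expm1(draws * math.log1p(-1.0 / space))

n = 1_000_000
for space in (2**31, 2**32):  # assumed value ranges, not taken from the stage docs
    distinct = expected_distinct(n, space)
    print(f"space={space}: ~{distinct:,.0f} distinct, ~{n - distinct:,.0f} duplicates")
```

Under these assumptions a million draws should contain roughly 233 (for 2^31) or 116 (for 2^32) duplicates on average. That is the same order of magnitude as the 241 duplicates observed, so repeats are expected behaviour, not a fault of the stage.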
RPhani
Participant
Posts: 32
Joined: Sun Aug 26, 2012 7:03 am
Location: Hyd

Post by RPhani »

Yes, I agree with your answer.

If the seed value is the same for multiple runs, it will produce the same random numbers.

That's why, in our job, we pass the seed from the job Sequence using the KeyMgtGetNextValue() function.
Every run gets a new sequence number.

seed_Val_ABC-->KeyMgtGetNextValue(1)
seed_Val_XYZ-->KeyMgtGetNextValue(1)


For example:
1st run:

seed_Val_ABC-->KeyMgtGetNextValue(1)----OutputVal '1'
seed_Val_XYZ-->KeyMgtGetNextValue(1)----OutputVal '2'

2nd run:

seed_Val_ABC-->KeyMgtGetNextValue(1)----OutputVal '3'
seed_Val_XYZ-->KeyMgtGetNextValue(1)----OutputVal '4'

3rd run:

seed_Val_ABC-->KeyMgtGetNextValue(1)----OutputVal '5'
seed_Val_XYZ-->KeyMgtGetNextValue(1)----OutputVal '6'


Here we want to constrain the seed value (i.e., KeyMgtGetNextValue(1)) to be less than 9999.

How can I achieve this using the KeyMgtGetNextValue function in the job Sequence?
Total random numbers for ABC: 1000
Total random numbers for XYZ: 500

We are not producing 10 lakh / 3 lakh (1,000,000 / 300,000) unique random numbers in each run.
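One way to keep an ever-increasing counter below 9999 is to wrap it with a modulo. The Python sketch below shows the arithmetic only. The counter is a hypothetical stand-in for the Sequence's key value, not the DataStage routine, and the equivalent expression inside the Sequence would need to be checked against your version's function reference.

```python
import itertools

MAX_SEED = 9998  # largest seed allowed by the "< 9999" constraint

def bounded_seed(counter_value: int, max_seed: int = MAX_SEED) -> int:
    """Map an ever-increasing counter (1, 2, 3, ...) onto 1..max_seed, wrapping around."""
    return (counter_value - 1) % max_seed + 1

# Hypothetical stand-in for the Sequence's key counter (two values consumed per run).
counter = itertools.count(1)

for run in range(1, 4):
    seed_abc = bounded_seed(next(counter))
    seed_xyz = bounded_seed(next(counter))
    print(f"run {run}: seed_Val_ABC={seed_abc}, seed_Val_XYZ={seed_xyz}")
```

For the first three runs this yields seeds 1/2, 3/4 and 5/6, matching the example above. Because two seeds are consumed per run, the values wrap after 4,999 runs, and from then on earlier random series are reproduced exactly. Wrapping the seed also does nothing for uniqueness within a run.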
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You are missing the fundamental issue: random does not equal unique.

Yes, using a unique seed each run will ensure that the sequence of random numbers generated is always different but there's no guarantee that they will be unique within a run and certainly not across runs. It seems that the latter is not an issue for you but the former will be... it may look like it is working but in reality it won't be. If that is truly important / critical to your processing, you'll need to find another mechanism.
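The within-run risk can be quantified. Treat each Row Generator value as an independent uniform pick below Limit; this is an assumption, since the stage's internals aren't described in the thread. The birthday-problem product then gives the chance that a single run contains at least one repeat. A minimal Python sketch for the ABC and XYZ volumes:

```python
import math

def prob_any_duplicate(draws: int, space: int) -> float:
    """Probability that `draws` independent uniform picks from `space` values
    contain at least one repeat (exact birthday-problem product)."""
    if draws > space:
        return 1.0
    # log of P(all distinct) = sum of log(1 - i/space) for i in 0..draws-1
    log_all_distinct = sum(math.log1p(-i / space) for i in range(draws))
    return -math.expm1(log_all_distinct)

print(f"ABC: {prob_any_duplicate(1000, 1_000_000):.1%}")  # about 39%
print(f"XYZ: {prob_any_duplicate(500, 300_000):.1%}")     # about 34%
```

Under that assumption, roughly one run in three would contain a duplicate for each customer. Seed-based generation can therefore look correct in testing while still failing intermittently.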
-craig
