Why Rnd() function generates same number twice?

saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi,
Here is my code for generating a unique random number, keeping in mind that the Rnd() function will not generate the same random number twice. I have a DistinctIds file that contains the numbers already used; if a number is present in the file, this routine doesn't generate that number again.
But if the DistinctIds file is empty, the routine generates random numbers that are not unique. Please help me if some modification is required in the code below.



$IFNDEF JOBCONTROL.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
$DEFINE TESTING
* Declare "bitmap" strings to be in named COMMON area of memory, so that their
* values persist across multiple rows processed.
      COMMON /FP1/UniqueIndex
      If Len(UniqueIndex) = 1
      Then
         GoSub BuildBitmaps
      End
      GoSub GenerateRandom
      GoSub CheckExistingRndNo

* Initialize return variable.
      Ans = 0
* Generate Unique Number between 1 and 9
**********************************************************************************************
GenerateRandom:
**********************************************************************************************
      BaseNumber = 1
      UniqueIndexTmp = Rnd(4)
      UniqueIndexNo = BaseNumber + UniqueIndexTmp
      Ans = UniqueIndexNo
      GoSub CheckExistingRndNo
      Return(Ans)

**********************************************************************************************
CheckExistingRndNo:
**********************************************************************************************
      UniqueIndexFound = UniqueIndex[UniqueIndexNo,1]
      If UniqueIndexFound
      Then
         Message = "Results:"
         Message<-1> = "UniqueIndexFound = " : UniqueIndexNo
         Message<-1> = "Number Exists Generate New One"
         Call DSLogInfo(Message, " ** TESTING ** ")
         GoSub GenerateRandom
      End
      Else
         Message = "Results:"
         Message<-1> = "UniqueIndexFound = " : UniqueIndexNo
         Message<-1> = "Number Does not Exists Pass the number to the Job"
      End
      Ans = UniqueIndexNo
      Return(Ans)



**********************************************************************************************
BuildBitmaps:
**********************************************************************************************
*
* Here we construct the "bitmaps" in COMMON variables. This step is executed only if required
* (that is, usually, only on the first row processed).

      UniqueIndex = Str("0", 8)
      FileError = 1
      OpenSeq "D:\files\DistinctIds.txt" To InputFvar
      On Error
         Message = 'Error (code ' : Status() : ') opening "DistinctIds.txt" file.'
      End
      Locked
         Message = 'File "DistinctIds.txt" locked by another process.'
      End
      Then
         FileError = 0
         Message = 'File "DistinctIds.txt" opened successfully.'
         Loop
         While ReadSeq Line From InputFvar
            * Flag each number already listed in the file as used.
            UniqueIndex[Line,1] = "1"
         Repeat
         CloseSeq InputFvar
      End
      Else
         Message = 'Unable to open "DistinctIds.txt" file for reading.'
      End ; * end of OpenSeq statement
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Why Rnd() function generates same number twice?

Post by chulett »

saadmirza wrote: keeping in mind that Rnd() function will not generate random number twice...
What makes you think that statement is true? :? Kind of flies in the face of what the term 'random' means.
-craig

"You can never have too many knives" -- Logan Nine Fingers
PilotBaha
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

For that please use LightngTwce() function :) Sorry, couldn't help it :)
Earthbound misfit I..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The base assumption - that Rnd() cannot generate the same number twice - is incorrect. That's like saying that tossing a coin can never generate two heads or two tails - what happens on the third toss? Edge? How about the fourth toss?

Your logic (somewhat familiar - are you at Reliance?) suggests that you don't really need Rnd() at all. You could use LOCATE to determine an unused place in the array. Can you document (in English, not in code) what you are trying to achieve?
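
To illustrate the suggestion above, here is a minimal sketch in Python (not DataStage BASIC) of picking the first unused position in a "bitmap" string of 0s and 1s instead of calling Rnd(). The function names `first_unused` and `mark_used` are hypothetical, and the string search only stands in for what a LOCATE or similar search would do in a routine.

```python
def first_unused(bitmap: str) -> int:
    """Return the 1-based position of the first "0" in the bitmap, or 0 if every slot is used."""
    # str.find returns -1 when nothing is found, so adding 1 yields 0 for "no free slot".
    return bitmap.find("0") + 1


def mark_used(bitmap: str, position: int) -> str:
    """Return a copy of the bitmap with the given 1-based position set to "1"."""
    return bitmap[:position - 1] + "1" + bitmap[position:]


bitmap = "11010000"               # positions 1, 2 and 4 are already taken
slot = first_unused(bitmap)       # first free position
bitmap = mark_used(bitmap, slot)
print(slot, bitmap)               # 3 11110000
```

Because the search is deterministic, no retry loop is needed and no number can be handed out twice.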
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi,
OK. Can anyone guide me on how to generate random, non-duplicate numbers using the Rnd() or Randomize() functions available in DS?

Regards,
Saad
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The only way to guarantee that you don't repeat a number when you use a RND function is to keep track of the numbers already used. If you seed your pseudo-random number generator at the same point you will get a reproducible series, but that series can still contain duplicates.
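
A rough sketch of the "keep track of what is already used" idea, written in Python rather than DataStage BASIC. The name `draw_unused` and the range 1 to 8 are assumptions made for the example; the set plays the role of the bitmap or file of used numbers.

```python
import random


def draw_unused(used: set, max_value: int, rng: random.Random) -> int:
    """Draw random values in 1..max_value until one is found that is not in `used`."""
    # Assumes `used` only holds values in 1..max_value; otherwise this guard is too strict.
    if len(used) >= max_value:
        raise ValueError("all values from 1 to max_value are already used")
    while True:
        candidate = rng.randint(1, max_value)   # may repeat earlier draws
        if candidate not in used:
            used.add(candidate)                 # remember it so it is never returned again
            return candidate


rng = random.Random()
used = {2, 5}                                   # numbers already taken
drawn = [draw_unused(used, 8, rng) for _ in range(6)]
print(sorted(drawn))                            # [1, 3, 4, 6, 7, 8]
```

The generator itself still repeats values; uniqueness comes only from the bookkeeping, and the loop gets slower as the range fills up.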

Another approach is to create a table with keys from 1 to {YourMax}. Then use the RND function to shuffle those values around the array/table. If you then read these sequentially you will get the values in pseudo-random order and no duplicates.
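
The shuffle approach could look roughly like this in Python (a sketch, not DataStage BASIC). It uses a Fisher-Yates shuffle, which is one common way to do the shuffling; the function name `shuffled_keys` and the maximum of 10 are made up for the example.

```python
import random


def shuffled_keys(max_key: int, rng: random.Random) -> list:
    """Return the keys 1..max_key in pseudo-random order, each exactly once."""
    keys = list(range(1, max_key + 1))
    # Fisher-Yates: walk backwards, swapping each slot with a random slot at or before it.
    for i in range(len(keys) - 1, 0, -1):
        j = rng.randint(0, i)
        keys[i], keys[j] = keys[j], keys[i]
    return keys


keys = shuffled_keys(10, random.Random())
print(keys)                                   # some permutation of 1..10
print(sorted(keys) == list(range(1, 11)))     # True: nothing missing, nothing repeated
```

Reading `keys` from front to back then hands out each value once, with no retry loop.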

Seriously, the concept of random numbers is one that doesn't belong in the business end (meaning data storage) of a Data Warehouse. I can imagine a number of statistical uses but very few others.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You aren't doing all of this to generate a surrogate key, are you? :?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Craig,

This looks like a continuation of a thread from last week, and if that is the case then the answer to your question is "yes".
saadmirza
Participant
Posts: 76
Joined: Tue Mar 29, 2005 2:57 am

Post by saadmirza »

Hi Ray,
As you said, I am trying to use a bitmap array index to identify whether the generated random number is already present in UniqueIndex.
If it is, I should generate another random number until I find one that is not an existing number. But I wrote this algorithm keeping in mind that the Rnd() function will not generate the same number twice within a single load run. The help document says that Rnd() will generate a non-repeatable sequence of numbers, but I don't know what exactly is happening. Some say that I should use Randomize in conjunction with Rnd() to generate non-duplicate random numbers, but I don't know how. Can you please guide me in this regard? The client's requirement is that I need to generate new Employee IDs and that they should be randomly generated. Please suggest.

Thanks,
Saad Mirza
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Saad,

Another way to look at the RND() function not duplicating results is: what does RND(10) return when called for the 11th time? Or, to use a physical example, how can a die be rolled 100 times without duplicating results?
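
The point can be checked with a few lines of Python (an illustration only, not DataStage BASIC). Here `rng.randrange(10)` stands in for RND(10), on the assumption that RND(10) yields one of ten possible values; `has_duplicate` is a hypothetical helper.

```python
import random


def has_duplicate(values: list) -> bool:
    """True when at least one value occurs more than once."""
    return len(set(values)) < len(values)


rng = random.Random()
draws = [rng.randrange(10) for _ in range(11)]   # 11 draws, only 10 possible values
print(draws)
print(has_duplicate(draws))                      # True
```

With eleven draws and only ten possible outcomes, at least one value has to appear twice, whatever the generator does.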

There are two popular algorithms that generate pseudo-random numbers that are in common use; which one your operating system uses is not relevant. Both algorithms use a seed number to choose the starting location in a series of numbers; this way you can run a program using pseudo-random numbers with the same seed (randomize) number and get identical results each run. The Randomize/Seed does nothing for your particular problem.
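
A small Python sketch of what seeding does and does not do (Python's `random` module is used as a stand-in; it is not the generator behind Rnd()/Randomize, and the seed 42 is arbitrary). The helper name `sequence` is hypothetical.

```python
import random


def sequence(seed: int, count: int, upper: int) -> list:
    """Return `count` pseudo-random integers in 0..upper-1 from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(upper) for _ in range(count)]


first = sequence(42, 50, 10)
second = sequence(42, 50, 10)
print(first == second)                 # True: same seed, same series
print(len(set(first)) < len(first))    # True: 50 draws from 10 values must contain repeats
```

Seeding makes a run reproducible; it does nothing to prevent repeated values inside the series.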

If you store the previous maximum employee number in a table then you can generate a unique new employee number by adding 1 to the high value and incrementing it afterwards. If you have a requirement that the employee number shouldn't reflect the hiring date then you can either use a (secret to the user) algorithm or, if you really, really wish to stick with random numbers you could populate a file with all possible (and unused) employee numbers and then use RND() in a loop trying to find unused records until successful. I prefer not using RND().
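
The "previous maximum plus one" approach is short enough to sketch in Python (illustrative only; in a real job the maximum would come from a table, and the name `next_employee_id` and the sample IDs are invented here).

```python
def next_employee_id(existing_ids: list) -> int:
    """Return one more than the highest existing ID, or 1 when there are none yet."""
    return max(existing_ids, default=0) + 1


ids = [1001, 1002, 1007]
new_id = next_employee_id(ids)
ids.append(new_id)          # store the new high value for the next call
print(new_id)               # 1008
```

Every new number is larger than all earlier ones, so it cannot collide with an existing ID.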
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

"Nonrepeatable sequence" means exactly that; it is the sequence that is not repeatable; not any one number in the sequence. Indeed, there is a small but finite possibility that every number in the sequence will be the same, but this will not be true the next time a sequence is generated unless you deliberately seed the generator.

You can get guaranteed unique values by beginning with the current maximum and incrementing it. If you have SQL Server, you can also generate a GUID (globally unique identifier), which is designed to be unique across rows and systems.
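
For comparison, this is what generating a GUID-style value looks like in Python (a sketch only; it uses Python's standard `uuid` module, not SQL Server, and `new_guid` is a hypothetical wrapper). Random GUIDs are unique for all practical purposes rather than by mathematical guarantee.

```python
import uuid


def new_guid() -> str:
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


a, b = new_guid(), new_guid()
print(a)
print(a != b)    # a collision is possible in theory but vanishingly unlikely
```

Such values are long and non-numeric, so they suit technical keys better than employee numbers that people have to read or type.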
Post Reply