Page 1 of 1

Unique random number generation

Posted: Tue Aug 12, 2014 8:06 am
by aspiresam
Dear all,

I would like to ask for a suggestion for making a "unique" random number. For the nature of "rand()" and "random()", they are not aimed to create a unique random number.

As the population of the customer list is million scale, it is not ideal to use row number (@OUTROWNUM) as unique factor. Also, the most important problem is the same results for same customers by random function.

Is it needed to do it by routine? Also, I am thinking about my own seed...

Any suggestion?!?

Thanks.

Regards,
Sam

[Note: Changed topic title from Lucky Draw to be more specific - helps with searches later - Andy]

Posted: Tue Aug 12, 2014 9:10 am
by chulett
Why random? Why not just just a serial surrogate?

Re: Lucky Draw

Posted: Tue Aug 12, 2014 9:34 am
by chulett
aspiresam wrote:Also, the most important problem is the same results for same customers by random function.
OK. Instead of starting with your 'random' discussion, why don't we open the bidding with what exactly this means and what exactly it is you are trying to accomplish, the more details the better. I suspect the word 'random' will not come into play in the solution. :wink:

Posted: Wed Aug 13, 2014 6:05 am
by aspiresam
It is aimed to have a lucky draw against the customer list

Posted: Wed Aug 13, 2014 7:00 am
by chulett
I don't know about anyone else but that tells me absolutely nothing and I prefer not to guess. Sounds like some kind of contest and I have no idea why an ETL tool would be involved in something like that.

Still looking for some of those pesky 'details'...

Posted: Wed Aug 13, 2014 7:04 am
by Mike
Does this customer list have preexisting customer keys from which you need to randomly choose one?

Or

Will you be generating customer keys for each customer in the list while simultaneously choosing one at random?

Mike

Posted: Wed Aug 13, 2014 8:01 am
by qt_ky
To generate a truly random numbers, especially from one job run to the next, you can combine the random() function result with an analog value such as the 6 digit microseconds from the CurrentTimestampMS() function. Microseconds are always changing.

To ensure that the randomly generated number is unique, which was your question, then you have to keep track of the previously generated numbers and compare the current value against the list. If already used then try, try again.

For your "most important problem of same results for same customers," like Craig suggested, the more details you provide, the better answers you will get. As of yet, the meaning is left up to imagination.

Posted: Wed Aug 13, 2014 10:19 pm
by aspiresam
Thanks all.

Actually, I have around 300,000 customers. If just taking random function, the customer A (say) has won the prize. Next time, it is likely to A again for the next draw. We are taking a monthly draw for VIP (frequent buyer).

So, that's why I am trying to have a unique random number. Sorry for my writing was not clear previously.

I would like to put my own seed like timestamp in microsecond into the random. However, I am not sure that I can doing something like:

random(<decimal of timestamp>)

Thanks in advance again.

Posted: Wed Aug 13, 2014 11:12 pm
by chulett
You cannot, I'm afraid.

Posted: Thu Aug 14, 2014 4:12 pm
by ray.wurlod
Create a key-only table called WINNERS and copy keys into there when they win. Apply your random selection to a DIFFERENCE set of the two tables (those in CUSTOMER but not in WINNERS).

Posted: Fri Aug 15, 2014 7:08 am
by aspiresam
Thanks, Ray.

It's one of the possible work-around. However, it is not very ideal for getting a list of winners involving extra maintenance of table / file.

At this moment, I am thinking about routine way - but it still returns duplicated results even using a seed by time.

I have made a stored procedure in the DB2 to do similar stuff. However, I would like to fix it by DataStage.

-- My working routine function - still in trial & error stage --

Code: Select all

#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

int main()
{
    struct timeval start,end;
    
    long mtime, seconds, useconds;

    gettimeofday(&start, NULL);
    usleep(12000);
    gettimeofday(&end, NULL);

    seconds = end.tv_sec - start.tv_sec;
    useconds= end.tv_usec - start.tv_usec;

    mtime =(1000000*seconds)+useconds;
    
    //seed by microseconds
    srand(mtime);

    long double Ans=(rand()%mtime);
 
    printf("Random by Time: %ld microseconds\n", mtime);
    printf("%g.\n",Ans);
return 0;
}

Posted: Mon Aug 18, 2014 2:05 am
by aspiresam
After a number of testing, the final version I would like to share with a quite fair random...

Previously, I am using a print function in C++ to observe the result
(NOTE: this will not work with the DataStage because it is calling object / library only)

Then, I don't need to maintain another customer list.

Code: Select all

#include <stdio.h> 
#include <time.h> 
#include <stdlib.h> 
#include <math.h> 
#include <unistd.h> 
#include <sys/time.h> 
#include <sys/resource.h> 

double myCustomRandom() 
{ 
    struct timeval start,end; 
    
    long mtime, seconds, useconds; 

    gettimeofday(&end, NULL); 

    mtime = end.tv_usec; 
    
    //seed by microseconds 
    srand(mtime); 

    int TmpAns=(rand()%mtime); 
    // over the integer range
    double Ans = TmpAns/mtime;  
    return (Ans); 
}

Posted: Mon Aug 18, 2014 11:03 am
by qt_ky
Thanks for sharing. If you run the routine at exactly the same time each day then it should produce the same result each time.

If it also takes the date as a number into the seed, then the result should vary each day, which I am guessing is what you really want.