Unique random number generation

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

aspiresam
Premium Member
Posts: 7
Joined: Mon May 12, 2014 5:04 am

Unique random number generation

Post by aspiresam »

Dear all,

I would like to ask for a suggestion on generating a "unique" random number. By their nature, "rand()" and "random()" are not designed to produce unique random numbers.

As the customer list runs to millions of rows, it is not ideal to use the row number (@OUTROWNUM) as the unique factor. Also, the most important problem is the same results for same customers by random function.

Does this need to be done with a routine? I am also thinking about using my own seed...

Any suggestion?!?

Thanks.

Regards,
Sam

[Note: Changed topic title from Lucky Draw to be more specific - helps with searches later - Andy]
Learning is a daily assignment.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why random? Why not just use a serial surrogate?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Lucky Draw

Post by chulett »

aspiresam wrote:Also, the most important problem is the same results for same customers by random function.
OK. Instead of starting with your 'random' discussion, why don't we open the bidding with what exactly this means and what exactly it is you are trying to accomplish, the more details the better. I suspect the word 'random' will not come into play in the solution. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
aspiresam
Premium Member
Posts: 7
Joined: Mon May 12, 2014 5:04 am

Post by aspiresam »

The aim is to run a lucky draw against the customer list.
Learning is a daily assignment.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't know about anyone else but that tells me absolutely nothing and I prefer not to guess. Sounds like some kind of contest and I have no idea why an ETL tool would be involved in something like that.

Still looking for some of those pesky 'details'...
Last edited by chulett on Wed Aug 13, 2014 8:11 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Does this customer list have preexisting customer keys from which you need to randomly choose one?

Or

Will you be generating customer keys for each customer in the list while simultaneously choosing one at random?

Mike
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

To generate truly random numbers, especially from one job run to the next, you can combine the random() function result with an analog value such as the 6-digit microseconds from the CurrentTimestampMS() function. Microseconds are always changing.

To ensure that the randomly generated number is unique, which was your question, you have to keep track of the previously generated numbers and compare the current value against the list. If it has already been used, then try, try again.
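
As a rough illustration of the "keep track and try again" idea, here is a minimal C sketch. The function names draw_unique and already_used are made up for this example and are not DataStage functions; it also assumes the range is no larger than RAND_MAX + 1 and that the caller has already seeded the generator with srand().

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: true if value already appears in used[0..count-1].
static bool already_used(const int *used, size_t count, int value)
{
    for (size_t i = 0; i < count; i++) {
        if (used[i] == value) {
            return true;
        }
    }
    return false;
}

// Fill out[0..n-1] with n distinct pseudo-random numbers in [0, range).
// Returns 0 on success, -1 if the request cannot be satisfied.
// Assumption: the caller has seeded the generator with srand().
int draw_unique(int *out, size_t n, int range)
{
    // rand() cannot reach values above RAND_MAX, so reject larger ranges.
    if (range <= 0 || range - 1 > RAND_MAX || n > (size_t)range) {
        return -1;
    }
    size_t count = 0;
    while (count < n) {
        int candidate = rand() % range;  // slight modulo bias, fine for a sketch
        if (!already_used(out, count, candidate)) {
            out[count++] = candidate;    // new value: keep it
        }
        // otherwise it was already used, so loop and try again
    }
    return 0;
}
```

The linear scan makes this quadratic in the number of draws, which is acceptable for picking a handful of winners but probably not for numbering every row of a large list.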

For your "most important problem of same results for same customers," like Craig suggested, the more details you provide, the better answers you will get. As of yet, the meaning is left up to imagination.
Choose a job you love, and you will never have to work a day in your life. - Confucius
aspiresam
Premium Member
Posts: 7
Joined: Mon May 12, 2014 5:04 am

Post by aspiresam »

Thanks all.

Actually, I have around 300,000 customers. If I just use the random function and customer A (say) wins the prize, it is likely to be A again in the next draw. We run a monthly draw for VIPs (frequent buyers).

So, that's why I am trying to get a unique random number. Sorry that my writing was not clear previously.

I would like to pass my own seed, such as a timestamp in microseconds, into the random function. However, I am not sure whether I can do something like:

random(<decimal of timestamp>)

Thanks in advance again.
Learning is a daily assignment.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You cannot, I'm afraid.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Create a key-only table called WINNERS and copy keys into there when they win. Apply your random selection to a DIFFERENCE set of the two tables (those in CUSTOMER but not in WINNERS).
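
For illustration only, the selection step could be sketched in C roughly as follows, with in-memory arrays standing in for the CUSTOMER and WINNERS tables. The names pick_new_winner and is_winner are hypothetical, and the sketch assumes the number of eligible customers does not exceed RAND_MAX + 1 and that srand() has been called beforehand.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: true if key is in winners[0..n_winners-1].
static bool is_winner(const long *winners, size_t n_winners, long key)
{
    for (size_t i = 0; i < n_winners; i++) {
        if (winners[i] == key) {
            return true;
        }
    }
    return false;
}

// Pick one key at random from the customers that are not yet in winners
// (the DIFFERENCE set). Stores the key in *picked and returns 0,
// or returns -1 when every customer has already won.
int pick_new_winner(const long *customers, size_t n_customers,
                    const long *winners, size_t n_winners,
                    long *picked)
{
    // First pass: count the eligible (not yet winning) customers.
    size_t eligible = 0;
    for (size_t i = 0; i < n_customers; i++) {
        if (!is_winner(winners, n_winners, customers[i])) {
            eligible++;
        }
    }
    if (eligible == 0) {
        return -1;
    }

    // Second pass: walk to the randomly chosen eligible customer.
    size_t target = (size_t)rand() % eligible;
    for (size_t i = 0; i < n_customers; i++) {
        if (is_winner(winners, n_winners, customers[i])) {
            continue;
        }
        if (target == 0) {
            *picked = customers[i];
            return 0;
        }
        target--;
    }
    return -1; // not reached
}
```

After each draw the picked key would be appended to the winners list, so the same customer cannot be drawn again the following month.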
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
aspiresam
Premium Member
Posts: 7
Joined: Mon May 12, 2014 5:04 am

Post by aspiresam »

Thanks, Ray.

That is one possible workaround. However, it is not ideal, because getting a list of winners involves extra maintenance of a table / file.

At the moment, I am thinking about the routine approach, but it still returns duplicate results even when using a time-based seed.

I have written a stored procedure in DB2 to do something similar. However, I would like to solve it in DataStage.

-- My working routine function - still in trial & error stage --

Code: Select all

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    struct timeval start, end;
    long mtime, seconds, useconds;

    gettimeofday(&start, NULL);
    usleep(12000);
    gettimeofday(&end, NULL);

    seconds  = end.tv_sec - start.tv_sec;
    useconds = end.tv_usec - start.tv_usec;

    // elapsed time in microseconds; always close to 12000,
    // so the seed varies very little between runs
    mtime = (1000000 * seconds) + useconds;

    // seed by microseconds
    srand((unsigned int)mtime);

    // rand() % mtime is an integer in [0, mtime)
    double Ans = (double)(rand() % mtime);

    printf("Random by Time: %ld microseconds\n", mtime);
    printf("%g.\n", Ans);
    return 0;
}
Learning is a daily assignment.
aspiresam
Premium Member
Posts: 7
Joined: Mon May 12, 2014 5:04 am

Post by aspiresam »

After a number of tests, here is the final version I would like to share, which gives a reasonably fair random result...

Previously, I was using a print function in C++ to observe the result
(NOTE: this will not work with DataStage, because it only calls the object / library)

This way, I don't need to maintain another customer list.

Code: Select all

#include <stdlib.h>
#include <sys/time.h>

double myCustomRandom(void)
{
    struct timeval now;
    long mtime;

    gettimeofday(&now, NULL);

    // microsecond part of the current time (0..999999);
    // add 1 so it can never be zero when used as a divisor
    mtime = now.tv_usec + 1;

    // seed by microseconds
    srand((unsigned int)mtime);

    long TmpAns = rand() % mtime;
    // cast before dividing: integer division would always give 0 here
    double Ans = (double)TmpAns / (double)mtime;
    return Ans;
}
Learning is a daily assignment.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Thanks for sharing. If you run the routine at exactly the same time each day then it should produce the same result each time.

If it also takes the date as a number into the seed, then the result should vary each day, which I am guessing is what you really want.
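
One way to do that, sketched in C under a few assumptions: the names make_seed and seeded_random are invented for this example, and the seconds-since-epoch value from gettimeofday() stands in for "the date as a number".

```c
#include <stdlib.h>
#include <sys/time.h>

// Hypothetical helper: build a seed from both the seconds since the epoch
// (which change with the date) and the microsecond part of the time.
unsigned int make_seed(const struct timeval *tv)
{
    // Combine both parts into one 64-bit microsecond count.
    unsigned long long mixed =
        (unsigned long long)tv->tv_sec * 1000000ULL +
        (unsigned long long)tv->tv_usec;

    // Fold the upper half into the lower half before truncating to a seed.
    return (unsigned int)(mixed ^ (mixed >> 32));
}

// Returns a pseudo-random value in [0, 1), reseeding from the current time.
double seeded_random(void)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    srand(make_seed(&now));

    // Scale rand() into [0, 1); the + 1.0 keeps the result below 1.
    return (double)rand() / ((double)RAND_MAX + 1.0);
}
```

With this seed, two runs at the same microsecond offset on different days start from different seeds, although the output is still pseudo-random and does nothing by itself to prevent repeat winners.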
Choose a job you love, and you will never have to work a day in your life. - Confucius