Page 1 of 2

Help regarding RANDOM data selecting

Posted: Mon Apr 11, 2011 11:03 am
by krishna14
Hi All,

I have a secenario where i have to select Random rows (i.e., equal amount of random rows from key column (STATE) randomly ) .I'm trying to use RANDOM function in transformer .Please help me how can i select random amount of rows ...

Posted: Mon Apr 11, 2011 2:02 pm
by DSguru2B
What is your source?

Posted: Mon Apr 11, 2011 3:37 pm
by krishna14
Our Source is DB2 ..

Posted: Mon Apr 11, 2011 4:46 pm
by ray.wurlod
Use a Sample stage.

While you can use a Transformer stage, you need to understand how the Rnd() function works and, possibly, how to seed it. The Sample stage already includes random sampling.

Posted: Mon Apr 11, 2011 7:35 pm
by krishna14
Thanks Ray ,

I appreciate your help ,Can u please elaborate how can i seed in the Rnd() function and how can i use it .Please ..

Posted: Mon Apr 11, 2011 7:54 pm
by ray.wurlod
Why not use the Sample stage?

Right tool for the job, and all that.

Posted: Mon Apr 11, 2011 8:32 pm
by krishna14
Actually my requirment needs equal amount of Random Data from each state ( Eg: 100 Records from each state (Key column) if there are 50 records in one state it shuld pick all 50 records ) .If i go with Sample stage if i give the percentage it will give the percentage of all the records ..means it might miss some records from the state with less number of records..

So i thought of using Transformer and RND() funtion and give a stage variable to limit 100 records from each state ..

Posted: Mon Apr 11, 2011 9:16 pm
by jwiles
100 Records from each state (Key column) If there are 50 records in one state it shuld pick all 50 records
That's not really random, is it. :wink:

How about this:

- Assign a random number column to each record with transformer or column generator
- Partition on State, Sort on State and RandomNumber
- In a transformer, keeping track of change in State (key), keep the first ### records from each key, drop the rest

Regards,

Posted: Mon Apr 11, 2011 10:51 pm
by ray.wurlod
Maybe a Head stage, then, with the Period and Number of Rows properties used.

Posted: Tue Apr 12, 2011 7:17 am
by DSguru2B
Check out the select statement for random rows from db2 here. You can enhance it to fit your needs.

Posted: Tue Apr 12, 2011 8:34 am
by krishna14
Ray,

Thanks Ray ,But If we use HEAD stage it will give the every Nth records ... if we give period..,but as per my requirment i shuld get randomly not every Nth record.

Posted: Tue Apr 12, 2011 10:36 am
by krishna14
any one can help me out with RND() funtion in transformer ? Please to get the Random data..

Posted: Tue Apr 12, 2011 11:37 am
by jwiles
Have you read the documentation on the function in the Parallel Job Developer Guide?

The Rand and Random functions return a random unsigned integer when called. That's all there is to them.

Please refer to my earlier post for a suggested solution to your dilemma.

Regards,

Posted: Tue Apr 12, 2011 1:48 pm
by krishna14
S i did read the Parallel Job Developer Guide ..

Thanks for your suggestion jwiles.

Posted: Tue Apr 12, 2011 2:40 pm
by jwiles
The purpose of the logic I suggested is to randomly order the data (within a key value) by utilizing the random number functionality of transformer or column generator (or whatever source). Then you simply keep the first ### of the randomly ordered rows per key. While this requires you to sort the data, it greatly simplifies the logic required to select the rows to keep.

Regards,