Page 1 of 2
Help regarding RANDOM data selecting
Posted: Mon Apr 11, 2011 11:03 am
by krishna14
Hi All,
I have a secenario where i have to select Random rows (i.e., equal amount of random rows from key column (STATE) randomly ) .I'm trying to use RANDOM function in transformer .Please help me how can i select random amount of rows ...
Posted: Mon Apr 11, 2011 2:02 pm
by DSguru2B
What is your source?
Posted: Mon Apr 11, 2011 3:37 pm
by krishna14
Our Source is DB2 ..
Posted: Mon Apr 11, 2011 4:46 pm
by ray.wurlod
Use a Sample stage.
While you can use a Transformer stage, you need to understand how the Rnd() function works and, possibly, how to seed it. The Sample stage already includes random sampling.
Posted: Mon Apr 11, 2011 7:35 pm
by krishna14
Thanks Ray ,
I appreciate your help ,Can u please elaborate how can i seed in the Rnd() function and how can i use it .Please ..
Posted: Mon Apr 11, 2011 7:54 pm
by ray.wurlod
Why not use the Sample stage?
Right tool for the job, and all that.
Posted: Mon Apr 11, 2011 8:32 pm
by krishna14
Actually my requirment needs equal amount of Random Data from each state ( Eg: 100 Records from each state (Key column) if there are 50 records in one state it shuld pick all 50 records ) .If i go with Sample stage if i give the percentage it will give the percentage of all the records ..means it might miss some records from the state with less number of records..
So i thought of using Transformer and RND() funtion and give a stage variable to limit 100 records from each state ..
Posted: Mon Apr 11, 2011 9:16 pm
by jwiles
100 Records from each state (Key column) If there are 50 records in one state it shuld pick all 50 records
That's not really random, is it.
How about this:
- Assign a random number column to each record with transformer or column generator
- Partition on State, Sort on State and RandomNumber
- In a transformer, keeping track of change in State (key), keep the first ### records from each key, drop the rest
Regards,
Posted: Mon Apr 11, 2011 10:51 pm
by ray.wurlod
Maybe a Head stage, then, with the Period and Number of Rows properties used.
Posted: Tue Apr 12, 2011 7:17 am
by DSguru2B
Check out the select statement for random rows from db2
here. You can enhance it to fit your needs.
Posted: Tue Apr 12, 2011 8:34 am
by krishna14
Ray,
Thanks Ray ,But If we use HEAD stage it will give the every Nth records ... if we give period..,but as per my requirment i shuld get randomly not every Nth record.
Posted: Tue Apr 12, 2011 10:36 am
by krishna14
any one can help me out with RND() funtion in transformer ? Please to get the Random data..
Posted: Tue Apr 12, 2011 11:37 am
by jwiles
Have you read the documentation on the function in the Parallel Job Developer Guide?
The Rand and Random functions return a random unsigned integer when called. That's all there is to them.
Please refer to my earlier post for a suggested solution to your dilemma.
Regards,
Posted: Tue Apr 12, 2011 1:48 pm
by krishna14
S i did read the Parallel Job Developer Guide ..
Thanks for your suggestion jwiles.
Posted: Tue Apr 12, 2011 2:40 pm
by jwiles
The purpose of the logic I suggested is to randomly order the data (within a key value) by utilizing the random number functionality of transformer or column generator (or whatever source). Then you simply keep the first ### of the randomly ordered rows per key. While this requires you to sort the data, it greatly simplifies the logic required to select the rows to keep.
Regards,