Help regarding RANDOM data selecting
Hi All,
I have a scenario where I have to select random rows, i.e., an equal number of randomly chosen rows for each value of the key column (STATE). I'm trying to use the RANDOM function in a Transformer. Please help me with how I can select a random set of rows.
krishna
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use a Sample stage.
While you can use a Transformer stage, you need to understand how the Rnd() function works and, possibly, how to seed it. The Sample stage already includes random sampling.
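To illustrate why seeding matters, here is a minimal sketch in Python rather than in DataStage; it is not the Transformer's Rnd() function, and the helper name `random_keys` is made up for the example. It only shows the general behaviour of a seeded pseudo-random generator: the same seed reproduces the same sequence, while no seed gives a different sequence on each run.

```python
import random

def random_keys(n, seed=None):
    """Return n pseudo-random 32-bit unsigned integers (hypothetical helper)."""
    # A private generator; seed=None lets Python seed it from system entropy.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n)]

# With the same seed, two runs produce the same sequence,
# which is what you want if a sample has to be reproducible.
first = random_keys(5, seed=42)
second = random_keys(5, seed=42)
```

If a different sample is wanted on every run, leave the seed unset.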
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Actually, my requirement needs an equal number of random records from each state (e.g., 100 records from each state, the key column; if one state has only 50 records, it should pick all 50). If I use the Sample stage and give a percentage, it returns that percentage of all the records, which means it might miss some records from states with fewer records.
So I thought of using a Transformer with the RND() function and a stage variable to limit the output to 100 records per state.
krishna
"100 records from each state (key column); if there are 50 records in one state it should pick all 50 records"
That's not really random, is it? ;)
How about this:
- Assign a random number column to each record with a Transformer or Column Generator stage
- Partition on State, then sort on State and RandomNumber
- In a Transformer, track the change in State (the key) and keep the first ### records for each key, dropping the rest
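The three steps above can be sketched outside DataStage as follows. This is a rough Python illustration of the logic only, not a job design: the function name `sample_per_key`, its parameters, and the row layout (a list of dicts with a STATE field) are assumptions made for the example, and `limit` stands in for the ### in the list.

```python
import random

def sample_per_key(rows, key, limit, seed=None):
    """Keep up to `limit` randomly chosen rows for each value of `key`."""
    rng = random.Random(seed)

    # Step 1: attach a random number to every row. The row index is kept
    # as a tie-breaker so the sort never has to compare the rows themselves.
    tagged = [(row[key], rng.getrandbits(32), i, row)
              for i, row in enumerate(rows)]

    # Step 2: sort on the key, then on the random number.
    tagged.sort(key=lambda t: t[:3])

    # Step 3: walk the sorted rows, reset a counter whenever the key
    # changes, and keep only the first `limit` rows of each key.
    kept = []
    previous_key = object()  # sentinel that equals no real key value
    count = 0
    for value, _, _, row in tagged:
        if value != previous_key:
            previous_key = value
            count = 0
        count += 1
        if count <= limit:
            kept.append(row)
    return kept
```

Calling it with a limit of 100 on data that has 250 rows for one state and 50 for another returns 100 rows for the first state and all 50 for the second, which matches the requirement described earlier in the thread.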
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
Have you read the documentation on the function in the Parallel Job Developer Guide?
The Rand and Random functions return a random unsigned integer when called. That's all there is to them.
Please refer to my earlier post for a suggested solution to your dilemma.
Regards,
- james wiles
The purpose of the logic I suggested is to randomly order the data (within a key value) by utilizing the random number functionality of transformer or column generator (or whatever source). Then you simply keep the first ### of the randomly ordered rows per key. While this requires you to sort the data, it greatly simplifies the logic required to select the rows to keep.
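To show why "randomly order, then keep the first ###" amounts to a random sample per key, here is a small in-memory Python sketch. It is an illustration only, not DataStage code; `sample_groups` and its parameters are hypothetical names, and it groups rows in a dictionary instead of sorting them.

```python
import random
from collections import defaultdict

def sample_groups(rows, key, limit, seed=None):
    """Return {key value: up to `limit` randomly chosen rows}."""
    rng = random.Random(seed)

    # Collect the rows belonging to each key value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sampled = {}
    for value, members in groups.items():
        rng.shuffle(members)              # random order within the key
        sampled[value] = members[:limit]  # keep the first `limit` rows
    return sampled
```

Because every ordering of a group is equally likely after the shuffle, the first `limit` rows form a random sample without replacement, and a group smaller than `limit` is kept in full.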
Regards,
- james wiles