Help regarding RANDOM data selecting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Help regarding RANDOM data selecting

Post by krishna14 »

Hi All,

I have a secenario where i have to select Random rows (i.e., equal amount of random rows from key column (STATE) randomly ) .I'm trying to use RANDOM function in transformer .Please help me how can i select random amount of rows ...
krishna
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What is your source?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Post by krishna14 »

Our Source is DB2 ..
krishna
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use a Sample stage.

While you can use a Transformer stage, you need to understand how the Rnd() function works and, possibly, how to seed it. The Sample stage already includes random sampling.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Post by krishna14 »

Thanks Ray ,

I appreciate your help ,Can u please elaborate how can i seed in the Rnd() function and how can i use it .Please ..
krishna
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why not use the Sample stage?

Right tool for the job, and all that.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Post by krishna14 »

Actually my requirment needs equal amount of Random Data from each state ( Eg: 100 Records from each state (Key column) if there are 50 records in one state it shuld pick all 50 records ) .If i go with Sample stage if i give the percentage it will give the percentage of all the records ..means it might miss some records from the state with less number of records..

So i thought of using Transformer and RND() funtion and give a stage variable to limit 100 records from each state ..
krishna
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

100 Records from each state (Key column) If there are 50 records in one state it shuld pick all 50 records
That's not really random, is it. :wink:

How about this:

- Assign a random number column to each record with transformer or column generator
- Partition on State, Sort on State and RandomNumber
- In a transformer, keeping track of change in State (key), keep the first ### records from each key, drop the rest

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Maybe a Head stage, then, with the Period and Number of Rows properties used.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Check out the select statement for random rows from db2 here. You can enhance it to fit your needs.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Post by krishna14 »

Ray,

Thanks Ray ,But If we use HEAD stage it will give the every Nth records ... if we give period..,but as per my requirment i shuld get randomly not every Nth record.
krishna
krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Post by krishna14 »

any one can help me out with RND() funtion in transformer ? Please to get the Random data..
krishna
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Have you read the documentation on the function in the Parallel Job Developer Guide?

The Rand and Random functions return a random unsigned integer when called. That's all there is to them.

Please refer to my earlier post for a suggested solution to your dilemma.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
krishna14
Participant
Posts: 24
Joined: Mon Jan 31, 2011 6:43 pm

Post by krishna14 »

S i did read the Parallel Job Developer Guide ..

Thanks for your suggestion jwiles.
krishna
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

The purpose of the logic I suggested is to randomly order the data (within a key value) by utilizing the random number functionality of transformer or column generator (or whatever source). Then you simply keep the first ### of the randomly ordered rows per key. While this requires you to sort the data, it greatly simplifies the logic required to select the rows to keep.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Post Reply