Handling Dataset

kumar3846 · Post by **kumar3846** » Thu May 24, 2007 1:10 pm

Hi,

I have a large dataset coming in with 15 million rows, but I only need to process 5 million of them. Can you please help me figure out how to handle this?

Thanks in advance

Kumar

DSguru2B · Post by **DSguru2B** » Thu May 24, 2007 1:16 pm

Is it the first 5M rows? If so, look into the Head stage. Depending on the number of nodes, specify how many rows to pick from each partition.
For example, with a two-node configuration file you would specify 2.5M rows per node.
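A small simulation makes it easier to see why the per-node figure works. This is a minimal Python sketch, not DataStage code. It assumes the incoming rows were spread across the nodes round-robin, and the function names (`round_robin_partition`, `rows_per_node`, `head_per_partition`) are hypothetical. Under that assumption, taking the first 2.5M rows from each of two partitions gives exactly the first 5M rows overall.

```python
def round_robin_partition(rows, num_nodes):
    """Deal rows out to num_nodes partitions in round-robin order (assumed)."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def rows_per_node(total_wanted, num_nodes):
    """Per-partition row count. Rounds up, so the total can exceed
    total_wanted by up to num_nodes - 1 rows when it does not divide evenly."""
    return -(-total_wanted // num_nodes)


def head_per_partition(partitions, n):
    """Keep only the first n rows of every partition, as a Head stage would."""
    return [part[:n] for part in partitions]


# Scaled-down stand-in: 15,000 rows instead of 15 million, keep 5,000.
rows = list(range(15_000))
nodes = 2
n = rows_per_node(5_000, nodes)
kept = head_per_partition(round_robin_partition(rows, nodes), n)
print(n, sum(len(p) for p in kept))  # prints: 2500 5000
```

With a different partitioning method (hash, for example), each partition still contributes its first N rows, but together those rows need not be the first 5M rows of the original input.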

kumar3846 · Post by **kumar3846** » Thu May 24, 2007 1:46 pm

Thanks for the help. If I want to process all 15 million rows in three steps (5 million in each step), how can I do it? Can you help me out with this?

Thanks

Kumar

DSguru2B · Post by **DSguru2B** » Thu May 24, 2007 1:51 pm

Then use the Sample stage. Create three datasets from your dataset and process each one separately.
The Sample stage can split your data based on percentage.
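Below is a rough sketch of what a percentage-based split does. It is plain Python, not the Sample stage itself. It assumes each row is sent to one output chosen at random according to the percentages, and `split_by_percentage` is a hypothetical name used only for illustration.

```python
import random


def split_by_percentage(rows, percentages, seed=42):
    """Route each row to one output at random, weighted by percentages.

    Percentages must not add up to more than 100; if they add up to less,
    the leftover share of rows is dropped rather than routed anywhere.
    """
    if sum(percentages) > 100:
        raise ValueError("percentages must not add up to more than 100")
    rng = random.Random(seed)  # fixed seed so reruns give the same split
    bounds = []  # cumulative upper bound for each output, e.g. 33, 66, 100
    running = 0
    for p in percentages:
        running += p
        bounds.append(running)
    outputs = [[] for _ in percentages]
    for row in rows:
        r = rng.random() * 100  # uniform in [0, 100)
        for idx, bound in enumerate(bounds):
            if r < bound:
                outputs[idx].append(row)
                break
    return outputs


# Scaled-down stand-in: 15,000 rows split three ways (33 + 33 + 34 = 100).
rows = list(range(15_000))
parts = split_by_percentage(rows, [33, 33, 34])
print([len(p) for p in parts])  # sizes near 33/33/34 percent, not exact
```

Each output would then be processed as a separate dataset. Because routing is random, the three outputs come out close to, but not exactly, 5 million rows each.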