Handling Dataset

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kumar3846
Participant
Posts: 36
Joined: Mon Jan 09, 2006 2:58 pm


Post by kumar3846 »

Hi,

I have a large dataset coming in with 15 million rows, but I only need to process 5 million of them. Can you please help me figure out how to handle this?

Thanks in advance,


Kumar
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Is it the first 5M rows you need? If so, look into the Head stage. Depending on the number of nodes, specify how many rows to pick from each partition.
E.g., if you have a two-node config file, you need to specify 2.5M rows per node.
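DataStage parallel jobs are designed in the GUI, so the following is not DataStage code. It is a small Python sketch (the function names are hypothetical) of the arithmetic behind the per-partition row count, assuming the Head stage keeps up to N rows from each partition independently. It also shows one caveat: the total only reaches 5M if every partition holds at least its share of rows.

```python
def per_partition_count(total_rows: int, num_nodes: int) -> int:
    """Rows to request from each partition so the nodes together reach total_rows.

    Uses integer ceiling division: when total_rows does not divide evenly,
    the combined result can overshoot the target by fewer than num_nodes rows.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return -(-total_rows // num_nodes)


def rows_kept(partition_sizes: list[int], per_partition: int) -> int:
    """Total rows a per-partition head keeps: each partition contributes at most per_partition rows."""
    return sum(min(size, per_partition) for size in partition_sizes)


# Two-node config file: 2.5M per partition.
print(per_partition_count(5_000_000, 2))  # 2500000
# Three nodes: 5M does not divide evenly, so round up.
print(per_partition_count(5_000_000, 3))  # 1666667

# Evenly partitioned 15M rows on two nodes: exactly 5M kept.
print(rows_kept([7_500_000, 7_500_000], 2_500_000))  # 5000000
# Skewed partitions: the smaller partition runs out early, so fewer than 5M are kept.
print(rows_kept([13_000_000, 2_000_000], 2_500_000))  # 4500000
```

If your partitions may be skewed, check the per-partition row counts (or repartition evenly, e.g. round robin, before the Head stage) before relying on the per-partition figure.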
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar3846
Participant
Posts: 36
Joined: Mon Jan 09, 2006 2:58 pm

Post by kumar3846 »

Thanks for the help. If I want to process all 15 million rows in three steps (5 million in each step), how can I do it? Can you help me out with this?



Thanks

Kumar
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Then use the Sample stage. Create three datasets from your dataset and process each dataset separately.
The Sample stage can split your data based on percentages.
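To see why a percentage split gives three batches of approximately (not exactly) 5M rows each, here is a Python sketch of percentage-based routing. It assumes each row is sent independently at random to at most one output; it is a model of percentage sampling, not the Sample stage's actual implementation, and the function names are hypothetical. A deterministic alternative that cuts fixed-size batches by position is included for comparison.

```python
import random


def split_by_percent(rows, percents, seed=0):
    """Route each row to at most one output, like a percentage-based sampler.

    percents are whole-number percentages (e.g. [33, 33, 34]) that must not sum
    to more than 100. Each row draws a random number in 0..99 and goes to the
    first output whose cumulative percentage exceeds it; rows beyond the total
    are dropped. Output sizes are therefore only approximately proportional.
    """
    if sum(percents) > 100:
        raise ValueError("percentages must not sum to more than 100")
    rng = random.Random(seed)
    bounds = []
    running = 0
    for p in percents:
        running += p
        bounds.append(running)  # e.g. [33, 66, 100]
    outputs = [[] for _ in percents]
    for row in rows:
        r = rng.randrange(100)
        for out, bound in zip(outputs, bounds):
            if r < bound:
                out.append(row)
                break
    return outputs


def split_into_batches(rows, batch_size):
    """Deterministic alternative: consecutive batches of exactly batch_size rows (last may be shorter)."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


# Percentage split: three outputs of roughly equal size, no row in more than one output.
outs = split_by_percent(range(30_000), [33, 33, 34], seed=1)
print([len(o) for o in outs])

# Deterministic split: 15 rows into three batches of 5.
print([len(b) for b in split_into_batches(list(range(15)), 5)])  # [5, 5, 5]
```

If each step must contain exactly 5M rows, filtering on a row number generated upstream is the deterministic counterpart of `split_into_batches`; the percentage route is fine when roughly 5M per step is acceptable.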