Want to copy only 700M out of 2.6 billion records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Want to copy only 700M out of 2.6 billion records

Post by bskumar4u »

I have a .ds file with 2.6 billion records.
Is there any way I can copy only 700 million of them into another file without processing all the records? Processing them all takes a long time.
I tried copying the file using a Copy stage and it took 13 hours to process all 2.6 billion (maybe due to poor resources).
Out of those 2.6 billion, can I copy just 700M without processing all 2.6B for such a long time?
What is the best way to reduce the processing time?
Please suggest.
....................Shanthi
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

What was your copy job design?

Dataset -> Copy -> ?? (flat file or dataset)
Kandy
_________________
Try and try again… You will succeed at last!!
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

It was a .ds!

Can you suggest any other design?
....................Shanthi
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Just now looked at your other post... Oh, you've kept everyone busy ;)

Other than the items discussed in that post:

1. Dataset -> Transformer -> Dataset. In this approach, you can set up a counter and use it as a constraint in the Transformer.

2. If you don't want the target to be a dataset, try the orchadmin command to dump the dataset into a flat file (no one knows how long that would take, or whether it would ever finish). If successful, use a UNIX command to take 700 million records from that file and dump them into another file.

3. Ignore this task and ask for a file with only 700 million records (from the source system). If the source is a table, put the filter in the SQL itself.
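Option 2 above can be sketched in shell. The orchadmin step is DataStage-specific (exact options vary; check the orchadmin documentation), so it appears only as a comment; the slicing step is shown with small stand-in numbers (7 lines out of 26, in place of 700M out of 2.6B) so it can be tried anywhere:

```shell
# In a real run the dataset would first be dumped to a flat file, e.g.
# something like:  orchadmin dump big.ds > /tmp/big.txt
# Here we fake the dump with seq so the slicing step is runnable anywhere.
seq 1 26 > /tmp/big.txt                    # stand-in for the 2.6B-record dump

# head stops reading as soon as it has the requested number of lines,
# so the records beyond the cutoff are never touched.
head -n 7 /tmp/big.txt > /tmp/subset.txt   # stand-in for "head -n 700000000"

wc -l < /tmp/subset.txt
```

The file names and counts are placeholders; only the head-based slicing is the point here, since head exits after the requested line count rather than scanning the whole file.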
Kandy
_________________
Try and try again… You will succeed at last!!
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

LOL!

3. Ignore this task and ask for a file with only 700 million records (from the source system). If the source is a table, put the filter in the SQL itself.

Actually, the file is the output of a predecessor job, so I would have to perform many operations to control this.
Anyway, I have been trying many things for two days!
I wonder what the Head stage would do.
Still trying a few things...
Any more suggestions? Please reply!
....................Shanthi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: There was absolutely no reason to start a new post for this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

There was absolutely no reason to start a new post for this.

Sorry you couldn't find a reason for this new post, but I should mention mine:
the earlier post was just to learn the difference between Copy and Transformer and the easiest way to copy a .ds.
I couldn't get what I wanted; I mean, I couldn't do it in less time.
So I changed my requirement to loading at least 700M, and wanted to know if that had been answered.
(As it was no longer Copy vs. Transformer but a different requirement.)
Sorry again!
I'm still hoping this can be overcome!
....................Shanthi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I "couldn't find a reason" because there wasn't one... you'd already asked this question in your other post. Not to worry, I'll go remove it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

Oh, come on!
I need the solution, not the reason :lol: LOL! :)
....................Shanthi
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Try using the Head stage or the Sample stage. They operate in parallel, so any record count you give is per node.
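Because the Head stage count is applied per node, the desired total has to be divided by the number of nodes in the APT configuration file. A quick shell check of the arithmetic, assuming a hypothetical 4-node configuration:

```shell
# Hypothetical 4-node configuration: to end up with 700M records in
# total, each node's Head stage must keep total / nodes records.
TOTAL=700000000
NODES=4
PER_NODE=$((TOTAL / NODES))
echo "Set the Head stage row count to $PER_NODE per node"
```

Note that this yields exactly 700M only when the records are spread evenly across the partitions; with a skewed partitioning, the actual total will differ.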
Choose a job you love, and you will never have to work a day in your life. - Confucius