Want to copy only 700M out of 2.6 billion records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Want to copy only 700M out of 2.6 billion records

Post by bskumar4u »

I have a .ds file with 2.6 billion records.
Is there any way I can copy only 700 million of them into another file without processing all the records? Processing them all takes a long time.
I tried copying the file using a Copy stage and it took 13 hours to process all 2.6 billion (maybe due to poor resources).
Out of those 2.6 billion, can I copy just 700M without processing all 2.6B for such a long time?
What is the best way to reduce the processing time?
Please suggest.
....................Shanthi
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

What was your copy job design?

Dataset -> Copy -> ?? (flat file or dataset)
Kandy
_________________
Try and try again… You will succeed at last!!
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

It was a .ds!

Can you suggest any other design?
....................Shanthi
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Just now looked at your other post... Oh, you've kept everyone busy ;)

Other than the items discussed in that post:

1. Dataset -> Transformer -> Dataset. In this approach, you can set up a counter and use it as a constraint in the Transformer.

2. If you don't want the target to be a dataset, try the orchadmin command to dump the dataset into a flat file (no one knows how long that would take, or whether it would ever finish). If successful, use a UNIX command to take 700 million records from that file and dump them into another file.

3. Ignore this task and ask for a file with only 700 million records (from the source system). If the source is a table, put the filter in the SQL itself.
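Option 2 above can be sketched in shell. The orchadmin step is DataStage-specific (exact options vary; check the orchadmin documentation), so it appears only as a comment; the slicing step is shown with small stand-in numbers (7 lines out of 26, in place of 700M out of 2.6B) so it can be tried anywhere:

```shell
# In a real run the dataset would first be dumped to a flat file, e.g.
# something like:  orchadmin dump big.ds > /tmp/big.txt
# Here we fake the dump with seq so the slicing step is runnable anywhere.
seq 1 26 > /tmp/big.txt                    # stand-in for the 2.6B-record dump

# head stops reading as soon as it has the requested number of lines,
# so the records beyond the cutoff are never touched.
head -n 7 /tmp/big.txt > /tmp/subset.txt   # stand-in for "head -n 700000000"

wc -l < /tmp/subset.txt
```

The file names and counts are placeholders; only the head-based slicing is the point here, since head exits after the requested line count rather than scanning the whole file.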
Kandy
_________________
Try and try again… You will succeed at last!!
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

LOL!

3. Ignore this task and ask for a file with only 700 million records (from the source system). If the source is a table, put the filter in the SQL itself.

Actually, the file is the output of a predecessor job, so I would have to perform many operations to control this.
Anyway, I have been trying many things for two days!
I wonder what the Head stage would do.
Still trying a few things...
Any more suggestions? Please reply!
....................Shanthi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: There was absolutely no reason to start a new post for this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

There was absolutely no reason to start a new post for this.

Sorry you couldn't find a reason for this new post, but I should mention mine:
the earlier post was just to learn the difference between Copy and Transformer and the easiest way to copy a .ds.
I couldn't get what I wanted; I mean, I couldn't do it in less time.
So I changed my requirement to loading at least 700M, and wanted to know if that had been answered.
(As it was no longer Copy vs. Transformer but a different requirement.)
Sorry again!
I'm still hoping this can be overcome!
....................Shanthi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I "couldn't find a reason" because there wasn't one... you'd already asked this question in your other post. Not to worry, I'll go remove it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bskumar4u
Participant
Posts: 13
Joined: Mon Feb 21, 2011 4:47 am
Location: Hyderabad

Post by bskumar4u »

Oh, come on!
I need the solution, not the reason :lol: LOL! :)
....................Shanthi
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Try using the Head stage or the Sample stage. They operate in parallel, so any record count you give is per node.
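Because the Head stage count is applied per node, the desired total has to be divided by the number of nodes in the APT configuration file. A quick shell check of the arithmetic, assuming a hypothetical 4-node configuration:

```shell
# Hypothetical 4-node configuration: to end up with 700M records in
# total, each node's Head stage must keep total / nodes records.
TOTAL=700000000
NODES=4
PER_NODE=$((TOTAL / NODES))
echo "Set the Head stage row count to $PER_NODE per node"
```

Note that this yields exactly 700M only when the records are spread evenly across the partitions; with a skewed partitioning, the actual total will differ.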
Choose a job you love, and you will never have to work a day in your life. - Confucius