Extract specific rows from Sequential File



ds2000
Premium Member
Posts: 109
Joined: Sun Apr 22, 2007 7:25 pm
Location: ny

Post by ds2000 »

I used the DB2 Load utility to load a sequential file of around 10 GB. The utility produced a reject log listing the record numbers of the rows that were rejected or partially loaded. I want to see those rows in the sequential file. I built a PX job with a Transformer (running sequentially) that filters those record numbers into another sequential file. I used a stage variable incremented by 1 for each row to identify the records.

I have started running it, but the target file has already grown to around 2 GB even though I only have 10 records to filter out. Because the source file is 9 GB, the job will take a while to finish.

Is this the right solution, or what other options do I have to pick up those record numbers from the sequential file?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sequential files mean just that - they are accessed sequentially from beginning to end. Something must read it to find the 10 records you want, be it the DataStage job or something beforehand like awk or perl.
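To illustrate the "something beforehand" idea, here is a minimal Python sketch of a single sequential pass that keeps only the wanted record numbers. It assumes one record per newline-terminated line (no embedded newlines inside fields) and that the reject log's record numbers are 1-based; the function name and the paths are hypothetical, not anything DataStage or DB2 provides.

```python
def extract_records(src_path, wanted):
    """Yield (record_number, raw_line) for each 1-based record number in `wanted`.

    Assumption: one record per newline-terminated line.
    """
    remaining = set(wanted)
    if not remaining:
        return
    last = max(remaining)
    # Binary mode avoids decoding errors and copies each record byte for byte.
    with open(src_path, "rb") as src:
        for recno, line in enumerate(src, start=1):
            if recno in remaining:
                yield recno, line
            if recno >= last:
                break  # nothing wanted beyond this point, stop reading
```

Writing the results out could look like `open("rejects.txt", "wb").writelines(line for _, line in extract_records("load_file.txt", {12, 4051}))`, where both file names and the record numbers are placeholders. The whole file up to the highest wanted record still has to be read, but only the matching rows are written.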

Another thought would be to bulk load it into a work table and then use sql to pull out just the records you want.

From what I recall, there's also the possibility of parallel reads in DataStage, but I think it needs to be a fixed-width file to allow that to happen.
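As a side note on why fixed-width matters: when every record has the same byte length, the position of record n can be computed rather than found by scanning. The sketch below shows that idea in plain Python; it is not how DataStage implements parallel reads, and `record_length` (which must include any line terminator) and the function name are assumptions made for illustration.

```python
def read_fixed_records(src_path, record_numbers, record_length):
    """Return {record_number: raw_bytes} by seeking directly to each record.

    Assumption: every record occupies exactly `record_length` bytes,
    including the line terminator if there is one. Numbers are 1-based.
    """
    rows = {}
    with open(src_path, "rb") as src:
        for recno in sorted(record_numbers):
            src.seek((recno - 1) * record_length)  # byte offset of the record
            data = src.read(record_length)
            if len(data) == record_length:  # skip numbers past end of file
                rows[recno] = data
    return rows
```

With a delimited, variable-length file this shortcut is not available, since the offset of a record depends on the length of every record before it.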
-craig

"You can never have too many knives" -- Logan Nine Fingers