Issue in reading multiple files



Amit_111
Participant
Posts: 134
Joined: Sat Mar 24, 2007 11:37 am


Post by Amit_111 »

Hello,

We have about 48 large files, each around 2 GB in size. We are trying to read them in a single job using 4 separate Sequential File stages, with each stage reading about 12 files.
When executed, the job runs for a very long time and never finishes.

We tried several configuration files and different values for the number of readers per node, but we do not see any improvement in the overall job execution.
We also tried splitting the work across multiple jobs, and it runs fine only when a job reads fewer than 5 files. With more than 5 files in a single job, the job does not complete and stays in Running status.

Please let me know if you have any pointers. Thank you!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you considered building a looping Sequence job? Optimize the file load and then slam them through one at a time. Or make it multi-instance and have each instance take a portion of the file set to iterate through.
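
As a rough illustration of the multi-instance idea, the file set could be divided into one batch per instance before the job is invoked. The sketch below is only an assumption about how that split might be done outside DataStage: the file names, the instance count of 4, and the helper `split_round_robin` are all hypothetical, and the actual job invocation is left out.

```python
from typing import List


def split_round_robin(files: List[str], n_instances: int) -> List[List[str]]:
    """Distribute files across n_instances batches in round-robin order."""
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    batches: List[List[str]] = [[] for _ in range(n_instances)]
    for i, name in enumerate(files):
        # File i goes to batch i mod n_instances, so batch sizes differ by at most 1.
        batches[i % n_instances].append(name)
    return batches


# Hypothetical file names standing in for the 48 input files.
files = [f"input_{i:02d}.dat" for i in range(1, 49)]

# Hypothetical choice of 4 job instances.
batches = split_round_robin(files, 4)

for idx, batch in enumerate(batches, start=1):
    # Each batch would be handed to one job instance, which then
    # loops through its files one at a time.
    print(f"instance {idx}: {len(batch)} files, first={batch[0]}")
```

Each batch could then be passed to one instance of the job as a parameter, with the looping Sequence iterating over the files in that batch.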
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

How long is too long?

Are you sure it never finishes? (Never hasn't arrived yet.) So, how long did you wait for the job to finish?

Delimited or fixed width files?

What else happens in your job design besides reading files?
Choose a job you love, and you will never have to work a day in your life. - Confucius