
XML stage performance issues

Posted: Sun Jun 28, 2015 9:18 pm
by samyamkrishna
Hi,

I have a job that reads from a .txt file. One of the columns in the file contains XML.
The file is 15 GB and has 40 million records. The job runs for 3 hours.

Is there any way I can improve its performance?

Regards,
Samyam

Posted: Mon Jun 29, 2015 7:10 am
by chulett
Describe your job.

Posted: Tue Jun 30, 2015 2:38 am
by ray.wurlod
5 GB/hour isn't too bad on a small configuration. How many nodes are you using, and what kind of processors (how powerful) do they have?

Posted: Wed Nov 04, 2015 12:39 pm
by samyamkrishna
Hi All,

Sorry about the delayed response.
We tried a lot of options, and one of them gave us a good performance improvement.

We split the input file into 4 files of roughly 4 GB each and ran the same job 4 times in parallel, with each instance reading a different file.
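A minimal Python sketch of that split, assuming each record sits on a single line (no newlines inside the XML column), so every cut can fall on a record boundary instead of in the middle of an XML document. The function name split_at_line_boundaries and the .partN output naming are hypothetical; GNU coreutils `split -n l/4` does the same job from the shell.

```python
import os


def split_at_line_boundaries(src_path: str, n_parts: int, out_prefix: str) -> list[str]:
    """Split src_path into n_parts files of roughly equal byte size.

    Assumes one record per line (no embedded newlines in the XML column),
    so every cut falls on a line boundary and no record is broken in two.
    """
    total = os.path.getsize(src_path)
    target = -(-total // n_parts)  # ceiling division: target bytes per part
    paths = [f"{out_prefix}.part{i}" for i in range(n_parts)]
    outs = [open(p, "wb") for p in paths]
    try:
        part, written = 0, 0
        with open(src_path, "rb") as src:
            for line in src:  # binary iteration yields whole lines, newline included
                # Switch to the next output only between lines; the last part takes the remainder.
                if written >= target and part < n_parts - 1:
                    part, written = part + 1, 0
                outs[part].write(line)
                written += len(line)
    finally:
        for f in outs:
            f.close()
    return paths
```

Each part can then be handed to its own run of the job, e.g. `split_at_line_boundaries("input.txt", 4, "input")` produces `input.part0` to `input.part3`.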

Processing time came down to 1 hour.
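The numbers in this thread can be checked with a little arithmetic: 15 GB in 3 hours is the 5 GB/hour quoted above, and 15 GB in 1 hour is 15 GB/hour. A 3x speedup from 4 parallel instances works out to a parallel efficiency of 75%. The helper names below are only for illustration.

```python
def throughput_gb_per_hour(size_gb: float, hours: float) -> float:
    """Data volume processed per hour."""
    return size_gb / hours


def parallel_efficiency(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be a perfect N-fold speedup."""
    return (t_serial / t_parallel) / n_workers


print(throughput_gb_per_hour(15, 3))  # 5.0  (original single job)
print(throughput_gb_per_hour(15, 1))  # 15.0 (four parallel instances)
print(parallel_efficiency(3, 1, 4))   # 0.75
```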

Regards,
Samyam

Posted: Wed Nov 04, 2015 3:42 pm
by ray.wurlod
You probably could have done that in one job with four partitions (or eight).
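As a rough Python analogy (not DataStage code) for one job running with four partitions: records are dealt round robin across N partitions, and each partition is parsed by its own worker process. In DataStage itself the degree of parallelism comes from the nodes defined in the configuration file (APT_CONFIG_FILE) rather than from user code. The `key|<xml>` record layout, the `<name>` element and all function names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ProcessPoolExecutor


def parse_record(line: str) -> tuple[str, str]:
    """Parse one hypothetical 'key|<xml>' record; return (key, text of <name>)."""
    key, xml_text = line.rstrip("\n").split("|", 1)
    root = ET.fromstring(xml_text)
    return key, root.findtext("name", default="")


def round_robin(lines: list[str], n_partitions: int) -> list[list[str]]:
    """Deal records across n_partitions lists, like a round-robin partitioner."""
    return [lines[i::n_partitions] for i in range(n_partitions)]


def parse_partition(part: list[str]) -> list[tuple[str, str]]:
    """Work done by one partition: parse every record it was dealt."""
    return [parse_record(line) for line in part]


def parse_all(lines: list[str], n_partitions: int = 4) -> list[tuple[str, str]]:
    """Parse all records in one run, with one worker process per partition."""
    parts = round_robin(lines, n_partitions)
    with ProcessPoolExecutor(max_workers=n_partitions) as pool:
        # Collect results inside the block; output is grouped by partition.
        return [row for rows in pool.map(parse_partition, parts) for row in rows]
```

Because ProcessPoolExecutor may start workers with the spawn method (the default on Windows and macOS), call `parse_all` from under an `if __name__ == "__main__":` guard. Round-robin dealing does not preserve the original record order across partitions, so sort on the key afterwards if order matters.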