XML Stage performance issues
Posted: Sun Jun 28, 2015 9:18 pm
by samyamkrishna
Hi,
I have a job that reads from a text file. One of the columns in the file contains XML.
The file is 15 GB and has 40 million records. The job runs for 3 hours.
Is there any way I can improve its performance?
Regards,
Samyam
Posted: Mon Jun 29, 2015 7:10 am
by chulett
Describe your job.
Posted: Tue Jun 30, 2015 2:38 am
by ray.wurlod
5 GB/hour isn't too bad on a small configuration. How many nodes are you using, and what kind of processors (how powerful are they)?
Posted: Wed Nov 04, 2015 12:39 pm
by samyamkrishna
Hi All,
Sorry about the delayed response.
We tried many options, and one of them gave us a good performance improvement.
We split the input file into 4 files of roughly 4 GB each and ran the same job 4 times in parallel, with each instance reading a different file.
That brought the processing time down to 1 hour.
Regards,
Samyam
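The splitting step described above has to cut the file only at record boundaries, otherwise a record (and its XML) ends up broken across two files. Below is a minimal Python sketch of that step, assuming one record per line (no embedded newlines inside the XML column); the function name `split_by_lines` and the `.partN` output naming are hypothetical. On Linux, GNU coreutils `split -n l/4` does the same job from the shell.

```python
import os


def split_by_lines(path: str, parts: int, out_prefix: str) -> list:
    """Split a text file into `parts` pieces of roughly equal byte size,
    cutting only at newline boundaries so no record is split in two.
    Assumption: one record per line (the XML column has no embedded newlines)."""
    size = os.path.getsize(path)
    target = size // parts  # approximate bytes per piece
    outputs = []
    with open(path, "rb") as src:
        for i in range(parts):
            out_path = f"{out_prefix}.part{i}"  # hypothetical naming scheme
            outputs.append(out_path)
            with open(out_path, "wb") as dst:
                written = 0
                # The file iterator resumes where the previous piece stopped;
                # the last piece never breaks early, so it takes the remainder.
                for line in src:
                    dst.write(line)
                    written += len(line)
                    if i < parts - 1 and written >= target:
                        break
    return outputs
```

Each resulting file can then be passed to a separate run of the same job (for example as a job parameter), which is what the 4 parallel runs amount to.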
Posted: Wed Nov 04, 2015 3:42 pm
by ray.wurlod
You probably could have done that in one job with four partitions (or eight).
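In a DataStage parallel job the degree of parallelism comes from the number of nodes in the configuration file, and a partitioner distributes rows across those nodes, so one job can do what the 4 separate runs did. The Python below is only a conceptual analogy of that idea, not DataStage code: records are assigned round-robin to N partitions and each partition's XML is parsed in its own process. The record layout (`key,<xml>`, split at the first comma) and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ProcessPoolExecutor


def round_robin(records, n):
    """Assign record i to partition i % n, like a round-robin partitioner."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def parse_partition(records):
    """Parse the XML column of every record in one partition.
    Hypothetical layout: 'key,<xml>' with the XML after the first comma."""
    results = []
    for rec in records:
        key, xml_text = rec.split(",", 1)
        root = ET.fromstring(xml_text)
        # Keep the key, the root element name and its number of child elements.
        results.append((key, root.tag, len(root)))
    return results


def process_in_partitions(records, n=4):
    """Parse all partitions in parallel, one worker process per partition.
    Call this from under an `if __name__ == "__main__":` guard on platforms
    that start worker processes with 'spawn'."""
    parts = round_robin(records, n)
    out = []
    with ProcessPoolExecutor(max_workers=n) as pool:
        for part_result in pool.map(parse_partition, parts):
            out.extend(part_result)
    return out
```

Round-robin keeps the partitions evenly sized regardless of the data; a key-based (hash) partitioner would be the choice instead if later steps needed all rows with the same key on the same partition.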