XML performance

Posted: Wed Sep 14, 2011 5:26 am
by vishal_rastogi
Hi All,

I am extracting records from 29 Oracle tables and creating 29 XML files, then merging the 29 XML files into one XML file through a Unix script, then zipping the merged file and sending it via SFTP through a Unix script.

Currently my job takes around 30-40 sec (depending on the data in the tables) to generate and SFTP the XML.

Is there any way I can reduce the generation time by 10-15 sec, or double the throughput?
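
For readers following along, here is a rough sketch of the merge-and-zip step described above. The original Unix script is not shown in this thread, so this is not that script: it is written in Python, and the wrapper element name `export`, the function names, and the file names are all hypothetical. It assumes each per-table file is a well-formed XML document with a single root, and it leaves the SFTP step out.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def merge_xml_files(paths, merged_path, root_tag="export"):
    """Place the root element of each input file under one new root."""
    merged_root = ET.Element(root_tag)  # hypothetical wrapper element name
    for path in paths:
        # Assumption: every per-table file is a well-formed XML document.
        merged_root.append(ET.parse(str(path)).getroot())
    ET.ElementTree(merged_root).write(
        str(merged_path), encoding="utf-8", xml_declaration=True
    )
    return merged_path


def zip_file(source_path, zip_path):
    """Compress a single file into a zip archive, stored under its base name."""
    source_path = Path(source_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(source_path, arcname=source_path.name)
    return zip_path
```

Timing each of these steps separately from the extraction itself is one way to see where the 30-40 sec actually goes.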

Posted: Wed Sep 14, 2011 6:15 am
by ray.wurlod
Are you running the 29 jobs consecutively? Running them simultaneously (or in bunches, if you don't have the capacity for all 29) should cut some processing time.
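
A minimal sketch of the "in bunches" idea, assuming each job can be started from a command line. The actual launch command is not given in this thread, so `commands` is just a list of argument lists you would supply yourself, and `run_in_bunches` and `max_parallel` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_bunches(commands, max_parallel=6):
    """Run each command as a subprocess, at most max_parallel at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(command):
        # Blocks this worker thread until the launched process finishes.
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

Note that this keeps up to `max_parallel` commands running at all times rather than waiting for a whole bunch to finish before starting the next one, which tends to waste less time when job durations differ.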

Posted: Wed Sep 14, 2011 6:18 am
by vishal_rastogi
I am running the jobs in bunches.
So for the 29 files I have created 5 jobs, each containing 6 Oracle to XML input stages.

Posted: Wed Sep 14, 2011 7:21 am
by chulett
Convert them to Server jobs. :wink:

(in the immortal words of BOC - don't fear the Server)

Posted: Wed Sep 14, 2011 7:47 am
by eostic
Absolutely. If they are that simple, chances are they run in less than 1 second, and the overhead is just EE Job Start up time.

Ernie

Posted: Thu Sep 15, 2011 1:11 am
by vishal_rastogi
Thanks for your inputs.
Just wanted to know: is there any way to convert the parallel jobs into server jobs?
And I understand your logic: I am not using the parallelism and pipeline concepts, so it is better to go with server jobs.

Posted: Thu Sep 15, 2011 2:01 am
by ray.wurlod
Curiously, you can use both pipeline and partition parallelism in server jobs but, in your case (with small volumes), you don't need to.

Posted: Thu Sep 15, 2011 6:26 am
by eostic
No way to directly convert them, but the methodology is similar, so you shouldn't have too much difficulty, and the syntax of the xmlStage is identical.

Save the output link definition for your xmlInput stage in EE to a tabledef and you can then just "load" that into the output link of your server xmlInput Stage.

You'll still need to manually re-apply various properties in other parts of the Stage and Job.

Ernie

Posted: Fri Sep 23, 2011 9:00 am
by vishal_rastogi
Just want to know: if I create multiple instances of the job, will that improve the performance?

Posted: Fri Sep 23, 2011 12:04 pm
by eostic
Multiple Instances don't really apply here. There are lots of things you can do with multiple instances, one of them being a 'convenience' ...to have one job design, and run it (say) 15 times concurrently, passing different job parameters to each.

Based on what we've been discussing, going to Server Jobs is going to get you a dramatic improvement for these tiny files...mostly because the processing time for parsing the information is probably not where your bottleneck is.

Ernie

Posted: Fri Sep 23, 2011 1:57 pm
by FranklinE
eostic wrote:...mostly because the processing time for parsing the information is probably not where your bottleneck is.

Ernie
I'm about to post an XML performance problem, and it looks like parsing is where our bottleneck is.

Posted: Fri Sep 23, 2011 2:36 pm
by eostic
Is this still the original thread?...the original discussed writing xml, so it's hard to tell what is being discussed...

Posted: Fri Sep 23, 2011 4:02 pm
by chulett
Franklin has his own thread now, let's not muck this one up with his stuff. :wink:

Posted: Fri Sep 23, 2011 4:07 pm
by FranklinE
chulett wrote:Franklin has his own thread now, let's not muck this one up with his stuff. :wink:
Without muck, there can be no muckraking. :P