I just realized that we never discussed VCInDSX's last entry.... thoughts on this interlaced below... [ernie]
Pardon my amateur queries....
If I understand this correctly, if the xml chunks/fragments are persisted to file(s) before they are combined into the final document that should help with performance, correct?
[evo]...typically but that's probably only because lookups make it fairly simple to pick up the chunks "later" in the job....there may be more creative ways to "carry" the xml content forward after it is created....
I wonder if that would be additional file I/O time that adds up to the total processing time for the job?
[evo]...XML to XML is already slow....what's a little more I/O?
In case of a simple XML generation job,
Stage 1. Read data from 1 or more (if joins) tables. (Yields 2Mil records)
Stage 2. Apply transforms (timestamps, null validations et al)
Stage 3. Write out to XML Output stage (With schema validation)
In this job, Stage 2 waits for completion of Stage 1 and Stage 3 waits for completion of Stage 2.
[evo]...this is not entirely correct. Saying that Stage 2 "waits" for completion of "Stage 1" implies that the data is staged somewhere before the first row is transformed in Stage 2..... That is not true, as Stage 2 will start performing Transforms immediately upon receiving the first row. Stage 3 is not waiting for all the Transforms either, although it may "appear" that way because XMLOutput is a naturally blocking Stage. But it will be receiving rows continually as they are transformed.
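To make the pipelining point concrete, here is a minimal sketch in plain Python generators. It is not DataStage code: the function names (read_rows, transform, xml_output) and the sample rows are hypothetical, chosen only to mirror the three stages above. It logs the order of events so you can see that the transform runs on the first row as soon as it is read, while the blocking output stage receives rows continually but only produces the document at the end.

```python
events = []  # records the order in which things happen

def read_rows(n):
    # Stage 1 (hypothetical source): yields rows one at a time
    for i in range(n):
        events.append(f"read {i}")
        yield {"id": i, "value": None if i % 2 else i}

def transform(rows):
    # Stage 2: works on each row as it arrives, no staging of the full set
    for row in rows:
        events.append(f"transform {row['id']}")
        value = row["value"] if row["value"] is not None else 0  # null handling
        yield {**row, "value": value}

def xml_output(rows):
    # Stage 3: "blocking" in the sense that the document is only complete
    # once every row has been received, but rows arrive continually
    parts = []
    for row in rows:
        events.append(f"receive {row['id']}")
        parts.append(f'<row id="{row["id"]}">{row["value"]}</row>')
    events.append("write document")
    return "<rows>" + "".join(parts) + "</rows>"

doc = xml_output(transform(read_rows(3)))
for e in events:
    print(e)
```

The log interleaves read, transform and receive for each row, with "write document" appearing only once at the very end, which is why the output stage can look like it is waiting even though it is being fed the whole time.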
Does this job stand to gain anything special if it were to be designed as Parallel as opposed to Server?
[evo] ...this is more difficult. A Parallel job will typically only be as fast as its slowest piece. Ultimately, the XMLOutput stage at the end is going to take its time to create the final document. Depending on the transforms being performed, or the degree of parallelism exploited at the sources, it could be very possible that EE will deliver rows more quickly to the XMLOutput Stage (which would be running sequentially), and the framework itself will do a better job getting those rows through the links....but the XMLOutput Stage may not be able to keep up anyway, and the benefits would be lost..... (for that job anyway --- who knows what else might be going on in the system, or the added flexibility you would get if you were running the Transform on another node, thus freeing up some processes on "this" box for other things, etc.).
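A rough back-of-the-envelope way to see the "only as fast as its slowest piece" argument is sketched below. Every number here is made up for illustration (the per-node rates and the node count are assumptions, not measurements), and the model assumes perfect linear scaling across nodes, which real jobs will not achieve. The only figure taken from the thread is the 2 million rows.

```python
def pipeline_rate(stages):
    # stages: list of (name, rows_per_sec_per_node, nodes)
    # A pipeline sustains at most the rate of its slowest stage.
    return min(rate * nodes for _name, rate, nodes in stages)

ROWS = 2_000_000  # row count from the example job in the thread

# Hypothetical rates: all stages on a single node (Server-style)
server = [("read", 50_000, 1), ("transform", 20_000, 1), ("xml_output", 10_000, 1)]

# Hypothetical: read and transform spread over 4 nodes, XML output still sequential
parallel = [("read", 50_000, 4), ("transform", 20_000, 4), ("xml_output", 10_000, 1)]

for label, stages in (("server", server), ("parallel", parallel)):
    rate = pipeline_rate(stages)
    print(f"{label}: {rate} rows/sec, about {ROWS / rate:.0f} sec for {ROWS} rows")
```

Under these assumed numbers both designs end up limited by the sequential output stage, so the elapsed time for this one job does not change. The picture would differ if the transform, rather than the XML output, were the slowest stage.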
Thanks again for your time and input,
_________________
-V