XML output generation

VCInDSX · Post by **VCInDSX** » Tue May 22, 2007 1:35 pm

eostic wrote:....
On the output side, have a column called myNewXML....give it varchar and a long length mostly for doc purposes, with a single "/" in the Description property (also without quotes).

Should work like a charm.

Ernie

Hi Ernie,
As I am not an expert on the XML stages in DS, I would appreciate it if you could help me understand this a bit more.
When the XML doc chunk(s) are passed to the final XML output stage in the above manner, does that add overhead to the overall processing?
Also, if one were to extrapolate this to a case where millions of records are being output to the file, would that cause any issue?

Thanks in advance for your time,

eostic · Post by **eostic** » Tue May 22, 2007 11:35 pm

Not exactly sure of your question. Creating complex, multi-node documents is not a simple process, nor great on the performance side --- but it can work.... it just requires that you build each "relational" node (that could easily be represented as a normalized table or single set of rows and columns) individually. These are the "chunks" referred to up above. None are complete documents -- just pieces. Then you bring them all together at the end. There's no real question about "overhead"...this is the only way to do it if you want to get it done via DataStage and not write anything external.

Ernie
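
For readers who think in code rather than stages, here is a minimal sketch of the same chunk-then-assemble idea in Python's standard library. It is not how DataStage implements it. The tables, column names, and element names (orders, items, order_id, sku, qty) are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical normalized data: one parent table and one child table.
orders = [
    {"order_id": "1", "customer": "Acme"},
    {"order_id": "2", "customer": "Globex"},
]
items = [
    {"order_id": "1", "sku": "A-10", "qty": "2"},
    {"order_id": "1", "sku": "B-20", "qty": "1"},
    {"order_id": "2", "sku": "C-30", "qty": "5"},
]


def build_item_chunks(rows):
    """Step 1: build one XML chunk per parent key.

    Each chunk is a string of sibling <item/> elements, not a complete document.
    """
    chunks = defaultdict(str)
    for row in rows:
        element = ET.Element("item", sku=row["sku"], qty=row["qty"])
        chunks[row["order_id"]] += ET.tostring(element, encoding="unicode")
    return chunks


def assemble(parent_rows, item_chunks):
    """Step 2: bring the chunks together under their parents and one root."""
    root = ET.Element("orders")
    for row in parent_rows:
        order = ET.SubElement(
            root, "order", id=row["order_id"], customer=row["customer"]
        )
        # A chunk has no single root, so wrap it temporarily in order to parse it.
        wrapper = ET.fromstring(f"<w>{item_chunks.get(row['order_id'], '')}</w>")
        order.extend(list(wrapper))
    return ET.tostring(root, encoding="unicode")


document = assemble(orders, build_item_chunks(items))
print(document)
```

The point of the sketch is only the shape of the work: the child chunks are built on their own first, then picked up by key when the parent level is built.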

VCInDSX · Post by **VCInDSX** » Wed May 23, 2007 8:54 am

Hi Ernie,
I think that answers my question. I was not sure if there was some other way to generate the document without passing the chunks through additional stages. It appears that DataStage job designers have to keep this in mind ( I will, as I continue to work on XML :D ).
Now I can relate it to a conventional DOM object building process, where one has to work from different "fragments" and finally build the document.

Thanks again, for your time and response.

chulett · Post by **chulett** » Wed May 23, 2007 9:17 am

If you go to Kim Duke's website, there is a loverly older Ascential document on XML 'Best Practices' that does a great job of illustrating these techniques. Free for the downloading.

VCInDSX · Post by **VCInDSX** » Wed May 23, 2007 10:23 am

I already grabbed it, Craig. It serves me well, especially when I consult it as I run into these issues.
The other question I had on this topic is about the buffer length that DataStage can handle when one tries to bring together huge chunks and assemble them into one final master document. Any benchmarks or guidelines on that?
I have a few other queries on the XML output stage, but I will post those in a separate thread and not hijack this one.

Thanks,

eostic · Post by **eostic** » Wed May 23, 2007 10:17 pm

kinda the same as what we've referenced before in several threads....if you let XMLOutput write the content to disk, you can probably go higher than if you send the output further down the job (especially if you are using EE).....but you should be able to get fairly high with Server edition.......avoid validation on output unless you must....

Ernie

VCInDSX · Post by **VCInDSX** » Thu May 24, 2007 11:24 am

Pardon my amateur queries....
If I understand this correctly, if the xml chunks/fragments are persisted to file(s) before they are combined into the final document that should help with performance, correct?
I wonder if that would be additional file I/O time that adds up to the total processing time for the job?

In case of a simple XML generation job,
Stage 1. Read data from 1 or more (if joins) tables. (Yields 2Mil records)
Stage 2. Apply transforms (timestamps, null validations et al)
Stage 3. Write out to XML Output stage (With schema validation)

In this job, Stage 2 waits for completion of Stage 1 and Stage 3 waits for completion of Stage 2.

Does this job stand to gain anything special if it were to be designed as Parallel as opposed to Server?

Thanks again for your time and input,

velagapudi_k · Post by **velagapudi_k** » Wed Jun 27, 2007 2:17 pm

Hi Guys,
Is there a way to get rid of the following when generating XML using datastage XML output stage.

<?xml version="1.0" encoding="UTF-8"?>.

Appreciate your help.

velagapudi_k · Post by **velagapudi_k** » Wed Jul 11, 2007 1:00 pm

eostic wrote:
Absolutely. Just pass it thru "one more" XMLOutput Stage (which is what you'd do if it were all in a single job). You can't just "paste" the xml snippets together without wrapping them in a higher level element(s).

So... have an XMLOutput stage with an input link and an output link.... one column on each.

On the input side, feed it your XML.... and have (minimally) a Description property for that column that is just "/Orgapiload " (without the quotes).

On the output side, have a column called myNewXML....give it varchar and a long length mostly for doc purposes, with a single "/" in the Description property (also without quotes).

Should work like a charm.

Hi Ernie, I am trying to do what you described above.

My input is
<entry org_cd="USSTD" org_level_cd="DIV" name="US Stores Division" locationname="USSTO">
<orgrel org_cd="USSTO" org_level_cd="AREA" eff_date="01/01/1900"/>
<status code="Active" eff_date="01/01/1900"/>
</entry>
<entry org_cd="00010" org_level_cd="REG" name="REGION 10 SOUTHEAST" locationname="USSTD">
<orgrel org_cd="USSTD" org_level_cd="DIV" eff_date="01/01/1900"/>
<status code="Active" eff_date="01/01/1900"/>
</entry>
which I want to wrap in a higher-level element, orgapiload.
I am using a Sequential File stage, an XML Output stage, and a Sequential File stage, in that order. On the input side, I am feeding in my XML, with the Description property for that column set to "/Orgapiload ".
On the output side, I named a column myXML, with a single "/" in the Description.

The output I am expecting is
<orgapiload>
<entry org_cd="USSTD" org_level_cd="DIV" name="US Stores Division" locationname="USSTO">
<orgrel org_cd="USSTO" org_level_cd="AREA" eff_date="01/01/1900"/>
<status code="Active" eff_date="01/01/1900"/>
</entry>
<entry org_cd="00010" org_level_cd="REG" name="REGION 10 SOUTHEAST" locationname="USSTD">
<orgrel org_cd="USSTD" org_level_cd="DIV" eff_date="01/01/1900"/>
<status code="Active" eff_date="01/01/1900"/>
</entry>
</orgapiload>

But the output I am getting is

<?xml version="1.0" encoding="UTF-8"?>
<orgapiload>
<entry org_cd="USSTD" org_level_cd="DIV" name="US Stores Division" locationname="USSTO">

</orgapiload>
<orgapiload>
<orgrel org_cd="USSTO" org_level_cd="AREA" eff_date="01/01/1900"/>

</orgapiload>
<orgapiload>
<status code="Active" eff_date="01/01/1900"/>

</orgapiload>
<orgapiload>
</entry>

</orgapiload>
<orgapiload>
<entry org_cd="00010" org_level_cd="REG" name="REGION 10 SOUTHEAST" locationname="USSTD">

</orgapiload>
<orgapiload>
<orgrel org_cd="USSTD" org_level_cd="DIV" eff_date="01/01/1900"/>

</orgapiload>
<orgapiload>
<status code="Active" eff_date="01/01/1900"/>

</orgapiload>
<orgapiload>
</entry>
</orgapiload>

Please help me with this.

eostic · Post by **eostic** » Wed Jul 11, 2007 3:13 pm

There is a check box called "Generate XML Chunk" on one of the output tabs..... check that and it will leave off the header...

Ernie
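
As an analogy only, the same header-versus-chunk distinction can be seen in Python's standard library. The sketch below assumes Python 3.8 or later, where `xml.etree.ElementTree.tostring` accepts `xml_declaration`. It says nothing about how the XMLOutput Stage implements its check box, and the element content is a cut-down placeholder.

```python
import xml.etree.ElementTree as ET

# Placeholder content loosely based on the XML posted in this thread.
root = ET.Element("orgapiload")
ET.SubElement(root, "entry", org_cd="USSTD", org_level_cd="DIV")

# Full document: serialized with the <?xml ...?> header.
with_header = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# "Chunk": the same element serialized without the header, so it can be
# embedded inside a larger document later.
chunk_only = ET.tostring(root, encoding="UTF-8", xml_declaration=False)

print(with_header.decode("utf-8"))
print(chunk_only.decode("utf-8"))
```

A chunk without the header is what you want whenever the output is going to be wrapped by another element further down the line, since the declaration is only allowed at the very start of a document.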

eostic · Post by **eostic** » Wed Jul 11, 2007 4:22 pm

...this is assuming that your "input" xml chunk defined above has already been collapsed or built as a single column and single row coming into the XMLOutput Stage...

Ernie

eostic · Post by **eostic** » Thu Jul 12, 2007 10:08 am

I just realized that we never discussed VCInDSX's last entry.... thoughts on this interlaced below... [ernie]

Pardon my amateur queries....
If I understand this correctly, if the xml chunks/fragments are persisted to file(s) before they are combined into the final document that should help with performance, correct?

[evo]...typically but that's probably only because lookups make it fairly simple to pick up the chunks "later" in the job....there may be more creative ways to "carry" the xml content forward after it is created....

I wonder if that would be additional file I/O time that adds up to the total processing time for the job?

[evo]. XML to XML is already slow....what's a little more I/O

In case of a simple XML generation job,
Stage 1. Read data from 1 or more (if joins) tables. (Yields 2Mil records)
Stage 2. Apply transforms (timestamps, null validations et al)
Stage 3. Write out to XML Output stage (With schema validation)

In this job, Stage 2 waits for completion of Stage 1 and Stage 3 waits for completion of Stage 2.

[evo]...this is not entirely correct. Saying that Stage 2 "waits" for completion of "Stage 1" implies that the data is staged somewhere before the first row is transformed in Stage 2..... That is not true, as Stage 2 will start performing Transforms immediately upon receiving the first row. Stage 3 is not waiting for all the Transforms either, although it may "appear" that way because XMLOutput is a naturally blocking Stage. But it will be receiving rows continually as they are transformed.

Does this job stand to gain anything special if it were to be designed as Parallel as opposed to Server?

[evo] ...this is more difficult. A Parallel job will typically only be as fast as its slowest piece. Ultimately, the XMLOutput stage at the end is going to take its time to create the final document. Depending on the transforms being performed, or the degree of parallelism exploited at the sources, it could be very possible that EE will deliver rows more quickly to the XMLOutput Stage (which would be running sequential), and the framework itself will do a better job getting those rows thru the links....but the XMLOutput Stage may not be able to keep up anyway, and the benefits would be lost..... (for that job anyway --- who knows what else might be going on in the system, or the added flexibility you would get if you were running the Transform on another node, thus freeing up some processes on "this" box for other things, etc.).

Thanks again for your time and input,
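
The point above about Stage 2 not waiting for Stage 1 can be illustrated with a small Python generator pipeline. This is only an analogy for row-at-a-time flow with a blocking final stage, not a model of the DataStage engine. The function names and the event log are invented for the illustration.

```python
events = []  # records the order in which each step actually runs


def read_rows(count):
    """Stage 1: hand over rows one at a time."""
    for row_id in range(1, count + 1):
        events.append(f"read {row_id}")
        yield {"id": row_id}


def transform(rows):
    """Stage 2: transform each row as soon as it arrives."""
    for row in rows:
        events.append(f"transform {row['id']}")
        yield {**row, "id_text": str(row["id"])}


def write_document(rows):
    """Stage 3: receives rows continually, but emits only at the end (blocking)."""
    body = "".join(f'<row id="{r["id_text"]}"/>' for r in rows)
    events.append("emit document")
    return f"<rows>{body}</rows>"


doc = write_document(transform(read_rows(3)))
print(events)
print(doc)
```

The event log shows reads and transforms interleaved row by row, with the document emitted only after the last row, which is why the final stage can look as if it were waiting for everything upstream to finish.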


velagapudi_k · Post by **velagapudi_k** » Thu Jul 12, 2007 11:48 am

...this is assuming that your "input" xml chunk defined above has already been collapsed or built as a single column and single row coming into the XMLOutput Stage...

Ernie, is there any way to achieve what I want?

eostic · Post by **eostic** » Thu Jul 12, 2007 2:39 pm

Hi....

This whole thing might simply be because you are using the sequential stage to read this xml input. I was assuming that your display below was for readability.....but looking at the output, it appears that you are reading in the xml, with CRLF's and probably getting 8 rows from the sequential file to the XMLOutput Stage. If so, that's the problem. Use the Folder Stage to read in this xml document.... IT MUST BE IN ONE COLUMN....THE ENTIRE CONTENT, whether from this Stage or from another XMLOutput Stage somewhere, before you send it into the XMLOutput Stage for the final wrapper.

This works perfectly as described in the other responses above, with your XML string.

Ernie
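
To see why the row count matters, here is a rough Python sketch that mimics the two situations with plain string handling. It is not what the XMLOutput Stage does internally. The `wrap` and `is_well_formed` helpers are hypothetical names, and the XML is the input posted earlier in this thread.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<entry org_cd="USSTD" org_level_cd="DIV" name="US Stores Division" locationname="USSTO">
<orgrel org_cd="USSTO" org_level_cd="AREA" eff_date="01/01/1900"/>
<status code="Active" eff_date="01/01/1900"/>
</entry>
<entry org_cd="00010" org_level_cd="REG" name="REGION 10 SOUTHEAST" locationname="USSTD">
<orgrel org_cd="USSTD" org_level_cd="DIV" eff_date="01/01/1900"/>
<status code="Active" eff_date="01/01/1900"/>
</entry>"""


def wrap(chunk, root="orgapiload"):
    """Wrap a chunk of XML text in a higher-level element."""
    return f"<{root}>\n{chunk}\n</{root}>"


def is_well_formed(text):
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Case 1: the file is read line by line, so the wrapper sees 8 separate rows.
rows = FRAGMENT.splitlines()
wrapped_per_row = [wrap(row) for row in rows]

# Case 2: the entire content arrives as one value in one column.
wrapped_whole = wrap(FRAGMENT)

print(len(rows), [is_well_formed(w) for w in wrapped_per_row])
print(is_well_formed(wrapped_whole))
```

In the first case every line gets its own wrapper, which is the pattern in the broken output posted above. In the second case the wrapper is applied once around both entry elements.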