XML Writer - double slash

perffi
Premium Member
Posts: 5
Joined: Wed Oct 08, 2003 12:57 am
Location: Finland

Post by perffi »

Hi!

I've been trying to produce an XML file, but I've hit one problem. I have tags that occur more than once (invoice rows), so I tried a double slash (Invoice//InvoiceRows/-) in the derivation field of the XML Writer. It almost worked, but the tags after the double slash, which should occur only once, are repeated as many times as there are rows in the double-slash tag! The invoice rows are already in XML format.

Petri
Paul Preston
Participant
Posts: 24
Joined: Wed Apr 02, 2003 7:09 am
Location: United Kingdom

Post by Paul Preston »

Hi Petri

Are you using the XML 2 pack or the standard XML 1 writer?

It would be nice if you could post a section showing how you want the output to appear. Are there any optional fields that are sometimes missing?

We put most of our data, including invoice data, into XML and it works very well, so post a sample of the output you want and I'll see if I've got something similar working.

Paul.

Post by perffi »

I'm using the standard XML Writer. The output should look like this:

<Invoice>
  <InvoiceNo>000001</InvoiceNo>
  <SellerName>Seller</SellerName>
  <InvoiceRow>
    <RowIdentifier>000001</RowIdentifier>
    <ArticleName>Product</ArticleName>
  </InvoiceRow>
  <InvoiceRow>
    <RowIdentifier>000002</RowIdentifier>
    <ArticleName>Product2</ArticleName>
  </InvoiceRow>
  <InvoiceRow>
    <RowIdentifier>000003</RowIdentifier>
    <ArticleName>Product3</ArticleName>
  </InvoiceRow>
  <BuyerName>Buyer</BuyerName>
  <BuyerAddress>Address1</BuyerAddress>
  <BuyerPhoneNo>123456</BuyerPhoneNo>
</Invoice>

And my derivation columns look like this:

/Invoice/- (control column)
/Invoice/InvoiceNo/#PCDATA
/Invoice/SellerName/#PCDATA
/Invoice//-
/Invoice/BuyerName/#PCDATA
/Invoice/BuyerAddress/#PCDATA
/Invoice/BuyerPhoneNo/#PCDATA

Some columns after the invoice rows are optional. As I said in the first post, the invoice rows are already in XML form, and I read them from a UV stage with a multi-row result set.

Petri

Post by Paul Preston »

"Some columns after the invoice rows are optional. As I said in the first post, the invoice rows are already in XML form, and I read them from a UV stage with a multi-row result set."

We also use a UV table with a multi-row select, and we have similar output. I will check the definition later this afternoon and tell you how we achieved it (I have a meeting to prepare for right now). You say that the invoice rows are already in XML form. Do you mean that they already exist with tags as part of the data?

Post by perffi »

Yes, they exist with tags. I've also tried it without tags, but the result was the same. My derivation was then /Invoice//InvoiceRows/#PCDATA.

Post by Paul Preston »

Petri

Firstly, a general point about producing XML. We initially went down the route of using a table join to produce XML tags as part of the data. This was efficient but ran into trouble when certain characters such as & occurred legitimately in our data. It was also hard to get the indentation correct to make the hierarchy easy to read. For this reason we now use the XML Writer stage to produce all the XML, because special characters are handled properly and the XML output is beautifully structured and formatted.
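To illustrate the special-character problem outside DataStage, here is a minimal Python sketch. It is not what the XML Writer stage does internally; the helper names and the article value are made up for the example. It compares naive string concatenation with escaping via the standard library's `xml.sax.saxutils.escape`.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def naive_tag(name, value):
    # Plain string concatenation: special characters pass through untouched.
    return "<" + name + ">" + value + "</" + name + ">"


def safe_tag(name, value):
    # escape() replaces &, < and > with the corresponding XML entities.
    return "<" + name + ">" + escape(value) + "</" + name + ">"


def is_well_formed(xml_text):
    # True if the text parses as XML, False otherwise.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


article = "Nuts & Bolts"  # hypothetical data value containing an ampersand
broken = naive_tag("ArticleName", article)
fixed = safe_tag("ArticleName", article)

print(is_well_formed(broken))  # the bare & makes the document unparseable
print(fixed)
```

Any tool that builds the document through an XML API rather than by pasting strings together takes care of this escaping for you, which is the point being made about the writer stage.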

I also notice that your XML is entirely element based. We use a few attribute-based components too. It looks more professional.

Please excuse me for not completely substituting your data and fields for ours (I will if you find you really can't transpose it to your own case), but I have this old section of a job that produces the output structure I think you want (apart from one or two attribute-based fields). I have pasted it as it stands in our job for my convenience. In our case, name and address is NandA, which is optional and in this example occurs only on the first two invoices.

<Dataload DataloadId="204">
  <Document DocumentId="1039725078" State="111111">
    <NandA>
      <Name>name1</Name>
      <Address>address1</Address>
    </NandA>
    <ItemLine LineNo="1" Type="PA">
      <IpPaymentType>T</IpPaymentType>
      <IpCode>91151707</IpCode>
      <IpQuantity>1.00</IpQuantity>
      <IpValue>5.30</IpValue>
    </ItemLine>
  </Document>
  <Document DocumentId="1039725079" State="111111">
    <NandA>
      <Name>name2</Name>
      <Address>address2</Address>
    </NandA>
    <ItemLine LineNo="1" Type="PA">
      <IpPaymentType>T</IpPaymentType>
      <IpCode>YWRP</IpCode>
      <IpQuantity>2.00</IpQuantity>
      <IpValue>9.20</IpValue>
    </ItemLine>
    <ItemLine LineNo="2" Type="PA">
      <IpPaymentType>T</IpPaymentType>
      <IpCode>YWRS</IpCode>
      <IpQuantity>1.00</IpQuantity>
      <IpValue>3.69</IpValue>
    </ItemLine>
  </Document>
  <Document DocumentId="1039725080" State="111111">
    <ItemLine LineNo="1" Type="PA">
      <IpPaymentType>T</IpPaymentType>
      <IpCode>YWRP</IpCode>
      <IpQuantity>1.00</IpQuantity>
      <IpValue>4.60</IpValue>
    </ItemLine>
  </Document>
  <Document DocumentId="1039725081" State="111111">
    <ItemLine LineNo="1" Type="PA">
      <IpPaymentType>T</IpPaymentType>
      <IpCode>24414112</IpCode>
      <IpQuantity>1.00</IpQuantity>
      <IpValue>35.20</IpValue>
    </ItemLine>
    <ItemLine LineNo="2" Type="PA">
      <IpPaymentType>T</IpPaymentType>
      <IpCode>93156954</IpCode>
      <IpQuantity>20.00</IpQuantity>
      <IpValue>19.80</IpValue>
    </ItemLine>
  </Document>
</Dataload>

This output is obtained from this derivation:

/Dataload/@DataloadId
/Dataload/Document/@DocumentId
/Dataload/Document/@State
/Dataload/Document/NandA/Name/#PCDATA
/Dataload/Document/NandA/Address/#PCDATA
/Dataload/Document/ItemLine/@LineNo
/Dataload/Document/ItemLine/@Type
/Dataload/Document/ItemLine/IpPaymentType/#PCDATA
/Dataload/Document/ItemLine/IpCode/#PCDATA
/Dataload/Document/ItemLine/IpDescription/#PCDATA
/Dataload/Document/ItemLine/IpQuantity/#PCDATA
/Dataload/Document/ItemLine/IpValue/#PCDATA
/Dataload/Document/ItemLine/IpMenuCode/#PCDATA
/Dataload/Document/ItemLine/SpecItemLineType/#PCDATA
/Dataload/Document/ItemLine/IpCommentLines/#PCDATA

Please note that the name and address detail is optional in our case, and it is placed before the item line detail, which is mandatory.
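For anyone unfamiliar with the path notation, the following Python sketch shows one way to read it: a leaf starting with `@` sets an attribute on the parent element, and `#PCDATA` sets the element's text. This is only an interpretation of the notation as it appears in this thread, not the XML Writer's actual implementation. The function name `apply_derivation` is invented for the example, and repeating elements are deliberately not handled.

```python
import xml.etree.ElementTree as ET


def apply_derivation(root, path, value):
    """Place one value into the tree under `root` according to a derivation path."""
    *element_names, leaf = path.strip("/").split("/")
    if not element_names or element_names[0] != root.tag:
        raise ValueError("path does not start at the root element: " + path)

    # Walk down the element names, creating any element that does not exist yet.
    node = root
    for name in element_names[1:]:
        child = node.find(name)
        if child is None:
            child = ET.SubElement(node, name)
        node = child

    if leaf.startswith("@"):
        node.set(leaf[1:], value)   # attribute on the last element
    elif leaf == "#PCDATA":
        node.text = value           # character data inside the last element
    else:
        raise ValueError("unsupported leaf: " + leaf)


root = ET.Element("Dataload")
apply_derivation(root, "/Dataload/@DataloadId", "204")
apply_derivation(root, "/Dataload/Document/@DocumentId", "1039725078")
apply_derivation(root, "/Dataload/Document/NandA/Name/#PCDATA", "name1")

print(ET.tostring(root, encoding="unicode"))
```

Because each element is created only when a value arrives for it, an optional group such as NandA simply does not appear when none of its columns is supplied in this sketch.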

Note also that the trick is for our query results to be accumulated (using a stage variable) in a transformer stage, so that we feed the writer one flat line per invoice item row (in your case, per product), with the general invoice detail and the address details duplicated on each product line of the invoice.
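The shape of that flat input, and how it maps back to the nested output, can be sketched in Python. This is an illustration under assumptions, not the DataStage job itself: the row dictionaries, the reduced column set and the `build_dataload` function are all hypothetical, and the real stage works from the derivation paths rather than from code like this.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter

# Hypothetical flat rows: one per item line, with the document-level values
# (State, Name, Address) repeated on every line of the same document.
flat_rows = [
    {"DocumentId": "1039725079", "State": "111111", "Name": "name2",
     "Address": "address2", "LineNo": "1", "IpCode": "YWRP", "IpValue": "9.20"},
    {"DocumentId": "1039725079", "State": "111111", "Name": "name2",
     "Address": "address2", "LineNo": "2", "IpCode": "YWRS", "IpValue": "3.69"},
    {"DocumentId": "1039725080", "State": "111111", "Name": None,
     "Address": None, "LineNo": "1", "IpCode": "YWRP", "IpValue": "4.60"},
]


def build_dataload(rows, dataload_id):
    root = ET.Element("Dataload", DataloadId=dataload_id)
    # groupby only groups adjacent rows, so sort by the document key first.
    # sorted() is stable, so item lines keep their original order.
    ordered = sorted(rows, key=itemgetter("DocumentId"))
    for doc_id, group in groupby(ordered, key=itemgetter("DocumentId")):
        lines = list(group)
        first = lines[0]
        doc = ET.SubElement(root, "Document", DocumentId=doc_id, State=first["State"])
        # The duplicated header values are written once per document.
        if first["Name"] is not None:
            nanda = ET.SubElement(doc, "NandA")
            ET.SubElement(nanda, "Name").text = first["Name"]
            ET.SubElement(nanda, "Address").text = first["Address"]
        # Each flat row contributes exactly one repeating ItemLine.
        for row in lines:
            item = ET.SubElement(doc, "ItemLine", LineNo=row["LineNo"], Type="PA")
            ET.SubElement(item, "IpCode").text = row["IpCode"]
            ET.SubElement(item, "IpValue").text = row["IpValue"]
    return root


dataload = build_dataload(flat_rows, "204")
print(ET.tostring(dataload, encoding="unicode"))
```

The key design point is the same as in the job: the repetition lives in the input rows, and the once-per-document values are collapsed when the key changes.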

I have since moved on from this approach and put the address details into a separate XML file, so that we can load it into the database in parallel with the invoice lines. This improved throughput dramatically and made everything simpler.

Please be aware that if you use the XML reader to read the XML, it does not cope well with tags that are present in some records but missing in others.
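If the files are also consumed outside DataStage, optional tags are easy to handle explicitly. The sketch below is a Python illustration only (the sample document and the `read_documents` function are made up for the example); it uses `findtext` with a default so that a missing NandA group yields an empty string instead of an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second document has no NandA group at all.
SAMPLE = """<Dataload DataloadId="204">
  <Document DocumentId="1039725079" State="111111">
    <NandA><Name>name2</Name><Address>address2</Address></NandA>
  </Document>
  <Document DocumentId="1039725080" State="111111"/>
</Dataload>"""


def read_documents(xml_text):
    root = ET.fromstring(xml_text)
    result = []
    for doc in root.findall("Document"):
        result.append({
            "DocumentId": doc.get("DocumentId"),
            # findtext returns the default when the element is absent.
            "Name": doc.findtext("NandA/Name", default=""),
            "Address": doc.findtext("NandA/Address", default=""),
        })
    return result


for record in read_documents(SAMPLE):
    print(record)
```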

Hope this helps.

Post by perffi »

Now we are getting to the solution I built first, but the other way round from yours. I put all the repeated tags in their own XML files (= hash files) and load them in parallel (lookup) with the other tags. In the beginning that was OK, because there were only 6 repeated tags. But now the situation has changed a little... I have almost 40 repeated tags! Have you had this kind of situation? I have tried to solve it with one UV table with the columns InvoiceKey, TagName and XMLRow, where the first two are key columns. When I tried to assemble all of these, I thought the double slash could help. But now I guess it won't.

I think my example at the beginning was a little too simple for my situation (I use attributes, too). I'm sorry about that; I thought the double slash could solve everything.

Petri

Post by Paul Preston »

Indeed, the double slash does not do what you hoped.

We do not have tags coming out of our UV tables any more. We let the XML writer generate all the tag information. Tags may repeat several dozen times in our output XML files but we separate invoice rows, invoice addresses and other relationships out to separate XML files and then we load 4 or 6 XML files in parallel.

Parallel loading is efficient because we can use multi processor performance options to get a bit more speed.
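As a rough picture of the idea outside DataStage, here is a small Python sketch that processes several independent XML files concurrently. Everything in it is an assumption for illustration: the file names, their contents and the `load_one`/`load_all` helpers are hypothetical, and the "load" step just counts child elements instead of writing to a database.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical split of one invoice into separate files, held in memory here.
FILES = {
    "invoice_rows.xml":
        "<Rows><Row InvoiceNo='000001'/><Row InvoiceNo='000001'/></Rows>",
    "invoice_addresses.xml":
        "<Addresses><Address InvoiceNo='000001'/></Addresses>",
}


def load_one(item):
    # Stand-in for a real load step: parse the file and count its records.
    name, xml_text = item
    root = ET.fromstring(xml_text)
    return name, len(list(root))


def load_all(files, workers=4):
    # Each file is independent of the others, so they can be handled concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(load_one, files.items()))


counts = load_all(FILES)
print(counts)
```

In Python, threads mainly help when the load step waits on I/O such as a database; for CPU-bound parsing, `ProcessPoolExecutor` from the same module would be the closer analogue to using several processors.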

Paul.

Post by perffi »

Thank you very much, Paul. I appreciate your help and I'm going to think over that parallel loading solution for my system also.

Petri