DSXchange

knowledge · Post by **knowledge** » Tue Sep 04, 2007 1:46 pm

Hi,
I am very new to XML. My requirement is an XML source and an Oracle table as the target. I get a separate file for every patient report, and I will be processing 75,000 files at a time.
I created one job:
Folder -> XML Input -> Sequential File.
I imported the metadata from an XML file (I found out from one of the posts that I should import it from the XSD). I have the following questions:

1: My file has two columns as the key (Agency no and Patient report no). How can I achieve this, given that the XML Input stage only accepts one column as key?
2: I want to process an XML file only if one of its nodes has a particular value. For example, process the file only if the e10_01 node has the value 10, 20 or 30. How can I do this in DataStage? Do I have to process all 75,000 files and then filter them on the value of this node, or can I do this in the first stage, before processing the files?

3: My XML file has the following structure:
<E05>
<E05_01 xsi:nil="true"/>
<E05_02 xsi:nil="true"/>
</E05>
When I import the metadata, I get E05_02nil instead of just the node name, and its value is "true" in the flat file. How do I get rid of this?

Thanks in advance,

roy · Post by **roy** » Wed Sep 05, 2007 8:26 am

Hi,
You can download an XML Best Practices document from Kim Duke's site.

IHTH (I Hope This Helps),

knowledge · Post by **knowledge** » Wed Sep 05, 2007 9:36 am

Hi roy,

Thanks,
I will go through these best practices; the document looks really helpful.
If anybody wants to refer to this doc, here is the link:

http://www.duke-consulting.com/DataStage_Tips.htm

Thanks,

eostic · Post by **eostic** » Wed Sep 05, 2007 7:11 pm

1. Don't worry about the "key" (the column selected as key). It doesn't really mean "key"; it is merely the way that you indicate which column is the meaningful "repeating element." My advice is to always check as key the column that is in the "lowest level of the current node path that you are interested in retrieving". So if your file has Agency info, and then Patient reports, and inside of patient reports it has a repeating group of, say, patient visits, select any one of the patient visit elements as the "repeating element" (in my example, the deepest level might have visit_date, visit_time, comment, diagnosis_code, etc.). For basic retrieval of consistent and simple documents (where all elements are present) it's fairly meaningless, although you are required to check at least one column. Where it comes into play is when you have documents that leave out this particular element: then you have to decide whether the element is required (in which case the Stage will avoid retrieving that entire record) or whether you want nulls. If this key indicator is causing you problems on the Oracle side, then put a Transformer in between and remove the key check.

2. The best practices document talks about how to exploit XPath for certain values.

3. I'd have to play with that some more, but I'm glad you got what you did. If the importer wasn't really smart about this, it might see "nil" as just any other attribute, and give you nil1 and nil2. I suppose in some respects it could have just given you E05_01, but it's possible that the standard allows an element with xsi:nil to also be mixed, meaning that E05_01 and E05_01nil could be two different values, and thus two separate columns. What's more important is whether it is retrieving the data correctly. If it is, change the column name to anything you want. The critical naming is over in the Description property of the grid column name.

Ernie
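
To illustrate point 1 outside DataStage: the "repeating element" idea amounts to emitting one row per occurrence of the deepest repeating node, with the parent values (here both the agency number and the report number) repeated on every row. The following is only a minimal Python sketch of that idea, not how the XML Input stage is implemented. The element names (Agency, AgencyNo, PatientReport, ReportNo, Visit, VisitDate, DiagnosisCode) and the sample values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element name and value here is invented.
SAMPLE = """<Agency>
  <AgencyNo>A1</AgencyNo>
  <PatientReport>
    <ReportNo>R1</ReportNo>
    <Visit><VisitDate>2007-09-01</VisitDate><DiagnosisCode>D10</DiagnosisCode></Visit>
    <Visit><VisitDate>2007-09-02</VisitDate></Visit>
  </PatientReport>
</Agency>"""


def rows_per_repeating_element(xml_text):
    """Return one tuple per <Visit>, repeating the parent values on each row."""
    root = ET.fromstring(xml_text)
    agency_no = root.findtext("AgencyNo")
    rows = []
    for report in root.findall("PatientReport"):
        report_no = report.findtext("ReportNo")
        # <Visit> plays the role of the "repeating element" (the key column).
        for visit in report.findall("Visit"):
            rows.append((
                agency_no,
                report_no,
                visit.findtext("VisitDate"),
                visit.findtext("DiagnosisCode"),  # None when the element is absent
            ))
    return rows


rows = rows_per_repeating_element(SAMPLE)
```

In this sketch the second visit has no DiagnosisCode, so that field comes back as None, which mirrors the "do you want nulls" decision described above.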

knowledge · Post by **knowledge** » Wed Sep 05, 2007 7:30 pm

Hi Ernie,
Thanks a lot.
What I did is this: I turned one of the XML files into a standard template by replacing all the nil expressions, just to import the metadata. Then, in the Description, I added /text() to the path of that particular column. That is how I got null in the flat file instead of "true".
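
For readers wondering why "true" showed up at all: xsi:nil is an ordinary attribute on the element, so reading the attribute yields "true", while reading the element's text yields nothing. A small Python sketch of that distinction follows, assuming the usual XML Schema instance namespace for the xsi prefix. The value 42 in E05_02 is made up, and this is plain Python, not DataStage behaviour.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, normally bound to the "xsi" prefix.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# E05_02 carries an invented value so that both cases are visible.
SAMPLE = (
    '<E05 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<E05_01 xsi:nil="true"/>'
    '<E05_02>42</E05_02>'
    '</E05>'
)


def text_or_none(element):
    """Return the element's text, or None when it is nil or empty."""
    if element.get("{%s}nil" % XSI) == "true":
        return None
    return element.text or None


root = ET.fromstring(SAMPLE)
values = {child.tag: text_or_none(child) for child in root}

# Reading the attribute itself is what produces the literal "true".
nil_attr = root.find("E05_01").get("{%s}nil" % XSI)
```

Selecting the text of the element rather than its nil attribute is the same choice the text() step expresses in XPath.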

My next requirement is that I have to scan 75,000 files, but process a file only depending on the value of one of the columns. For example, if node <E10_10> has the value 10, 20 or 30, then I want to process that file; otherwise I want to discard it. I am not sure how to do this.

Please suggest.
Thanks.
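
One possible approach, sketched here as an assumption rather than as an established DataStage technique, is to pre-filter the files with a small script before the job runs and hand only the matching files to the Folder stage. The sketch below is plain Python. The function names, the `*.xml` file pattern and the default node name E10_01 are all hypothetical (the thread mentions both e10_01 and E10_10, so the node name is a parameter).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def file_matches(path, node_name, wanted):
    """Return True if any element called node_name has text in `wanted`."""
    try:
        with open(path, "rb") as handle:
            # Stream the file so parsing can stop at the first match.
            for _, elem in ET.iterparse(handle, events=("end",)):
                if local_name(elem.tag) == node_name:
                    if (elem.text or "").strip() in wanted:
                        return True
    except ET.ParseError:
        # Assumption: a file that fails to parse before a match is skipped.
        return False
    return False


def select_files(folder, node_name="E10_01",
                 wanted=frozenset({"10", "20", "30"})):
    """Return the sorted list of *.xml files in `folder` that qualify."""
    return sorted(
        path for path in Path(folder).glob("*.xml")
        if file_matches(path, node_name, wanted)
    )
```

The selected paths could then be copied or moved into the directory that the job reads. Whether this is faster than filtering inside the job for 75,000 files would need to be measured.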