xml

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

Hi,
I am very new to XML. My requirement is an XML source and an Oracle table as the target. I get a separate file for every patient report, and I will be processing 75,000 files at a time.
I created one job:
Folder -> XML Input -> Sequential File.
I imported the metadata from an XML file (I found out from one of the posts that I should import it from the XSD). I have the following questions:

1: My file has two key columns (Agency no and Patient report no). How can I achieve this, given that the XML Input stage only accepts one column as key?
2: I want to process an XML file only if one of its nodes has a particular value, for example only if the e10_01 node has the value 10, 20, or 30. How can I do this in DataStage? Do I have to process all 75,000 files and then filter them on the value of this node, or can I do this before processing the files, in the first stage?

3: My XML file has the following structure:
<E05>
<E05_01 xsi:nil="true"/>
<E05_02 xsi:nil="true"/>
</E05>
When I import the metadata, I get E05_02nil instead of just the node name, and its value is "true" in the flat file. How do I get rid of this?

Thanks in advance,
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You can download an XML Best Practices document from Kim Duke's site.


IHTH (I Hope This Helps),
Last edited by roy on Wed Sep 05, 2007 1:23 pm, edited 1 time in total.
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

Hi Roy,

Thanks.
I will go through this best practices document; it looks really helpful.
If anybody wants to refer to this doc, here is the link:

http://www.duke-consulting.com/DataStage_Tips.htm

Thanks,
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

1. Don't worry about the "key" (the column selected as key); it doesn't really mean "key". It is merely how you indicate which column is the meaningful "repeating element". My advice is to always check as key a column that is in the "lowest level of the current node path that you are interested in retrieving". So, if your file has Agency info, then Patient reports, and inside each patient report a repeating group of, say, patient visits, select any one of the patient visit elements as the "repeating element" (in my example, the deepest level might have visit_date, visit_time, comment, diagnosis_code, etc.). For basic retrieval of consistent and simple documents (where all elements are present) the key is fairly meaningless, although you are required to check at least one column. Where it comes into play is when you have documents that leave out this particular element: then you have to decide whether the element is required (in which case the Stage will skip that entire record) or whether you want nulls. If this key indicator is causing you problems on the Oracle side, put a Transformer in between and remove the key check.

2. The best practices document talks about how to exploit XPath for certain values.

3. I'd have to play with that some more, but I'm glad you got what you did. If the importer weren't really smart about this, it might see "nil" as just another attribute and give you nil1 and nil2. I suppose in some respects it could have just given you E05_01, but it's possible that the standard allows an element with xsi:nil to also be mixed, meaning that E05_01 and E05_01nil could be two different values, and thus two separate columns. What's more important is whether it is retrieving the data correctly. If it is, change the column name to anything you want. The naming that matters is in the Description property of the column in the grid.
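
To make the E05_01 versus E05_01nil distinction in point 3 concrete, here is a minimal sketch outside DataStage, using Python's standard `xml.etree.ElementTree`. It is only an illustration, not how the importer works internally. The `xmlns:xsi` declaration, the sample date in `E05_02`, and the helper name `element_and_nil` are assumptions added for the example; they are not from the original file.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace that the xsi: prefix is bound to.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Assumed sample: the namespace declaration and the E05_02 value are made up.
SAMPLE = f"""<E05 xmlns:xsi="{XSI}">
  <E05_01 xsi:nil="true"/>
  <E05_02>2007-09-05</E05_02>
</E05>"""


def element_and_nil(root, tag):
    """Return (element text, xsi:nil attribute) for the first child named tag."""
    node = root.find(tag)
    if node is None:
        return None, None
    # The text and the nil attribute are two separate pieces of data,
    # which is why an importer can expose them as two separate columns.
    return node.text, node.get(f"{{{XSI}}}nil")


root = ET.fromstring(SAMPLE)
nil_row = element_and_nil(root, "E05_01")
value_row = element_and_nil(root, "E05_02")
print(nil_row)    # (None, 'true')
print(value_row)  # ('2007-09-05', None)
```

Selecting the element text gives an empty (null) value for a nil element, while selecting the attribute gives the string "true".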

Ernie
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

Hi Ernie,
Thanks a lot.
What I did was turn one of the XML files into a standard template by replacing all the nil expressions, just to import the metadata. Then, in the Description, I added /text() for that particular column. That is how I got null in the flat file instead of "true".

My next requirement is that I have to scan 75,000 files, but I only want to process a file if one of its columns has a certain value, for example if node <E10_10> has the value 10, 20, or 30; otherwise I want to discard it. I am not sure how to do this.

Please suggest.
Thanks.
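
One possible way to do this kind of selection before the files ever reach the job is a small pre-filter script. The sketch below is a hedged illustration in Python, not a DataStage feature: the function names, the `*.xml` pattern, and the demo file names are all hypothetical, and it assumes the node of interest is named `E10_10` as in the post above. It streams each file and stops at the first matching node.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

WANTED_VALUES = {"10", "20", "30"}  # values that qualify a file for processing


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def should_process(xml_path, node_name="E10_10", wanted=WANTED_VALUES):
    """Return True if any element named node_name has text in wanted."""
    try:
        with open(xml_path, "rb") as handle:
            # iterparse streams the document, so we can stop at the first hit.
            for _event, elem in ET.iterparse(handle, events=("end",)):
                text = (elem.text or "").strip()
                if local_name(elem.tag) == node_name and text in wanted:
                    return True
    except ET.ParseError:
        return False  # treat malformed files as "do not process"
    return False


def select_files(folder, pattern="*.xml"):
    """List the files in folder whose node value qualifies them."""
    return sorted(p for p in Path(folder).glob(pattern) if should_process(p))


# Small self-contained demo with two made-up reports.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "keep.xml").write_text("<report><E10_10>20</E10_10></report>")
    Path(tmp, "skip.xml").write_text("<report><E10_10>99</E10_10></report>")
    selected = [p.name for p in select_files(tmp)]

print(selected)  # ['keep.xml']
```

The selected files could then be moved or copied to the directory the job reads from, so that only qualifying reports are parsed by the job.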