Extracting the Data from XML File


balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Extracting the Data from XML File

Post by balu536 »

Hi all,
I'm having a problem extracting data from an XML file. I've used an External Source stage and an XML Input stage to extract the data. I can get data out of the External Source stage, but nothing comes out of the XML Input stage. I have loaded the metadata too. Please explain the procedure for extracting data from this XML file, which has the structure shown below.

Part of the XML file structure (the rest of the file has the same structure):

<?xml version="1.0" encoding="UTF-8"?>
<CRRDownload:allocatedCRRs xmlns:CRRDownload="http://crr.caiso.org/download/xml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://crr.caiso.org/download/xml http://ftapjbos10:8080/crr/download/xml ... esults.xsd">
<CRRDownload:crr>
<CRRDownload:NominationID>48733</CRRDownload:NominationID>
<CRRDownload:CRRID></CRRDownload:CRRID>
<CRRDownload:Category>PTP</CRRDownload:Category>
<CRRDownload:Portfolio>Allocation_S2_LT</CRRDownload:Portfolio>
<CRRDownload:AssetOwner>1014</CRRDownload:AssetOwner>
<CRRDownload:Source>MALIN_5_RNDMTN</CRRDownload:Source>
<CRRDownload:Sink>LAP_SCE</CRRDownload:Sink>
<CRRDownload:StartDate>2009-04-01</CRRDownload:StartDate>
<CRRDownload:EndDate>2009-06-30</CRRDownload:EndDate>
<CRRDownload:HedgeType>OBL</CRRDownload:HedgeType>
<CRRDownload:CRRType>LSE</CRRDownload:CRRType>
<CRRDownload:TimeOfUse>ON</CRRDownload:TimeOfUse>
<CRRDownload:NominatedMW>6.900</CRRDownload:NominatedMW>
<CRRDownload:AllocatedMW>0.000</CRRDownload:AllocatedMW>
</CRRDownload:crr>
<CRRDownload:crr>
<CRRDownload:NominationID>48734</CRRDownload:NominationID>
<CRRDownload:CRRID></CRRDownload:CRRID>
<CRRDownload:Category>PTP</CRRDownload:Category>
<CRRDownload:Portfolio>Allocation_S2_LT</CRRDownload:Portfolio>
<CRRDownload:AssetOwner>1014</CRRDownload:AssetOwner>
<CRRDownload:Source>SYLMAR_2_NOB</CRRDownload:Source>
<CRRDownload:Sink>LAP_SCE</CRRDownload:Sink>
<CRRDownload:StartDate>2009-04-01</CRRDownload:StartDate>
<CRRDownload:EndDate>2009-06-30</CRRDownload:EndDate>
<CRRDownload:HedgeType>OBL</CRRDownload:HedgeType>
<CRRDownload:CRRType>LSE</CRRDownload:CRRType>
<CRRDownload:TimeOfUse>ON</CRRDownload:TimeOfUse>
<CRRDownload:NominatedMW>174.800</CRRDownload:NominatedMW>
<CRRDownload:AllocatedMW>0.000</CRRDownload:AllocatedMW>
</CRRDownload:crr>
</CRRDownload:allocatedCRRs>


Regards,
Balakrishna
Pagadrai
Participant
Posts: 111
Joined: Fri Dec 31, 2004 1:16 am
Location: Chennai

Re: Extracting the Data from XML File

Post by Pagadrai »

Hi Balakrishna,
Are you seeing any warnings or errors in the log?
There could be an issue with the XPath specified.
For example, for XML of the form:
<a>
<b>123</b>
<c>456</c>
</a>
I would give the XPaths as /a/b and /a/c.
Let me know if you have any questions.
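
The two paths in that example can be checked outside DataStage. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which is an assumption made for illustration only; the XML Input stage uses its own XPath engine, and the helper name extract is hypothetical.

```python
import xml.etree.ElementTree as ET

# The small document from the example above, on one line.
SAMPLE = "<a><b>123</b><c>456</c></a>"

def extract(xml_text, paths):
    """Return the text found at each absolute path such as /a/b."""
    root = ET.fromstring(xml_text)
    values = {}
    for path in paths:
        # ElementTree paths are relative to the root element, so the
        # first step is checked against the root tag and then dropped.
        parts = path.strip("/").split("/")
        if parts[0] != root.tag:
            values[path] = None
            continue
        node = root if len(parts) == 1 else root.find("/".join(parts[1:]))
        values[path] = node.text if node is not None else None
    return values

result = extract(SAMPLE, ["/a/b", "/a/c"])
print(result)
```

A path whose first step does not match the root element yields None rather than an error, which is one way a wrong path can go unnoticed.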
balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Re: Extracting the Data from XML File

Post by balu536 »

I have 0 warnings in my Director log.
Pagadrai
Participant
Posts: 111
Joined: Fri Dec 31, 2004 1:16 am
Location: Chennai

Re: Extracting the Data from XML File

Post by Pagadrai »

balu536 wrote:I have 0 warnings in my director log
Hi,
Can you give more details about the column derivation method you are using and which keys you have specified?
bachi
Participant
Posts: 28
Joined: Sun May 25, 2008 7:02 am

xml file extraction

Post by bachi »

Hi,
Use a Sequential File stage first, followed by the XML Input stage. In the Sequential File stage define two columns (VarChar 1000) and make one of them the key. In the XML Input stage, select the non-key field as the XML source. Take two output datasets from the XML Input stage, one of them for rejects, and enable the reject option in the stage. The XML file will then be extracted.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi...

If you are not getting any errors, but also aren't getting any data, then the issue is most likely in your XPath. This is the stuff in the Description property for your link columns.

Follow Pagadrai's advice above... start small, and with just one column. We will assume you imported your metadata via the XML metadata importer, but still, choose just one column for initial testing. Something like:

NominationID (char data type and length) with a Description property of

/CRRDownload:allocatedCRRs/CRRDownload:crr/CRRDownload:NominationID/text()

Because you have namespace prefixes, you will also have to load, on the Transformation Settings tab, the namespace details in the box at the bottom. This is effectively xmlns:CRRDownload="http......." etc. [that's the longest namespace prefix I've ever seen!].

Make that column a key. Let us know if you get some rows.
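
As a rough illustration of why the namespace details matter, here is a minimal sketch in Python's standard xml.etree.ElementTree (an assumption for illustration; it is not how the XML Input stage works internally). The sample is a trimmed copy of the posted file, the namespace URI is taken from it, and the names SAMPLE and NAMESPACES are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted sample: two crr rows, two fields each.
SAMPLE = """<CRRDownload:allocatedCRRs xmlns:CRRDownload="http://crr.caiso.org/download/xml">
<CRRDownload:crr>
<CRRDownload:NominationID>48733</CRRDownload:NominationID>
<CRRDownload:Source>MALIN_5_RNDMTN</CRRDownload:Source>
</CRRDownload:crr>
<CRRDownload:crr>
<CRRDownload:NominationID>48734</CRRDownload:NominationID>
<CRRDownload:Source>SYLMAR_2_NOB</CRRDownload:Source>
</CRRDownload:crr>
</CRRDownload:allocatedCRRs>"""

# Prefix-to-URI map, playing the role of the namespace declaration box.
NAMESPACES = {"CRRDownload": "http://crr.caiso.org/download/xml"}

root = ET.fromstring(SAMPLE)  # root is the allocatedCRRs element

# Qualified path with the prefix map supplied: one match per crr row.
with_ns = [
    node.text
    for node in root.findall("CRRDownload:crr/CRRDownload:NominationID", NAMESPACES)
]

# Same path without prefixes: no error is raised, it just matches nothing.
without_ns = root.findall("crr/NominationID")

print(with_ns)
print(len(without_ns))
```

In this sketch the unqualified path fails silently, which resembles the "no errors, no rows" symptom described in the thread.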

Ernie
Ernie Ostic

balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

Hi Ernie,
I'm following the same method that you mentioned, and even so I'm unable to fetch any records. I tried with single and multiple fields, but there was no change in the outcome.

Regards,
Balakrishna
balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

To summarize, here is the procedure I followed:

Initially I imported the table definitions (Import -> Table Definitions -> XML Table Definitions). I opened the relevant file, ran Auto-check (on the Edit tab), selected NominationID as the key column, and saved the table definition. Then I loaded the namespace declarations under the Transformation Settings tab (on both the Stage and Output tabs) in the XML Input stage. Finally I loaded the columns directly on the Output tab using the Load option.
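
One way to double-check which namespace declarations a document actually carries, independent of the imported table definition, is to list them with a parser. This is a minimal sketch using Python's standard library (an assumption for illustration; the function name declared_namespaces is hypothetical, and the sample is a cut-down root element from the posted file without the truncated schemaLocation attribute).

```python
import io
import xml.etree.ElementTree as ET

# Cut-down root element from the posted file with both declarations.
SAMPLE = (
    b'<CRRDownload:allocatedCRRs'
    b' xmlns:CRRDownload="http://crr.caiso.org/download/xml"'
    b' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    b'<CRRDownload:crr/>'
    b'</CRRDownload:allocatedCRRs>'
)

def declared_namespaces(xml_bytes):
    """Collect every prefix-to-URI declaration found in the document."""
    found = {}
    # The "start-ns" event yields a (prefix, uri) pair per declaration.
    for _event, (prefix, uri) in ET.iterparse(
        io.BytesIO(xml_bytes), events=("start-ns",)
    ):
        found[prefix] = uri
    return found

namespaces = declared_namespaces(SAMPLE)
for prefix, uri in namespaces.items():
    print(f'xmlns:{prefix}="{uri}"')
```

The printed declarations can then be compared by eye with what was loaded on the Transformation Settings tab.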


Regards,
Balakrishna
verify
Premium Member
Posts: 99
Joined: Sun Mar 30, 2008 8:35 am

Post by verify »

When you read the XML file using the External Source stage, make sure it is read as one record (strip any newlines present in your file).

Use this command in your External Source stage:

cat file_name | tr -d '\n'

Also, in your XML input file the same element repeats, so select the key carefully and enable the "Repetition element required" property in your XML Input stage.
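
The newline-stripping step can be sanity-checked before running the job. Below is a minimal Python sketch (an assumption for illustration, with a hypothetical helper named flatten) that removes line breaks from a trimmed copy of the posted sample and confirms the single-line result still parses with both rows intact. It also drops carriage returns, which goes slightly beyond the command above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted sample, one element per line.
SAMPLE = """<CRRDownload:allocatedCRRs xmlns:CRRDownload="http://crr.caiso.org/download/xml">
<CRRDownload:crr>
<CRRDownload:NominationID>48733</CRRDownload:NominationID>
</CRRDownload:crr>
<CRRDownload:crr>
<CRRDownload:NominationID>48734</CRRDownload:NominationID>
</CRRDownload:crr>
</CRRDownload:allocatedCRRs>"""

def flatten(xml_text):
    """Remove line breaks so the whole document becomes one record."""
    return xml_text.replace("\r", "").replace("\n", "")

flat = flatten(SAMPLE)

# The flattened text must still be well-formed and keep both rows.
root = ET.fromstring(flat)
row_count = len(list(root))

print(len(flat.splitlines()))
print(row_count)
```

This only works cleanly when line breaks sit between elements, as they do in the posted file; a break inside a tag between two attributes would glue them together.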

Hope this helps ..
RK Raju
balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

Hi Raju,
The output of the External Source stage contains only one record, and I'm using NominationID as the key field, which has unique values across the entire file.


Regards,
Balakrishna
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... the External Source stage should be passing the file/pathname to the XML Input stage and the XML stage should be the one "reading" the file. What is yours doing, exactly?
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Exactly...you should be sending filenames to XMLInput.....be sure you check the right "XML Content" radio button.
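
The difference between handing a parser a file name and handing it the document text can be shown with a short Python sketch. This is only an analogy under stated assumptions: the function load and its mode argument are hypothetical, and nothing here describes what the XML Input stage does when the setting and the incoming data disagree.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

DOC = "<a><b>123</b></a>"

def load(value, mode):
    """Parse value either as a file path or as XML text."""
    if mode == "path":
        return ET.parse(value).getroot()
    if mode == "content":
        return ET.fromstring(value)
    raise ValueError(f"unknown mode: {mode}")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(DOC)

    by_path = load(path, "path").tag        # file name, treated as a path
    by_content = load(DOC, "content").tag   # document text, treated as text

    # Mismatch: a file name handed over as if it were the document itself.
    try:
        load(path, "content")
        mismatch = "parsed"
    except ET.ParseError:
        mismatch = "ParseError"

print(by_path, by_content, mismatch)
```

The point of the sketch is simply that the upstream column and the chosen setting have to agree on whether the value is a path or the XML itself.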
bachi
Participant
Posts: 28
Joined: Sun May 25, 2008 7:02 am

Post by bachi »

Hi,
Take two columns, datarow and keyrow, and load the XML metadata in the XML Input stage. There, enable the repetition option as your business requirement dictates, then run the job.
balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

Post by balu536 »

The radio button checked is URL/File path. It gets checked automatically when we use the External Source stage. If we use the Sequential File stage, then XML Document is checked by default. In my case, as I've used the External Source stage, the URL/File path radio button is checked.


Regards,
Balakrishna