Reading XML file in windows through external source

Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi all,
I'm trying to read an XML file through an External Source stage, but I don't know how to pass just the file path to the XML Input stage. I tried the "cat" and "echo" commands, but the file path doesn't reach the XML Input stage. Can you please help me with this? Thanks.


External source > XML input stage > peek stage

cheers
MJ
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Courtesy of Ernie Ostic, our resident XML expert:

http://dsrealtime.wordpress.com/2007/12 ... -a-source/
-craig

"You can never have too many knives" -- Logan Nine Fingers
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi,
I tried the option given for the External Source stage:
ls C:\Datastage\Testinputfiles\XML_in_Books.xml | sort

I'm getting this error:
"Invalid data source specified: Invalid hostname: ls C."

I did some googling and ended up at the link below, but I couldn't find anything specific:

http://www-304.ibm.com/support/docview. ... wg21449676

cheers
MJ
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why not use command line options to effect the sort? For example ls -1rt
Also, are you putting this in the correct field in the External Source stage? Why is DataStage complaining about hostname? Is it expecting a UNC pathname? Try enclosing the pathname in double quote characters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi,
Thanks for responding.

FYI, I'm in a Windows environment.

I tried your suggestions, but I'm still getting the same error:
"Invalid data source specified: Invalid hostname: "ls C"."

I have configured the External Source stage as below:
Source Method: Specific Programs
Source Program: "ls C:\Datastage\Testinputfiles\XML_in_Books.xml | sort"

Why is DataStage complaining about an invalid hostname?
Please see the link below. It seems like a bug, but I'm not sure:
http://www-304.ibm.com/support/docview. ... wg21449676

Your response is highly appreciated.

cheers
MJ
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

From the link it doesn't sound like a bug but rather behaviour specific to the presence of a colon in your 'prog name'. Have you tried what it suggested to resolve the issue?
-craig

Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi chulett,
Please correct me if my understanding is wrong.

The colon ":" mentioned in the link is the one in the Windows path name, the second character, right after "C":
C:\Datastage\Testinputfiles\XML_in_Books.xml

I'm not sure how to use the "*" wildcard character described in the link. There are a few other XML files in the same directory and I want only this XML file to be read, so can you please help me understand how to specify the "*" character?

cheers
MJ
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Try replacing the colon with an equals sign.

Assuming your currently logged drive is C:, try using a UNIX-style pathname, for example /Datastage/Testinputfiles/XML_in_Books.xml (MKS Toolkit will treat the first / as "C:\".)
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi,
Thanks for responding.

I tried replacing the colon with an equals sign, like
"C=/Datastage/Testinputfiles/XML_in_Books.xml"
but I'm getting this error:
Datastage error
=========
Source subproc: ls: File or directory "C=/Datastage/Testinputfiles/XML_in_Books.xml" is not found

I tried this as well:
ls /Datastage/Testinputfiles/XML_in_Books.xml

and I'm getting the error below. I don't know where it's picking up the directory
"C:\IBM\InformationServer\Server\Projects\Training//"

I also have one question: the slash is in UNIX style, will that work?

FYI, the file I'm trying to read is at
C:\Datastage\Testinputfiles\XML_in_Books.xml

Datastage error
==========
The primary document entity could not be opened. Id=C:\IBM\InformationServer\Server\Projects\Training//Datastage/Testinputfiles/XML_in_Books.xml

cheers
MJ
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I haven't tried it on Windows in a long time, but memory tells me that it just assumes C:. Pretend that it is UNIX: leave out the C: entirely, switch your slashes the other way, and try ls /DataStage/.../.../.../

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Thanks for your response, Ernie.

I tried your suggestion (ls \Datastage\Testinputfiles\XML_in_Books.xml), but no luck. I'm still getting the same error:

Datastage error
============
The primary document entity could not be opened. Id=C:\IBM\InformationServer\Server\Projects\Training//Datastage/Testinputfiles/XML_in_Books.xml

One thing I'm not sure about is where the path "C:\IBM\InformationServer\Server\Projects\Training/" is being picked up from.
If I could change this path to just "C:", I think I might have a chance. Any ideas, please? Thanks.


Once again thanks to all for responding
cheers
MJ
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It is turning it into a relative path from your 'current working directory' which is the Project the job is in. He did suggest you put your slashes the other way, UNIX style rather than DOS, so that it looks more like an absolute path. See if that change helps at all.
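
One way to read the Id in that error, offered as a sketch and not as documented XML Input stage behaviour: a path with no drive letter is treated as relative and simply appended to the job's working directory. The Python below only reproduces the string from the error message; PROJECT_DIR, has_drive and resolve_like_the_error are made-up names for illustration.

```python
import ntpath

# Working directory taken from the error message in this thread.
PROJECT_DIR = r"C:\IBM\InformationServer\Server\Projects\Training"


def has_drive(path: str) -> bool:
    """Return True if a Windows-style path carries a drive letter such as 'C:'."""
    drive, _rest = ntpath.splitdrive(path)
    return drive != ""


def resolve_like_the_error(document: str, cwd: str = PROJECT_DIR) -> str:
    """Assumed rule (not confirmed by any documentation): a path without a
    drive letter is appended to the working directory with a '/' in between."""
    if has_drive(document):
        return document
    return cwd + "/" + document


if __name__ == "__main__":
    print(resolve_like_the_error("/Datastage/Testinputfiles/XML_in_Books.xml"))
```

Under that assumption, the leading slash of the UNIX-style path plus the joining slash would account for the "//" seen in the Id.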
-craig

Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Believe me...
In the External Source stage I put the slashes in Windows style
(ls \Datastage\Testinputfiles\XML_in_Books.xml),
but I don't know why they are getting converted to UNIX style... :(
I even tried typing it at the command prompt, and the output comes back in UNIX style only,
so probably ls converts it to UNIX style.

Then I tried the echo command. It works fine at the command prompt, but when I used it in the job, the slashes are getting converted again.

Any idea why DataStage converts a backslash into a forward slash?
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It's possible that we are chasing the wrong problem. The directory you see in the error is what the XML stage produces.

First, re-write your Job. Delete the xmlInput Stage and just have a Sequential Stage.

Make sure you are getting a proper list of files from that subdirectory in your External Source Stage.

I have used this for years in my test Windows images:

ls /tmp/xmlBasicEnablement/*.xml

Works perfectly.

Validate first that you are sending a correct list into the Stage. If you are, then things are good in your External Source Stage and we need to focus elsewhere. Perhaps there are other access issues regarding the XML document(s) in question, or there is something else going on.

If there is a schemaLocation attribute in your test document, delete it for now.

Also, exactly what release are you on? Namespaces, URL issues, etc. were common in 8.1.x without a variety of patches.
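
The "is the list correct, and can the documents actually be opened" check can also be run outside DataStage. This is a rough Python sketch, not part of any DataStage tooling: check_xml_files is a hypothetical helper, Python's glob stands in for the ls command, and the pattern is the one quoted above.

```python
import glob
import xml.etree.ElementTree as ET


def check_xml_files(pattern: str) -> dict:
    """Return {path: status} for every file matching the wildcard pattern.

    Status is "ok" if the file can be opened and parsed as XML,
    otherwise "failed: <reason>".
    """
    results = {}
    for path in sorted(glob.glob(pattern)):
        try:
            ET.parse(path)  # opens the file and parses the whole document
            results[path] = "ok"
        except (OSError, ET.ParseError) as exc:
            results[path] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    # Pattern from this thread; change it to your own directory.
    for path, status in check_xml_files("/tmp/xmlBasicEnablement/*.xml").items():
        print(path, status)
```

An empty result means the pattern matched nothing, which points back at the path. A "failed" entry points at the document or at file access.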

Ernie
Maximus_Jack
Premium Member
Posts: 139
Joined: Fri Apr 11, 2008 1:02 pm

Post by Maximus_Jack »

Hi Ernie,
Yes, the problem is with the External Source stage. I created a new job connecting an External Source stage to a Sequential File stage. In the source program
I entered "ls \Datastage\Testinputfiles\XML_in_Books.xml" (without quotes), but in the Sequential File stage I'm getting
"/Datastage/Testinputfiles/XML_in_Books.xml" (without quotes).

I don't know how a backslash (\) is getting converted to a forward slash (/).


DataStage version 8.1
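
For reference, the two spellings differ only in the separator; neither one carries a drive letter. The small Python sketch below shows the same conversion the Sequential File output shows. It is not DataStage or MKS Toolkit code, and it doesn't say where in the job the conversion happens. to_unix_style is a made-up helper name.

```python
from pathlib import PureWindowsPath


def to_unix_style(path: str) -> str:
    """Rewrite a Windows-style path with forward slashes."""
    return PureWindowsPath(path).as_posix()


typed = r"\Datastage\Testinputfiles\XML_in_Books.xml"

if __name__ == "__main__":
    print(to_unix_style(typed))             # same path, forward slashes
    print(repr(PureWindowsPath(typed).drive))  # empty: no drive letter either way
```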


cheers
MJ