XML input stage usage in parallel job of data stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Nagalakshmi.Krishna
Participant
Posts: 11
Joined: Sat Feb 10, 2007 12:32 am

XML input stage usage in parallel job of data stage

Post by Nagalakshmi.Krishna »

I have a requirement to load data from XML files into DB2 tables, and I have to do it in a parallel job. Is this possible? If so, how can I achieve it?
I tried using a Sequential File stage with a column of data type LongVarChar(9999), followed by an XML Input stage, but it does not convert the data. I get an error like "XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 5): There are more end tags than start tag"

Could you please help me to do this job?

Thanks
LK
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Don't even bother trying to read it with a Sequential File stage, rather use the External Source stage in a Parallel job as noted here by our XML Guru, Ernie Ostic.
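As a rough illustration of the External Source idea: that stage runs a program and reads its output as rows, so one option is to have the program emit one XML file path per row and let the XML Input stage open each file itself. The sketch below is only a hypothetical helper script; the function name `list_xml_files` and the directory argument are assumptions for illustration and are not taken from the linked post.

```python
import sys
from pathlib import Path


def list_xml_files(directory):
    """Return the absolute paths of the .xml files in `directory`, sorted."""
    return sorted(str(p.resolve()) for p in Path(directory).glob("*.xml"))


if __name__ == "__main__":
    # Hypothetical usage: python list_xml_files.py /data/incoming
    # Each printed line is meant to become one row holding one file path.
    for path in list_xml_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

A plain `ls` of the directory would serve the same purpose; the point is that the job receives file names rather than file contents.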
-craig

"You can never have too many knives" -- Logan Nine Fingers
sharantheboss
Participant
Posts: 14
Joined: Mon Mar 23, 2009 12:57 am
Location: INDIA

Re: XML input stage usage in parallel job of data stage

Post by sharantheboss »

Hi,

You can also read an XML file using the Sequential File stage, and even the Folder stage can be used. But in the XML Input stage you have to specify the XPath (hierarchy) for each output column, and the column for the repeating element should be marked as the key.
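To make the "one path per output column, repeating element as key" idea concrete, here is a minimal sketch in Python rather than in the stage itself. The sample document, the column names, and the `extract_rows` helper are all made up for illustration; the XML Input stage is configured through its column definitions, not through code like this.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: <order> is the repeating element.
SAMPLE = """<orders>
  <order><id>1</id><customer>Ann</customer></order>
  <order><id>2</id><customer>Raj</customer></order>
</orders>"""

# One path per output column, relative to the repeating element.
COLUMN_PATHS = {"id": "id", "customer": "customer"}


def extract_rows(xml_text, repeating_path, column_paths):
    """Return one dict per repeating element, with one entry per column path."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.findall(repeating_path):
        rows.append({col: element.findtext(path) for col, path in column_paths.items()})
    return rows


rows = extract_rows(SAMPLE, "order", COLUMN_PATHS)
```

Each occurrence of the repeating element drives one output row, which is roughly the role the key column plays in the stage.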

Regards
Boss :lol:


Nagalakshmi.Krishna wrote: I have a requirement to load data from XML files into DB2 tables, and I have to do it in a parallel job. Is this possible? If so, how can I achieve it?
I tried using a Sequential File stage with a column of data type LongVarChar(9999), followed by an XML Input stage, but it does not convert the data. I get an error like "XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 5): There are more end tags than start tag"

Could you please help me to do this job?

Thanks
LK
Nagalakshmi.Krishna
Participant
Posts: 11
Joined: Sat Feb 10, 2007 12:32 am

Re: XML input stage usage in parallel job of data stage

Post by Nagalakshmi.Krishna »

Hi,
The Folder stage is not available in Parallel jobs; it is available only in Server jobs.

Thanks,
LK

sharantheboss wrote:Hi,

You can also read an XML file using the Sequential File stage, and even the Folder stage can be used. But in the XML Input stage you have to specify the XPath (hierarchy) for each output column, and the column for the repeating element should be marked as the key.

Regards
Boss :lol:


Nagalakshmi.Krishna wrote: I have a requirement to load data from XML files into DB2 tables, and I have to do it in a parallel job. Is this possible? If so, how can I achieve it?
I tried using a Sequential File stage with a column of data type LongVarChar(9999), followed by an XML Input stage, but it does not convert the data. I get an error like "XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 5): There are more end tags than start tag"

Could you please help me to do this job?

Thanks
LK
Nagalakshmi.Krishna
Participant
Posts: 11
Joined: Sat Feb 10, 2007 12:32 am

Thanks

Post by Nagalakshmi.Krishna »

chulett wrote:Don't even bother trying to read it with a Sequential File stage, rather use the External Source stage in a Parallel job as noted [url=http://dsrealtime.wordpress.com/2007/12 ... content-as ...

Thanks a lot Craig,
You made my work so easy and I am able to finish my job.
Thanks once again for your inputs. :)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Excellent! Please mark the post as Resolved using the button at the top of the page.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: XML input stage usage in parallel job of data stage

Post by chulett »

sharantheboss wrote:You can read XML file using sequential file also!! even folder stage can be used.
Dear 'Boss',

As noted, you can only use the Folder stage in a Server job, which is why I didn't mention it in this thread, though Ernie did if you bothered to check the linked site. And while you can 'read' an XML file with the Sequential File stage in some cases, more than likely it will just frak it up, as happened to the original poster, because it feeds unrelated chunks into the job. And if by 'reading' you mean using it to send in the filename, in a similar technique to the External Source stage, the latter is still the better choice IMHO. The former way leads to madness. Again, IMHO.
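A small sketch of why chunked input breaks the parser, assuming the file gets split on line breaks and each piece arrives as its own row (the exact splitting depends on how the stage is configured). The document and the `parses_ok` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document whose line breaks fall in the middle of elements.
DOCUMENT = "<rows><row>\n1</row><row>2</row>\n</rows>\n"


def parses_ok(text):
    """Return True if `text` is a well-formed XML document on its own."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The document as a whole is well formed...
whole_ok = parses_ok(DOCUMENT)
# ...but each line on its own is only a fragment.
line_results = [parses_ok(line) for line in DOCUMENT.splitlines()]
```

Here the whole document parses, while none of the individual lines does, which is the kind of situation that produces start/end tag mismatch errors.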
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

"frak." Great word. ;)
Ernie Ostic

blogit!
Open IGC is Here! https://dsrealtime.wordpress.com/2015/0 ... ere/
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So say we all! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi,
I found External Source combined with the XML Input stage too slow. It did not process records as fast as the Sequential File stage did, for example with a 1 MB file.

Hence it would be advisable to use the Sequential File stage to load the data.

I hope you experts agree. If not, we can start a new thread on the performance of the XML Input stage versus the Sequential File stage.

Regards
Sreeni
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I would expect that to be true (that using Sequential is faster than using External Source Stage), as the document is at least already loaded from disk. Both solutions still have to parse it though, which is the biggest time consumer.

For most scenarios, I consider several things:

a) XML is generally slow. It's strings and parsing, and it's not a screamer. There are "faster" XML solutions and "slower" XML solutions (using SAX-type APIs vs DOM, for example), but if performance is the key criterion, XML is often avoided, even today.

b) XML tends to be small, where small is <350M or so. Anything bigger and DS can't read it anyway. For really small files, the difference becomes negligible.

c) XML is often unpredictable. This is _the_ primary reason not to use the Sequential Stage. If the XML is fixed and predictable, you'll be OK. But you can easily get a job working, and it's fine for months, until one day an XML document comes along that has a single stray CRLF. As far as XML is concerned, it's just "noise" --- and your DS Job needs to recognize it as such. You can play as much as you want with the Sequential Stage in EE, but it will always have difficulty with "unpredictable" spaces, LFs, and CRLFs. If you can strictly control your XML provider, great! ...but if not, beware.
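Point (c) can be sketched in a few lines of Python, as an illustration only: an XML parser returns the same element values with or without a stray CRLF between elements, while anything that reads the file line by line sees a different number of records. The two sample documents and the `row_values` helper are made up.

```python
import xml.etree.ElementTree as ET

CLEAN = "<rows><row>1</row><row>2</row></rows>"
# Same document with a stray CRLF between two elements.
NOISY = "<rows><row>1</row>\r\n<row>2</row></rows>"


def row_values(xml_text):
    """Return the text of every <row> element under the root."""
    root = ET.fromstring(xml_text)
    return [row.text for row in root.findall("row")]


# To the parser the CRLF is just whitespace between elements.
same_values = row_values(CLEAN) == row_values(NOISY)
# A line-oriented reader sees one record in CLEAN but two in NOISY.
clean_records = len(CLEAN.splitlines())
noisy_records = len(NOISY.splitlines())
```

So a job built around line-oriented reading can work on the first document and fail on the second, even though both carry the same data.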

I always err first on the side of "get the job done in the shortest amount of time with a technique that is likely to work for all eternity." However, as you stated, there may be times when performance is not a luxury, and risks have to be taken, or control can be enforced elsewhere, as in requiring that a partner deliver XML without any CRLFs (or other offending characters).

Ernie
Ernie Ostic

blogit!
Open IGC is Here! https://dsrealtime.wordpress.com/2015/0 ... ere/
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

eostic wrote:I would expect that to be true (that using Sequential is faster than using External Source Stage), as the document is at least already loaded from disk. Both solutions still have to parse it though, which is the biggest time consumer.
I would expect so too, parsing being where most of the time is spent in 'processing' XML. And while the Sequential File stage may be faster when it works, I can't imagine the difference would be significant or of a magnitude such that it would cause one to declare the other mechanism 'too slow'. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
dspradeep
Participant
Posts: 59
Joined: Fri Aug 21, 2009 12:58 am

Post by dspradeep »

I have developed a job for an XML source:

seqfile --> xml inputstage --> oracle stage

The job works fine if the XML data is <8KB. If the XML data is >8KB, I get an error like: XML file contains more than 100000 bytes.

Please suggest how to resolve this.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Suggest you actually read the thread you decided to drop into. The first reply waaaaay back up there tells you what you should be doing.
-craig

"You can never have too many knives" -- Logan Nine Fingers