XML large files (to be or not to be)
Hi all,
After trying to deal with large XML files (2 GB and up), I've decided to use Java/C++ instead of DS. Here's why:
1. Memory limitations and leakage
2. Very slow
3. Sometimes it can't even process the file successfully
Yet I could not find any confirmation from Ascential that large XML files are beyond DS's capabilities, and I could not think of any trick to get around this situation. So it will not be DS, it will be Java/C++... Any comments?
If you specifically mean processing large XML files with the XML (or Folder) stages in DataStage, then there is an acknowledged problem with processing 'large' files. And from what I understand, large means more like a couple of hundred megabytes or so... 2 gigs would be huge and well beyond the capabilities of the stages.
That being said, I don't think this limitation is documented. However, there have been a number of posts here and on ADN on the subject. From what I recall, there is even a semi-official Ascential answer over on ADN acknowledging the issue.
You might want to post this over there and see what comes of it.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I've processed standard flat files > 2 GB without problems on a Solaris DS 7.0.1 system.
As for the XML: we hit the same problem and decided to go with Saxon's Java XSLT processor. We wrote some XSL scripts and bingo, instant extract to pipe-delimited files (well, a few minutes). The extract was much faster than what DataStage was doing, and we HAMMER the system with 15 extracts against 3 XML files going at once.
Let me know if you need more details. We are running under Java 1.2, but I want to upgrade to 1.4+ as it has better memory management (the main problem).
You can also go with the C or Java version of Xerces from Apache. The C version is a little faster and, I found, more robust.
Both of these are free.
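The key to what Andrew describes is streaming: a SAX-style parser visits each element as it arrives, so memory stays flat no matter how big the file is, unlike a DOM load. Here's a minimal sketch of that approach using the JDK's built-in SAX parser; the element names `record` and `field` are hypothetical placeholders for whatever your source schema uses.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streams XML to pipe-delimited rows with SAX. Because SAX is event-driven,
// the whole document is never held in memory at once -- only the current
// record -- so file size is not a constraint.
public class XmlToPipe {
    public static List<String> extract(InputStream in) throws Exception {
        List<String> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            StringBuilder row;   // current output row; null outside a <record>
            StringBuilder text;  // character data of the current <field>

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if (qName.equals("record")) {
                    row = new StringBuilder();
                } else if (row != null && qName.equals("field")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may fire several times per field; accumulate.
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("field") && text != null) {
                    if (row.length() > 0) row.append('|');
                    row.append(text);
                    text = null;
                } else if (qName.equals("record")) {
                    rows.add(row.toString());
                    row = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><record><field>a</field><field>b</field></record>"
                   + "<record><field>c</field><field>d</field></record></root>";
        List<String> rows = extract(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (String r : rows) System.out.println(r);
        // prints:
        // a|b
        // c|d
    }
}
```

In production you would of course parse a `FileInputStream` and write each row straight to the output file instead of collecting them in a list.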
Now if only the source would stop sending CNTL-V and CNTL-K through in the CDATA fields (both illegal in XML) I'd be much happier.
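Those control characters are illegal because XML 1.0 only allows tab, LF, and CR below 0x20; anything else (including Ctrl-V, 0x16, and Ctrl-K, 0x0B) makes a conforming parser reject the document. A minimal pre-filter, sketched here as a plain Java method, strips them before parsing:

```java
// Removes characters that are illegal in XML 1.0 content. Legal chars are
// tab (0x09), LF (0x0A), CR (0x0D), 0x20-0xD7FF, and 0xE000-0xFFFD.
public class XmlCleaner {
    public static String stripIllegal(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean legal = c == '\t' || c == '\n' || c == '\r'
                    || (c >= 0x20 && c <= 0xD7FF)
                    || (c >= 0xE000 && c <= 0xFFFD);
            if (legal) out.append(c);
        }
        return out.toString();
    }
}
```

Running the raw feed through a filter like this (line by line, before the parser sees it) avoids having to chase the source system for a fix.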
Andrew
Thanks Andrew.
What I'm trying to do is read data from an Oracle database (9.2) and produce one XML file which will be measured in tens of gigabytes. I know that DataStage uses the Xalan XSLT processor for reading and transforming, but I don't know anything about the way it creates XML documents. Have you tried to create very large files using DataStage XML Pack 2? If so, any comments would be appreciated.
Jim Paradies
Jim, you're hijacking this thread away from the original poster. But on the note of your query, you're probably not using the right approach for your volume. You're probably dealing with hundreds of millions of rows of data, so choking that through a Server job (can't tell your OS, release, etc. because it's not your thread) is probably not scalable. You're going to have to spool the output with something high-performance (i.e. NOT a Server job) and then convert it to XML. For low volumes on a decent machine DS will be all right, but for high volumes you're going to need to get the data out of Oracle fast, and that requires multiple output streams matched to what your system can handle (probably one output stream per partition and not more than 2 streams per CPU). You're going to have to investigate scripted alternatives.
Of course if you're paid by the hour, and the customer doesn't mind waiting, get all you can.
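The flip side of Ken's advice is the writing stage: once the data is spooled out, the XML itself should also be produced as a forward-only stream, never assembled as an in-memory document. A hedged sketch using StAX (`javax.xml.stream`, available in modern JDKs, not the Java 1.2 mentioned earlier); the element names `rows`/`row`/`id`/`name` are invented for illustration:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Writes XML with a streaming (StAX) writer: each row is emitted and can be
// flushed as it is produced, so a tens-of-gigabytes file needs only constant
// memory -- the opposite of building a DOM and serializing it at the end.
public class StreamingXmlWriter {
    public static String writeRows(String[][] rows) throws Exception {
        StringWriter out = new StringWriter(); // stand-in for a FileWriter
        XMLStreamWriter w =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("rows");
        for (String[] row : rows) {
            w.writeStartElement("row");
            w.writeStartElement("id");
            w.writeCharacters(row[0]);
            w.writeEndElement();
            w.writeStartElement("name");
            w.writeCharacters(row[1]); // escapes markup chars like & and <
            w.writeEndElement();
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeRows(new String[][] {{"1", "a<b"}}));
    }
}
```

To parallelize as Ken suggests, each Oracle partition's extract stream would run one of these writers into its own file, with the fragments concatenated (or wrapped) afterwards.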
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
aartlett wrote: You can also go with the C or Java version of Xerces from Apache. The C version is a little faster and, I found, more robust.
Both of these are free.
Hi Andrew,
I went to http://www.saxonica.com/ and found that Saxon XSLT is not a free product! Am I looking at the right page, or did the free software mentioned above not include Saxon XSLT?
Another question, please: can you give me a rough estimate of how many records/lines per second these tools produce? I've hit 18,000+ per second.
Personally, as one of Ascential's "HAPPY" customers, I opened a new case (ticket) on 10 January and not a single response had reached my mail before I wrote this message!
jzparad wrote: Can anyone confirm that there is a 2 GB limitation on files read by DataStage Server? Is this only for the XML stage? The initial post on this topic seemed to imply that it was possible but slow.
PS: besides Ascential's sick and slow web site, much of the information on it is not up to date!