big XML files vs DS

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ppalka
Participant
Posts: 118
Joined: Thu Feb 10, 2005 7:25 am

Post by ppalka »

Would DS be able to process big XML data? There will be about 20 million records in one input file, and each record has 75 columns.

Regards,
Piotrek
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Process as in 'read' or 'create'? :?

Read? It would need to come in via a Folder stage and I sincerely doubt there is any way it would handle a file of that size.

Write? Yes. Hope you're not in any hurry, however. :wink: And since it can't create a file larger than 2GB, you'd need to chunk it up but at least the stage can do that automatically for you.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ppalka
Participant
Posts: 118
Joined: Thu Feb 10, 2005 7:25 am

Post by ppalka »

Read using XML Input :)
If processing such large files would be difficult, is there any other way to extract data from an XML file to a flat file? And without writing a BASIC routine :P
Or maybe the easiest way is to use some external tool to process that file?

Regards,
Piotrek
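For the "external tool" option raised above, a minimal sketch in Python (not a DataStage feature) could stream the XML and write a delimited flat file that a Sequential File stage can then read. The function name `xml_to_flat_file`, the record tag and the column names are all hypothetical, and the sketch assumes the records are direct children of the root element.

```python
import csv
import xml.etree.ElementTree as ET


def xml_to_flat_file(xml_source, csv_target, record_tag, columns):
    """Stream record_tag elements from xml_source into CSV rows.

    columns is a list of child-element names looked up inside each record.
    Returns the number of records written.
    """
    writer = csv.writer(csv_target)
    writer.writerow(columns)
    count = 0
    # iterparse hands back each element once its end tag has been read,
    # so the whole document never has to be held in memory.
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            writer.writerow([elem.findtext(col, default="") for col in columns])
            count += 1
            # Assumption: records sit directly under the root, so clearing
            # the root releases the records already written.
            root.clear()
    return count
```

With real files, open the target with `open(path, "w", newline="")` so the csv module controls the line endings. Missing child elements come out as empty fields.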
alanwms
Charter Member
Posts: 28
Joined: Wed Feb 26, 2003 2:51 pm
Location: Atlanta/UK

Post by alanwms »

The XML Input stage would handle the XML parsing into rows and columns, based on the XPath declarations in the description field. Remember that the stage needs to find all the begin and end tags in the proper order, so make sure you're working with well-formed XML (XMLSpy is a good tool).
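If a GUI tool is not practical for a file this size, a well-formedness check can also be sketched with Python's standard SAX parser, which streams the document rather than loading it. This is only an illustration of the check described above, not part of DataStage; the function name `check_well_formed` is made up for the example.

```python
import xml.sax


def check_well_formed(source):
    """Return (True, None) if source parses, else (False, error description).

    source may be a file name or an open binary file object.
    """
    try:
        # The base ContentHandler ignores all content; we only care
        # whether the parser reaches the end without a fatal error.
        xml.sax.parse(source, xml.sax.ContentHandler())
        return True, None
    except xml.sax.SAXParseException as exc:
        where = f"line {exc.getLineNumber()}, column {exc.getColumnNumber()}"
        return False, f"{where}: {exc.getMessage()}"
```

A mismatched or missing end tag shows up as a `False` result with the position where the parser gave up.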
ppalka
Participant
Posts: 118
Joined: Thu Feb 10, 2005 7:25 am

Post by ppalka »

But the XML Input stage implies using a Folder stage to read the XML file, and I wonder whether reading huge files with that stage would be a problem...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, as already noted there are definite limits to what can be brought in via the Folder stage. Don't forget that it reads in each file as a record, with the filename going in the first field and the entire contents of the file going into the second field, which is then parsed by the XML Input stage.

Best guess is a limit of a few hundred megabytes. They were built for "real time" processing, so small, bite-sized packets of XML are what they were really meant to handle.
-craig

"You can never have too many knives" -- Logan Nine Fingers
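Given the one-file-per-record behaviour and the size limit guessed at above, one possible workaround outside DataStage would be to pre-split the big document into smaller well-formed files before the Folder stage sees them. The sketch below is Python, not DS BASIC, and rests on assumptions: `split_xml` and the `write_chunk` callback are hypothetical names, and the records are assumed to be direct children of the root element.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_source, record_tag, records_per_chunk, write_chunk):
    """Split one large XML document into smaller well-formed documents.

    write_chunk(index, xml_bytes) is a caller-supplied callback that stores
    each chunk, for example as a file in the folder the job reads.
    Returns the number of chunks produced.
    """
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)
    # Copy the root's tag and attributes now, because root.clear() below
    # also wipes the attributes.
    root_tag, root_attrib = root.tag, dict(root.attrib)
    chunk = ET.Element(root_tag, root_attrib)
    chunks = 0
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            chunk.append(elem)
            root.clear()  # release records from the original tree
            if len(chunk) == records_per_chunk:
                write_chunk(chunks, ET.tostring(chunk, encoding="utf-8"))
                chunks += 1
                chunk = ET.Element(root_tag, root_attrib)
    if len(chunk):  # flush the final, possibly shorter, chunk
        write_chunk(chunks, ET.tostring(chunk, encoding="utf-8"))
        chunks += 1
    return chunks
```

Each chunk reuses the original root element, so the same XPath declarations should still apply to every piece. What chunk size the Folder stage tolerates would have to be found by testing.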
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

ppalka,
Can DataStage handle huge XML? Probably, within its limits.

The question is SHOULD DataStage handle huge XML?

IMHO no. There are better tools to handle XML.

Search here for more info.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
bunu1977
Participant
Posts: 35
Joined: Thu Oct 16, 2003 4:46 am

Post by bunu1977 »

Hi,
I worked on one project where we read 6 records with 420 columns.
I was able to read the data but it was very slow.

Regds,
Dilip
Dilip Das
bunu1977
Participant
Posts: 35
Joined: Thu Oct 16, 2003 4:46 am

Post by bunu1977 »

Hi,
I worked on one project where we read 6M records with 420 columns.
I was able to read the data but it was very slow.
Dilip Das