Hi All,
I am using a Folder stage with the wildcard *.xml, then an XML Input stage to extract the data from the XML, and then a Sequential File stage.
Hi Kumar,
I set everything up according to that, but it is not able to handle large XML files.
If my file size is 200 KB, I do not get the error.
If my file size is 3000 KB, I get this error:
ds_intput() - row too big for inter stage rowbuffer
Have you tried increasing the Row buffer as it is asking for?
May I know what is the use of Transformer next to Folder stage?
Try using an IPC stage before the XMLINPUT stage, with the necessary settings.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
*I increased the row buffer to 256; previously it was 128.
*The Transformer is there to apply the constraint @inrownum=1. As I told you, I am using a wildcard in the Folder stage.
Don't let the Folder stage bring in huge files in the second field. Only use the first 'Filename' field in the Folder (delete the second) and then switch the XML Input stage to use the Column content option of URL/File path on the Input tab.
That should allow pretty much any size XML file to be read.
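As a rough analogy outside DataStage (a Python sketch, not how the stage itself is implemented), the idea is to hand the parser only the file path and let it read the file on its own, rather than passing the whole document along as a single value. The function name and the `record` tag are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def count_records_from_path(path, tag="record"):
    """Stream-parse the file at `path`; only the path is handed over."""
    count = 0
    # iterparse reads the file incrementally instead of loading it as one string
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # release the element's content once counted
    return count
```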
-craig
"You can never have too many knives" -- Logan Nine Fingers
Chullet,
In the Folder stage we are using the wildcard *.txt. Assume it selects 4 files. In the Folder stage itself we set the sort order to descending, so that it arranges the files in descending order,
for example file4, file3, file2, file1.
Then we use a Transformer with the constraint @inrownum=1, so that filename = file4 goes to the output link of the Transformer. We then pass this file4 to the XML Input stage and, as you said, we chose the Column content option of URL/File path.
But then we are not able to get the content.
Do we need to set anything else in the XML Input stage?
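For readers following the file-selection step, here is a minimal Python sketch of the same logic: sort the matching file names in descending order and keep only the first one, which is what the descending sort plus the @inrownum=1 constraint is meant to do. The function name and the default pattern are assumptions for illustration, not part of the job.

```python
import glob
import os


def first_file_descending(folder, pattern="*.xml"):
    """Return the path that comes first in descending name order, or None."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)), reverse=True)
    return paths[0] if paths else None
```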
Thanks Chullet,
We are using only one field, the filename, in the Folder stage, and the files we are using are XML files.
By CONTENT I mean
the data in the particular file. As we are using a wildcard, it selects a particular file in the folder, and we need to get its content.
How can we achieve this?
As per your last post, we selected only one field in the Folder stage, the filename, and in the XML Input stage we selected the Column content option of URL/File path. But we get only the file name, not the data.
Ok, when 'stuff' goes into the XML Input stage but no 'stuff' comes out, that is typically a problem with your XPath expressions in the stage. That is what drives the parsing of the XML files.
Best way to get that right is to import the metadata of the file(s) you are trying to process, either directly from the XML or better yet from an .xsd you should have. That process will generate the XPath Expressions for you and then you can import that metadata into your job.
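To illustrate the symptom with something runnable, here is a small Python sketch (using the standard library, not DataStage) where an XPath-style path that does not match the document structure parses without error but returns nothing. The sample document and element names are made up for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<orders>
  <order><id>1</id></order>
  <order><id>2</id></order>
</orders>"""

root = ET.fromstring(SAMPLE)

# Path matches the document structure: one result per <order>
matched = [e.text for e in root.findall("./order/id")]

# Path does not match (wrong element name): parsing succeeds but nothing comes out
unmatched = [e.text for e in root.findall("./item/id")]

print(matched)    # ['1', '2']
print(unmatched)  # []
```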
-craig
I happened to read this while searching and it struck me that the suggestions are way off the mark. I'm sure you must have solved this by now but thought I would attempt to clarify the problem when others see it.
So here goes:
When XML is parsed by the XMLInput stage, it must be read in its entirety, that is, from the root start tag to the root end tag. If link-to-link row buffering (in-process or inter-process) is set for the job, then a single XML "row" must fit into the buffer for the XMLInput stage. The buffer maximum is 9999K (I think). The buffer default is 126K. Should the XML "row" exceed the defined buffer size, the error that started this thread will be the result.
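The rule can be written down as a simple comparison. This is only a sketch of the reasoning, assuming the whole XML document travels as one row and ignoring any per-row overhead the engine may add. The function name is hypothetical and the sizes are the ones mentioned in this thread.

```python
def fits_in_row_buffer(row_size_kb, buffer_size_kb):
    """True if a single row of row_size_kb fits in a buffer of buffer_size_kb."""
    return row_size_kb <= buffer_size_kb


print(fits_in_row_buffer(3000, 256))   # False
print(fits_in_row_buffer(3000, 9999))  # True
```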