Sequential file stage to read the multiple xml files in one

Posted: Sat Aug 28, 2010 1:00 pm
by madhav62
Hi,
I have designed a parallel job that uses a Sequential File stage to read an XML document.
In production, however, it is supposed to read multiple XML documents. I tried the File Pattern option but wasn't successful.

The workflow: every 6 hours, XML documents are stored in a server location, and they are meant to be processed in DataStage.
The files look like this:
META.INVOICEDOC.invoiceServices1-urlmsg01.625.200601011541687341.xml
META.INVOICEDOC.invoiceServices2-urlmsg01.694.200601011541147341.xml

META.INVOICEDOC is common to all the files.
invoiceServices may change.
The next three digits are system generated, followed by a dot (.) and a timestamp (yyyymmdd) ending with another system-generated number, followed by .xml.

Can anyone suggest a file pattern for this?

Posted: Sat Aug 28, 2010 2:58 pm
by chulett
What "wasn't successful" about your file pattern attempt? Switch to using an External Source stage as detailed here.

Posted: Sat Aug 28, 2010 7:46 pm
by madhav62
chulett wrote:What "wasn't successful" about your file pattern attempt? Switch to using an External Source stage as detailed here ...
I used ls -l /filelocation/META.INVOICEDOC.* for the Sequential File stage.

I tried the External Source stage, but it shows an error in args().

Posted: Sat Aug 28, 2010 8:10 pm
by chulett
Well... you can't use the Sequential File stage to reliably read XML files. Follow the link to Ernie's blog and use the ESS to get a list of the filenames only and then set the XML Input stage to do the actual reading of the files. Works way more better. Almost as good as the Folder stage in a Server job. :wink:

Posted: Sat Aug 28, 2010 11:51 pm
by madhav62
Hi Chulett, my concern is that to read the files I need to give an ls command
like: ls #filePath#/Meta.Invoice.*
because the system generates the filenames automatically.
I don't have any problem reading a single file using the Sequential File or External Source stage; my concern is how to do it for multiple files.
:!:

Posted: Sun Aug 29, 2010 12:07 am
by chulett
One or multiple files, doesn't make any difference. :?

The details are in the blog I linked you to, and they've been posted here a bajillion times; search for them if you want to see what others are doing. All you should be delivering to the XML Input stage are the filenames, and the stage should be set to the "URL/File path" option. Do that correctly and it will process 1 or 100,000 files, no problem.

If you are still having problems, be specific. Tell us exactly how you have everything set up and exactly what your specific error message(s) are, then maybe someone can provide more specific help.
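
To make the "filenames only" point concrete, here is a minimal sketch of a script the External Source stage could call. It assumes a POSIX shell; the function name list_invoice_files is hypothetical, and the directory argument stands in for the #FilePath# job parameter. Note that ls -l adds permission, size and date columns to every line, so its output is not a plain list of file paths.

```shell
#!/bin/sh
# Hypothetical helper (not from the blog): print one full pathname per line
# for the invoice files in the given directory.
list_invoice_files() {
    dir=$1
    # The shell expands the pattern; printf emits the pathnames only.
    for f in "$dir"/META.INVOICEDOC.*.xml; do
        # Skip the literal, unexpanded pattern when nothing matched.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example call, using the placeholder directory from this thread:
# list_invoice_files /filelocation
```

Each output line is one complete pathname, which is what the XML Input stage needs when it is set to "URL/File path".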

Posted: Sun Aug 29, 2010 12:46 am
by madhav62
Design:
Sequential File stage ----> XML Input ----> Data Set
Table definitions:
Data LongVarChar 9999
XML schema definition.
Sequential File settings:
File pattern: #FilePath#/META.INVOICEDOC.*
Error: file not found in directory.
One more thing: apart from the invoice XML, I also get a claims XML,
META.CLAIMDOC.----->xml, in the same folder.


***How do I get a list of the filenames only?***

Posted: Sun Aug 29, 2010 12:59 am
by chulett
madhav62 wrote: ***How do I get a list of the filenames only?***
That's in the blog I linked you to. Did you read it? :?

You'll also need to fix your "file pattern" as it doesn't seem to be finding any files.

As to your "one more thing": you need to have all of the proper metadata / XPath expressions generated for each of your XML files. Typically one would not process two completely different files at once. I imagine it could be done if you split to two XML Input stages or know what you are doing XPath-wise, but typically there would be two jobs, since there are two different sources and (I imagine) two different targets. Or are you somehow planning on merging your invoice and claim data together simultaneously?
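
As a rough illustration of keeping the two document types apart by name, the sketch below classifies a filename with shell patterns. It assumes a POSIX shell; classify_doc is a hypothetical name, the invoice pattern follows the naming convention described in the first post, and the claim pattern is a guess, since the thread does not spell out the full claim filename.

```shell
#!/bin/sh
# Hypothetical helper: report which document family a file belongs to.
classify_doc() {
    # Strip any leading directory so only the filename is matched.
    name=${1##*/}
    case $name in
        # prefix . service name . three digits . digits(+anything) .xml
        META.INVOICEDOC.*.[0-9][0-9][0-9].[0-9]*.xml) echo invoice ;;
        # Assumption: claim files share the prefix style and end in .xml.
        META.CLAIMDOC.*.xml) echo claim ;;
        *) echo other ;;
    esac
}

# classify_doc META.INVOICEDOC.invoiceServices1-urlmsg01.625.200601011541687341.xml
# prints "invoice"
```

A per-type pattern like this lets each job pick up only the files whose metadata it was built for.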

Posted: Sun Aug 29, 2010 10:47 am
by madhav62
I used the same expression mentioned in the article, with an External Source stage instead of the Sequential File stage, and it gave me an error saying the argument list is too long.
I worked around it, and now it is importing 303 files, but only the file names, not the XML data.
When I hit View Data, it shows just the file name, not the content of the file.
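
The workaround used here is not described. For reference, an "argument list too long" error typically appears when a wildcard expands to more pathnames than the system allows on one command line, and one common way around it is to let find do the matching. The sketch below assumes a POSIX shell and find; find_invoice_files is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list matching files without expanding the wildcard
# on a command line, so the argument-size limit is not involved.
find_invoice_files() {
    dir=$1
    # The pattern is quoted so that find, not the shell, does the matching.
    # Note: find also descends into subdirectories; many implementations
    # accept "-maxdepth 1" to stop that, but it is not part of POSIX.
    find "$dir" -type f -name 'META.INVOICEDOC.*.xml'
}

# Example call, using the placeholder directory from this thread:
# find_invoice_files /filelocation
```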

Posted: Sun Aug 29, 2010 11:08 am
by chulett
That's how it works. All it does is supply the file pathnames to the XML Input stage, and that stage does the actual reading, as long as you selected the URL / File path option in the XML Input stage. Did you? What happens when the job runs?

Posted: Sun Aug 29, 2010 11:36 am
by madhav62
Yes, it's working. Thank you :)