File validation method

TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

File validation method

Post by TonyInFrance »

I have an assignment where the input consists of many kinds of files, such as XML and flat files. I have to validate these before converting them to flat files so that the data they contain can be integrated into another system.

I have thought about doing this by first converting all the files to text files, since I don't think XML files can be validated. At least, I haven't done such an exercise before. I examined the XML Input and XML Output stages, which convert an XML file to text and vice versa, respectively. I think I can use the XML Input stage for my exercise.

Coming back to the validation process, which is the first step: I think one integrated validation routine is easier than having one for each type of file. This is because even after validation, the file would need to be converted to text for integration. So why not convert everything to text once and for all, even if it's invalid (by invalid I mean not conforming to the specifications with respect to the total number of columns, the correct separator, etc.), and then validate it using one specific routine?

Does anybody think I'm doing something wrong or inadvisable? Any suggestions would be appreciated.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You can automatically 'validate' XML files as long as you have an XSD for them. While the stages support that, I find doing it from the command line a better practice and a more robust solution. As to 'validating' flat files, you'd have to explain what kind of validations you need to perform.
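
As one hedged illustration of the command-line approach: assuming `xmllint` (part of libxml2) is installed, which is a tool not named in this thread, a small Python wrapper could look like the sketch below. The function names and file paths are hypothetical.

```python
import subprocess


def xmllint_command(xml_path, xsd_path):
    """Build the xmllint call that validates xml_path against xsd_path."""
    # --noout suppresses echoing the document; --schema selects XSD validation.
    return ["xmllint", "--noout", "--schema", xsd_path, xml_path]


def is_valid_xml(xml_path, xsd_path):
    """Return True when xmllint exits with status 0 (document is valid)."""
    result = subprocess.run(
        xmllint_command(xml_path, xsd_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # xmllint writes its validation messages to stderr.
        print(result.stderr.strip())
    return result.returncode == 0
```

A script like this can run before the job that reads the file, so an invalid file never reaches the conversion step.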
-craig

"You can never have too many knives" -- Logan Nine Fingers
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Post by TonyInFrance »

Cheers for that.
As I have never worked with XML files, I am lost, so I'm not sure about the command-line procedure you are referring to.

As for validating text files, I'd need to check that each line I read contains the required number of columns, the required separator, etc. If even one line does not adhere to the requirements, I would need to stop further execution and inform the sender that the file is defective.
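
A minimal sketch of that kind of check, assuming a semicolon separator and a fixed column count (both are placeholder values, and the names `validate_flat_file` and `DefectiveFileError` are made up for illustration). It stops at the first line that breaks the rule:

```python
import csv


class DefectiveFileError(Exception):
    """Raised on the first line that breaks the file specification."""


def validate_flat_file(lines, expected_columns, delimiter=";"):
    """Check every record; stop at the first one with a wrong column count.

    `lines` is any iterable of text lines, for example an open file.
    Returns the number of records checked when the whole file is valid.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    checked = 0
    for row in reader:
        if len(row) != expected_columns:
            # A wrong separator shows up here too: the line parses as one column.
            raise DefectiveFileError(
                f"line {reader.line_num}: expected {expected_columns} "
                f"columns, found {len(row)}"
            )
        checked += 1
    return checked
```

The caller can catch `DefectiveFileError`, halt the run, and pass the message on to whoever supplied the file.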
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Has anyone else there worked with XML? It's pretty daunting at first to tackle it on your own. There are various validators out there, which a Google search should turn up. On the UNIX side I've used 'svalidate' from Java Beans and, most recently, 'Echo10', as it caught problems that the former did not. Both required help from our Java folks to get set up; after that it's just something you script. Not sure what is out there for Windows.

Seems like you could also "script" checks of the flat files to do a simple delimiter count using, say, awk or perl or the like. The problem would be not counting delimiters inside quoted strings, however. :?
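
To illustrate the quoted-string problem, here is a small Python sketch rather than awk or perl. The function names are hypothetical, and it assumes double quotes are the quoting character. A plain character count over-counts, while Python's `csv` module skips delimiters inside quotes:

```python
import csv


def naive_delimiter_count(line, delimiter=","):
    """Count every delimiter character, including those inside quotes."""
    return line.count(delimiter)


def quote_aware_delimiter_count(line, delimiter=",", quotechar='"'):
    """Count only the delimiters that actually separate fields."""
    fields = next(csv.reader([line], delimiter=delimiter, quotechar=quotechar))
    # N fields are separated by N - 1 delimiters; an empty line has none.
    return max(len(fields) - 1, 0)


line = '1,"Smith, John",Denver'
print(naive_delimiter_count(line))        # 3
print(quote_aware_delimiter_count(line))  # 2
```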
-craig

"You can never have too many knives" -- Logan Nine Fingers
Pagadrai
Participant
Posts: 111
Joined: Fri Dec 31, 2004 1:16 am
Location: Chennai

Re: File validation method

Post by Pagadrai »

Hi,
I have worked with XML in DataStage, but we haven't used the 'validate' option there.
You can use the Validate XML option available in the XML Input stage.
The invalid records can be captured using a reject link.
You can request the XSD from your XML source team, or there are a lot of tools that generate an XSD from XML.
Let us know if you need more details.
TonyInFrance
Premium Member
Posts: 288
Joined: Tue May 27, 2008 3:42 am
Location: Luxembourg

Re: File validation method

Post by TonyInFrance »

Pagadrai wrote:
Hi,
I have worked with XML in DataStage, but we havent used the 'validate' option in that.
You can use the Validate XML option available in XML Input stage.
The invalid records can be captured using a reject link.
you can request for XSD from your XML source team or there are lot of tools to generate XSD from XML.
Let us know if you need more details.
Hi,

Thank you very much for your informative post.
@Craig - I have figured out a way to validate a flat file, but the XML validation part still remains unsolved. Furthermore, my task has become more complicated.

I'm new to using the XML Input/Output stages in DataStage.
I understand that I would need an XSD/DTD file as a basis for comparison, which has to be supplied to the XML stage.

However, what I need to know is: can this be parameterized? This is because I have around 100 suppliers sending me files to validate, each following its own XSD/DTD. I would like to see if there is any way I can achieve this without having to develop 100 DataStage jobs, one for each supplier. If one job could be run with two inputs (the XML file and the DTD), that would be ideal. Then, if the file is found valid, I could use another XML Input stage to convert it to a flat file.

However, this first validation process is important.

Regards

S. BASU
Pagadrai
Participant
Posts: 111
Joined: Fri Dec 31, 2004 1:16 am
Location: Chennai

Re: File validation method

Post by Pagadrai »

Hi,
Have you got this issue resolved?

I understand that you have a lot of XML input formats to validate and you don't want to create a DS job to validate each one.

You can think of a parallel routine to validate XML.
(You can pass the actual XML and the validation schema as inputs to it.)
Based on the return code, you can decide on further processing.
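
The two-input idea can be sketched outside DataStage as follows. This is a Python illustration, not an actual parallel routine, and it assumes `xmllint` (libxml2) is installed as the validator. The function names and the `runner` parameter are hypothetical; the schema type is picked from the file extension so that one routine serves suppliers using either XSD or DTD.

```python
import subprocess
from pathlib import Path


def schema_flag(schema_path):
    """Pick the xmllint option that matches the schema file type."""
    suffix = Path(schema_path).suffix.lower()
    if suffix == ".xsd":
        return "--schema"
    if suffix == ".dtd":
        return "--dtdvalid"
    raise ValueError(f"unsupported schema type: {schema_path}")


def validate_xml(xml_path, schema_path, runner=subprocess.run):
    """Return 0 when the XML is valid, otherwise xmllint's non-zero status.

    `runner` exists only so the call can be replaced in tests (assumption).
    """
    command = [
        "xmllint",
        "--noout",
        schema_flag(schema_path),
        str(schema_path),
        str(xml_path),
    ]
    return runner(command).returncode
```

The caller passes each supplier's file together with that supplier's schema and branches on the returned code, so no per-supplier job is needed for the validation step.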

Let me know if it helps.