I am looking for a way to do generic data validation within the datastage job for things like
1. Data type
2. Data length
3. No. of decimal places
We are on v8.5, and this can easily be done with the Oracle/DB2 connector reject link. However, if we use any database other than Oracle/DB2, the connectors don't allow for such validations. We can still do the validations in the Transformer stage using functions such as IsChar(), but that approach is not generic. I need to achieve all this functionality with a job that doesn't require any recoding when used for multiple sources.
We can also achieve this by writing to a file and running a script outside DataStage to do the validations, but I want to achieve this solution within DataStage.
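To make the requirement concrete, here is a minimal sketch (in Python, outside DataStage) of the kind of metadata-driven check being asked for: type, length, and decimal scale are read from a rule set rather than hard-coded per column. The column names and rules are hypothetical examples, not from any real job.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical rule set; in a real job this would come from a reference
# table or file so the validation logic stays source-independent.
RULES = {
    "CUST_ID": {"type": "int", "length": 10},
    "BALANCE": {"type": "decimal", "length": 12, "scale": 2},
    "NAME":    {"type": "char", "length": 30},
}

def validate(column, value):
    """Return the list of rule violations for one column value."""
    rule = RULES.get(column)
    if rule is None:
        return []                       # no rule -> nothing to check
    errors = []
    if len(value) > rule["length"]:
        errors.append("length")
    if rule["type"] == "int":
        if not value.lstrip("-").isdigit():
            errors.append("type")
    elif rule["type"] == "decimal":
        try:
            d = Decimal(value)
            # exponent is negative for fractional digits, e.g. -3 for 100.123
            if -d.as_tuple().exponent > rule.get("scale", 0):
                errors.append("scale")
        except InvalidOperation:
            errors.append("type")
    return errors
```

Because the rules live in data rather than in stage logic, pointing the same function at a different source only means supplying a different rule set.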
Any suggestions or guidelines?
Generic Data validation
Thanks for the reply. However, the documentation available for BuildOp/Custom/Wrapped stages is not the greatest. I tried creating a Wrapped and a Custom stage. The Wrapped stage is almost there, but I am somehow missing something. As for the Custom stage, I am not able to figure out where we can execute an osh command; it doesn't work on the Unix command line. Also, do we need to create something like an osh class ending with .o in the file name, similar to a compiled C++ object file?
Hi there,
You could try something like this: (I've never actually had the chance to implement it....)
It's less sophisticated than using build ops but might be more supportable in the long term.
Source data (file name InputF44.txt):

Key  Name  Dob         Type
1    Fred  01/01/1982  AB45
2    Dave  01/02 1974  TY76
3    John  01/03/1986  LR48
etc. etc.
Pivot the data into:

File Name     Key  Column Name  Value
InputF44.txt  1    Name         Fred
InputF44.txt  2    Name         Dave
InputF44.txt  3    Name         John
InputF44.txt  1    Dob          01/01/1982
InputF44.txt  2    Dob          01/02 1974
InputF44.txt  3    Dob          01/03/1986
InputF44.txt  1    Type         AB45
InputF44.txt  2    Type         TY76
InputF44.txt  3    Type         LR48
You will have to fiddle around a bit to get the file name and column headings mixed in with the data.
However, you can now do a lookup to see whether the column requires any validation and, if it does, send the column down a link, or to a job, that performs the validation. If it fails, you have the file name and the key to point to the offending record.
Hope that helps.
Bob Oxtoby
You can use a Generic stage. In the Generic stage you can run data validations that the Transformer's UI won't let you express; for example, you can't do a lookup inside a parallel Transformer (of course a lookup can be done with a Transformer in server jobs, but that's beside the point), whereas here you can. You can also parameterize it at run time, so you can use it for multiple sources.
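The run-time parameterization idea can be sketched as follows: the rules are keyed by source and column, so one compiled job could serve every source by swapping in a different rules file via a job parameter. The source name, column names, and patterns below are hypothetical examples.

```python
import re

# Hypothetical rules keyed by (source, column); in practice this would be
# a lookup file whose path is supplied as a job parameter at run time.
RULES = {
    ("InputF44.txt", "Dob"):  r"^\d{2}/\d{2}/\d{4}$",
    ("InputF44.txt", "Type"): r"^[A-Z]{2}\d{2}$",
}

def check(source, column, value):
    """True if the value passes its rule, or if no rule exists."""
    pattern = RULES.get((source, column))
    return pattern is None or re.fullmatch(pattern, value) is not None
```

A new source then needs only new rows in the rules table, not a new job.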