Generic Data validation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
abhinavsuri
Premium Member
Premium Member
Posts: 62
Joined: Thu Dec 28, 2006 11:54 pm

Generic Data validation

Post by abhinavsuri »

I am looking for a way to do generic data validation within the datastage job for things like

1. Data type
2. Data length
3. No. of decimal places

We are on v 8.5 and this can easily be done with the Oracle/DB2 connector reject link. However, if we use any other database other than oracle/DB2 the connectors dont allow for such validations. We can still do the validations in the transformer stage using functions such as IsChar() etc but it is not generic. I need to achieve all this functionality using a job which doesn't require any coding if used for multiple sources .

We can also achieve this functionality by writing to a file and running a script outside datastage to do the validations but I want to achieve this solution within datastage.

Any suggestions or guidelines ?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If it's in a database the data type, precision and scale will be correct ipso facto.

Otherwise (reading as strings) create a BuildOp.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eph
Premium Member
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

Couldn't that be possible through Column Import + reject link with use of schema files? That should be pretty generic and manage length/type validation?

Eric
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's too simplistic to my mind. "Validation" will (should) tell you which one is wrong.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abhinavsuri
Premium Member
Premium Member
Posts: 62
Joined: Thu Dec 28, 2006 11:54 pm

Post by abhinavsuri »

Thanks for the reply. However, the documentation available for Buildop/custom/wrapped stages is not the greatest. I tried creating a wrapped and custom stage. The wrapped stage is almost there but I am somehow missing something. As far as the custom stage is concerned I am not able to figure out where we can execute a osh command? It doesn't work on the unix command line. Also do we need to create something like a osh class ending with .o in the file name similar to a compiled C++ class.?
boxtoby
Premium Member
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Hi there,

You could try something like this: (I've never actually had the chance to implement it....)

It's less sophisticated than using build ops but might be more supportable in the long term.

Source data:
File name InputF44.txt
Key Name Dob Type
1 Fred 01/01/1982 AB45
2 Dave 01/02 1974 TY76
3 John 01/03/1986 LR48 etc etc

Pivot the data in to:
File Name Key Column Name Value
InputF44.txt 1 Name Fred
InputF44.txt 2 Name Dave
InputF44.txt 3 Name John
InputF44.txt 1 Dob 01/01/1982
InputF44.txt 2 Dob 01/01/1974
InputF44.txt 3 Dob 01/01/1986
InputF44.txt 1 Type AB45
InputF44.txt 2 Type TY76
InputF44.txt 3 Type LR48

You will have to fiddle around a bit to get the file name and column headings mixed in with the data.

However, you can now do a lookup to see if the column requires any validation and if it does send the column down a link, or to a job which performs the validation. If it fails you have the file name and they key to point to the offending record.

Hope that helps.
Bob Oxtoby
oracledba
Premium Member
Premium Member
Posts: 49
Joined: Mon Aug 06, 2012 9:21 am

Post by oracledba »

you can use generic stage. In the generic stage you can run data validations ( transformer has limitations in UI what can be done (cant lookup in a transformer b/c it doesnt allow lookup (
ofcourse lookup can be done with transformer in server jobs but thats besides the point. ) but here you can) you can parameterize at run time so you can use it for multiple sources.
Post Reply