I am looking for a way to do generic data validation within the datastage job for things like
1. Data type
2. Data length
3. No. of decimal places
We are on v8.5, and this can easily be done with the Oracle/DB2 connector reject link. However, if we use any database other than Oracle/DB2, the connectors don't allow for such validations. We can still do the validations in the Transformer stage using functions such as IsChar(), but that approach is not generic. I need to achieve all this functionality with a job that doesn't require any recoding when used for multiple sources.
We can also achieve this by writing to a file and running a script outside DataStage to do the validations, but I want to achieve this solution within DataStage.
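To make the requirement concrete, here is a minimal sketch (in Python, outside DataStage) of the kind of metadata-driven check being asked for: type, length, and decimal scale are read from a rule set rather than hard-coded per column. The column names and rules are hypothetical examples, not from any real job.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical rule set; in a real job this would come from a reference
# table or file so the validation logic stays source-independent.
RULES = {
    "CUST_ID": {"type": "int", "length": 10},
    "BALANCE": {"type": "decimal", "length": 12, "scale": 2},
    "NAME":    {"type": "char", "length": 30},
}

def validate(column, value):
    """Return the list of rule violations for one column value."""
    rule = RULES.get(column)
    if rule is None:
        return []                       # no rule -> nothing to check
    errors = []
    if len(value) > rule["length"]:
        errors.append("length")
    if rule["type"] == "int":
        if not value.lstrip("-").isdigit():
            errors.append("type")
    elif rule["type"] == "decimal":
        try:
            d = Decimal(value)
            # exponent is negative for fractional digits, e.g. -3 for 100.123
            if -d.as_tuple().exponent > rule.get("scale", 0):
                errors.append("scale")
        except InvalidOperation:
            errors.append("type")
    return errors
```

Because the rules live in data rather than in stage logic, pointing the same function at a different source only means supplying a different rule set.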
Any suggestions or guidelines?
Generic Data validation
Thanks for the reply. However, the documentation available for BuildOp/Custom/Wrapped stages is not the greatest. I tried creating a Wrapped and a Custom stage. The Wrapped stage is almost there, but I am somehow missing something. As for the Custom stage, I am not able to figure out where we can execute an osh command; it doesn't work on the Unix command line. Also, do we need to create something like an osh class ending with .o in the file name, similar to a compiled C++ object file?
Hi there,
You could try something like this: (I've never actually had the chance to implement it....)
It's less sophisticated than using build ops but might be more supportable in the long term.
Source data (file name InputF44.txt):

Key  Name  Dob         Type
1    Fred  01/01/1982  AB45
2    Dave  01/02 1974  TY76
3    John  01/03/1986  LR48
etc. etc.
Pivot the data into:

File Name     Key  Column Name  Value
InputF44.txt  1    Name         Fred
InputF44.txt  2    Name         Dave
InputF44.txt  3    Name         John
InputF44.txt  1    Dob          01/01/1982
InputF44.txt  2    Dob          01/02 1974
InputF44.txt  3    Dob          01/03/1986
InputF44.txt  1    Type         AB45
InputF44.txt  2    Type         TY76
InputF44.txt  3    Type         LR48
You will have to fiddle around a bit to get the file name and column headings mixed in with the data.
However, you can now do a lookup to see whether the column requires any validation and, if it does, send the column down a link, or to a job, that performs the validation. If it fails, you have the file name and the key to point to the offending record.
Hope that helps.
Bob Oxtoby
You can use a Generic stage. In the Generic stage you can run data validations that the Transformer's UI won't let you express; for example, you can't do a lookup inside a parallel Transformer (of course a lookup can be done with a Transformer in server jobs, but that's beside the point), whereas here you can. You can also parameterize it at run time, so you can use it for multiple sources.
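The run-time parameterization idea can be sketched as follows: the rules are keyed by source and column, so one compiled job could serve every source by swapping in a different rules file via a job parameter. The source name, column names, and patterns below are hypothetical examples.

```python
import re

# Hypothetical rules keyed by (source, column); in practice this would be
# a lookup file whose path is supplied as a job parameter at run time.
RULES = {
    ("InputF44.txt", "Dob"):  r"^\d{2}/\d{2}/\d{4}$",
    ("InputF44.txt", "Type"): r"^[A-Z]{2}\d{2}$",
}

def check(source, column, value):
    """True if the value passes its rule, or if no rule exists."""
    pattern = RULES.get((source, column))
    return pattern is None or re.fullmatch(pattern, value) is not None
```

A new source then needs only new rows in the rules table, not a new job.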