Page 1 of 1

Rejecting non-numeric fields

Posted: Thu Sep 08, 2005 2:31 pm
by ccatania
Used Constraint REJECTEDCODE <> DSE.NOERROR the results was the first rec was written to the reject_tbl. The RECECTEDCODE Constraint is in 2nd output link, testing link 1 for non-numeric data.
Using REJECTEDCODE = DSE.NOERROR all records written to both tables. Can not find any documentation to assist in this. any advice is greatly appreciated.

Posted: Thu Sep 08, 2005 4:44 pm
by ray.wurlod
Welcome aboard! :D

Read more about link variables; you may find it more appropriate to use LinkName1.REJECTED to determine whether a row was rejected by whatever the Put() function is on output link LinkName1.

But, to reject non-numeric input fields, an easier constraint expression would be

Code: Select all

Not(Num(InputLink.ColumnName))
on the main output link. In this way, there is not even any attempt to send them out along this link.

Posted: Fri Sep 09, 2005 5:28 am
by ccatania
Thanks for the Welcome and the advice. :D

I did use the num constraint at first and it did work, as long as you know what coulmn is suspect. My problem is that there can be dirty data in any numeric coulmn for any of the tables that we are loading from our Legacy system, no way to know ahead of time. I'm trying to set a trap that will reject any coulmn.

Posted: Fri Sep 09, 2005 7:09 am
by dls
In general, adding edits to the source of record (i.e., the legacy system) is the preferred solution.

Then, each downstream process (e.g, ETL) doesn't have to perform any further edits.

Making that happen (and maintaining it) can be another story, though. :wink:

Posted: Fri Sep 09, 2005 7:19 am
by ccatania
What a wonderful world it would be if it was that simple. Unfortunately that is not even a remote possiblility. Thanks for the reply. :wink:

Posted: Fri Sep 09, 2005 8:11 am
by pnchowdary
Hi ccatania,

If you dont know ahead of time which field you want to test for non-numeric, then apply the Num test to all the columns that shouldn't have any non-numeric values

Code: Select all

Output link

Num(InputLink.ColumnName1) And Num(InputLink.ColumnName2) 
and in the

Code: Select all

Reject link

Not(Num(InputLink.ColumnName1)) Or Not(Num(InputLink.ColumnName2)) 
Alternatively

You can declare a stage variable which you can populate with a flag of 'Y' or 'N' based on Num test of all your numeric fields and then use these flag values as constraints for the output link and the reject link.

Posted: Fri Sep 09, 2005 5:49 pm
by ray.wurlod
My problem is that there can be dirty data in any numeric coulmn for any of the tables that we are loading from our Legacy system, no way to know ahead of time. I'm trying to set a trap that will reject any coulmn.
<paranoia>If there's a chance it will be dirty you have to check.</paranoia>

One of the fundamental requirements of a data warehouse is that it contains good data. One of the fundamental tenets of good ETL practice, therefore, is to deliver data that are guaranteed to be good, whatever that means in the particular business context.

There is no convenient mechanism within DataStage to construct a global test. One way would be to construct a routine to which you pass a list of all the numeric columns' values, and which tests that each is numeric returning a true/false result. Or separate stage variables. But, whatever you do, you will at some point need to process each column value, even if just to build the list.

Posted: Mon Sep 12, 2005 5:43 am
by ccatania
Thanks Ray,

I think I already knew this, but I was being optimistic at the least that there may be a global solution to this problem. You are 100% correct the DW has to be clean unfortunately they did not take that in consideration when they built the Legacy system I am extracting this data from. Building a pre-Transform process, which I'll dub the 'Cleaning Cycle', will definitely be required. I want to thank you again for you assistance as well as all others members who have replied. :)