Rejecting non-numeric fields

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ccatania
Premium Member
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh
Contact:

Rejecting non-numeric fields

Post by ccatania »

Used Constraint REJECTEDCODE <> DSE.NOERROR the results was the first rec was written to the reject_tbl. The RECECTEDCODE Constraint is in 2nd output link, testing link 1 for non-numeric data.
Using REJECTEDCODE = DSE.NOERROR all records written to both tables. Can not find any documentation to assist in this. any advice is greatly appreciated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard! :D

Read more about link variables; you may find it more appropriate to use LinkName1.REJECTED to determine whether a row was rejected by whatever the Put() function is on output link LinkName1.

But, to reject non-numeric input fields, an easier constraint expression would be

Code: Select all

Not(Num(InputLink.ColumnName))
on the main output link. In this way, there is not even any attempt to send them out along this link.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ccatania
Premium Member
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh
Contact:

Post by ccatania »

Thanks for the Welcome and the advice. :D

I did use the num constraint at first and it did work, as long as you know what coulmn is suspect. My problem is that there can be dirty data in any numeric coulmn for any of the tables that we are loading from our Legacy system, no way to know ahead of time. I'm trying to set a trap that will reject any coulmn.
Charlie
dls
Premium Member
Premium Member
Posts: 96
Joined: Tue Sep 09, 2003 5:15 pm

Post by dls »

In general, adding edits to the source of record (i.e., the legacy system) is the preferred solution.

Then, each downstream process (e.g, ETL) doesn't have to perform any further edits.

Making that happen (and maintaining it) can be another story, though. :wink:
ccatania
Premium Member
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh
Contact:

Post by ccatania »

What a wonderful world it would be if it was that simple. Unfortunately that is not even a remote possiblility. Thanks for the reply. :wink:
Charlie
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi ccatania,

If you dont know ahead of time which field you want to test for non-numeric, then apply the Num test to all the columns that shouldn't have any non-numeric values

Code: Select all

Output link

Num(InputLink.ColumnName1) And Num(InputLink.ColumnName2) 
and in the

Code: Select all

Reject link

Not(Num(InputLink.ColumnName1)) Or Not(Num(InputLink.ColumnName2)) 
Alternatively

You can declare a stage variable which you can populate with a flag of 'Y' or 'N' based on Num test of all your numeric fields and then use these flag values as constraints for the output link and the reject link.
Thanks,
Naveen
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

My problem is that there can be dirty data in any numeric coulmn for any of the tables that we are loading from our Legacy system, no way to know ahead of time. I'm trying to set a trap that will reject any coulmn.
<paranoia>If there's a chance it will be dirty you have to check.</paranoia>

One of the fundamental requirements of a data warehouse is that it contains good data. One of the fundamental tenets of good ETL practice, therefore, is to deliver data that are guaranteed to be good, whatever that means in the particular business context.

There is no convenient mechanism within DataStage to construct a global test. One way would be to construct a routine to which you pass a list of all the numeric columns' values, and which tests that each is numeric returning a true/false result. Or separate stage variables. But, whatever you do, you will at some point need to process each column value, even if just to build the list.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ccatania
Premium Member
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh
Contact:

Post by ccatania »

Thanks Ray,

I think I already knew this, but I was being optimistic at the least that there may be a global solution to this problem. You are 100% correct the DW has to be clean unfortunately they did not take that in consideration when they built the Legacy system I am extracting this data from. Building a pre-Transform process, which I'll dub the 'Cleaning Cycle', will definitely be required. I want to thank you again for you assistance as well as all others members who have replied. :)
Charlie
Post Reply