Rejecting records with invalid data type

sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Rejecting records with invalid data type

Post by sri1dhar »

Hi,

I am designing a parallel job. In parallel jobs, if an incoming column's data is invalid for its type (for example, string data in a numeric column), the job typically warns "Numeric string expected of the appropriate decimal precision for input column 'xxx'. Use default value",
substitutes '0' for that column, and writes the row to the output.

Instead, I want the record to be rejected and not pass through the output link.

Is there a way to achieve this?

Your help is greatly appreciated.

Regards
Sri
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

You can use Num, Alpha, AlNum, etc. functions in your constraint.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
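For readers outside DataStage, a transformer constraint of this kind boils down to a per-row predicate: rows that pass the test flow down the output link, and rows that fail it can be diverted. A minimal Python sketch of that idea (the `is_num` helper is illustrative, not DataStage's `Num()` function):

```python
def is_num(s: str) -> bool:
    """Rough analogue of a numeric-string test: accepts an optional
    sign, digits, and at most one decimal point."""
    s = s.strip()
    if s.startswith(("+", "-")):
        s = s[1:]
    return s != "" and s.replace(".", "", 1).isdigit()

# Rows whose 'qty' column passes the constraint go to the output;
# the rest are the rejects.
rows = [{"qty": "12"}, {"qty": "1.5"}, {"qty": "abc"}, {"qty": ""}]
good = [r for r in rows if is_num(r["qty"])]
bad = [r for r in rows if not is_num(r["qty"])]
```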
sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Post by sri1dhar »

[quote="I_Server_Whale"]You can use [i]Num, Alpha, AlNum,[/i]etc functions in your constraint.[/quote]

Thanks for the response. You mean check the data type of each column? What if I have 100 columns in the input link?
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

sri1dhar wrote:Thanks for the response. You mean check the data type of each column? What if I have 100 columns in the input link.
I guess so; that's all I can think of for now. If you have 100 columns and the other 99 all depend on column 1, then you can put only column 1 in the constraint: if column 1 has bad data coming in, so will columns 2-100, so checking just column 1 is enough. Otherwise, you have to check all the columns.

Wait and see if anybody else responds.
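The shortcut above (constrain on a single driver column when the rest depend on it, otherwise check every column) can be sketched with a table of per-column validators; the `checks` mapping and column names here are hypothetical:

```python
def valid_row(row, checks):
    """Apply a per-column validator only to the columns listed in
    `checks`; unlisted columns pass through unchecked."""
    return all(fn(row[col]) for col, fn in checks.items())

# Only the driver column is validated; col2 is never checked.
checks = {"col1": str.isdigit}
rows = [{"col1": "7", "col2": "x"}, {"col1": "n/a", "col2": "y"}]
kept = [r for r in rows if valid_row(r, checks)]
```

Extending `checks` to all 100 columns gives the exhaustive version without any extra per-column code.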
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Don't datasets support reject links? So do sequential files. You don't need to check for it inside the transformer; the reject links will take care of it, as far as I know.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The Sequential File input stage does support the use of a reject link, with a couple of options for when to reject a row. You can set a different tolerance level so that on some implicit conversion failures the row gets rejected rather than fudged.
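The behaviour described here (reject a row on conversion failure instead of substituting a default) can be mimicked in plain Python when parsing a delimited file; the function and file layout below are illustrative, not DataStage options:

```python
import csv
import io

def read_with_rejects(text, numeric_cols):
    """Parse CSV text; rows whose numeric columns fail conversion go
    to a reject list instead of being defaulted to 0."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            for col in numeric_cols:
                row[col] = int(row[col])
        except ValueError:
            rejected.append(row)   # the row, with its bad value intact
        else:
            accepted.append(row)
    return accepted, rejected

data = "id,amount\n1,100\n2,oops\n3,250\n"
ok, rej = read_with_rejects(data, ["amount"])
```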
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Re: Rejecting records with invalid data type

Post by I_Server_Whale »

sri1dhar wrote:Hi,

I am designing a parallel job. In parallel jobs, if an incoming column's data is invalid for its type (for example, string data in a numeric column), the job typically warns "Numeric string expected of the appropriate decimal precision for input column 'xxx'. Use default value",
substitutes '0' for that column, and writes the row to the output.

Instead, I want the record to be rejected and not pass through the output link.

Is there a way to achieve this?

Your help is greatly appreciated.

Regards
Sri
The OP never mentioned that the job has sequential files or datasets in it. Or am I missing something? :?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Metadata mismatches are often the result of human-entered metadata, which is usually the case with flat files and datasets; for databases, we import it. Hence my hunch that it's either a dataset or a flat file.
The OP didn't mention it either way.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I assumed text-file sources, but it doesn't make a big difference. For a text-file source you can implicitly convert the data in a Sequential File input stage with a reject link, or you can import it as string fields and implicitly convert it in a Transformer stage with a reject link. For database source data you can also implicitly convert the data in a Transformer. (Implicit conversion is where you map a field to a data type without any conversion code.)

The more code-intensive method is to explicitly convert the data, using conversion functions such as string_to_date and string_to_number; you can also explicitly detect mismatches with isdate() and isnumber() functions wrapped around the conversion function. I assume the requester wants to avoid this as they have 100+ columns.

I suggest using explicit checks and conversions where the front-end GUI allows free-text entry into what should be a specific data-type field, such as free-text entry of dates, postcodes, and numbers. This should only be a small proportion of the fields.
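The check-then-convert pattern above, applied only to the risky free-text fields, looks roughly like this in Python (the field names, format, and `to_date` helper are illustrative, not DataStage's string_to_date/isdate):

```python
from datetime import datetime

def to_date(s):
    """Explicit convert-with-check: returns a date, or None on failure,
    analogous to wrapping a conversion function in a validity test."""
    try:
        return datetime.strptime(s, "%Y-%m-%d").date()
    except ValueError:
        return None

# Only the fields that came from free-text entry need a converter.
free_text_fields = {"birth_date": to_date}

def convert_row(row):
    """Return the converted row, or None so the caller can route the
    row to a reject link instead of defaulting the bad value."""
    out = dict(row)
    for col, conv in free_text_fields.items():
        val = conv(row[col])
        if val is None:
            return None
        out[col] = val
    return out
```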
sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Post by sri1dhar »

I am the OP. Sorry for the delayed reply; I was away.

My input is a sequential file. As Vincent mentioned, I am importing the data as string type and implicitly converting it in the Transformer. I had to do this for other reasons.

Vincent, you mentioned "you can import it as string fields and implicitely convert it in a transformer stage with a reject link."
Although I attached a reject link to the transformer, the data still passes through the stream output link, and the invalid data is replaced with the default value instead of being sent to the reject link.
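The difference between the two behaviours in this thread can be sketched in Python: implicit conversion silently substitutes a default (so nothing ever reaches the reject link), while an explicit validity test run before conversion diverts the bad rows. The column name and functions are illustrative:

```python
def implicit(rows):
    """Implicit style: a bad value is silently defaulted to 0, so every
    row reaches the output and the reject list stays empty."""
    out = [{**r, "amount": int(r["amount"]) if r["amount"].isdigit() else 0}
           for r in rows]
    return out, []

def explicit(rows):
    """Explicit style: the validity test runs before conversion, so bad
    rows are diverted to the reject list instead of being defaulted."""
    good, rej = [], []
    for r in rows:
        (good if r["amount"].isdigit() else rej).append(r)
    return [{**r, "amount": int(r["amount"])} for r in good], rej

rows = [{"amount": "10"}, {"amount": "bad"}]
```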