Rejecting records with invalid data type
Moderators: chulett, rschirm, roy
Hi,
I am designing a parallel job. Typically in parallel jobs, if the incoming column data is invalid for the column's type (for example, string data in a numeric column), the job warns "Numeric string expected of the appropriate decimal precision for input column 'xxx'. Use default value", substitutes '0' for this column, and writes it to the output.
Instead, I want the record to be rejected and not pass through the output link.
Is there a way to achieve this?
Your help is greatly appreciated.
Regards
Sri
-
- Premium Member
- Posts: 1255
- Joined: Wed Feb 02, 2005 11:54 am
- Location: United States of America
sri1dhar wrote: Thanks for the response. You mean check the data type of each column? What if I have 100 columns in the input link.
I guess so. That's all I can think of for now. If you have 100 columns and columns 2-99 all depend on column 1, then you can have only column 1 in the constraint: if column 1 has wrong data coming in, so will columns 2-99, so checking just column 1 in the constraint will suffice. Otherwise, you should check all the columns.
Wait and see if anybody else responds.
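That idea can be sketched in Python (not DataStage syntax; the column names and the numeric check are illustrative assumptions): when the derived columns all depend on one driver column, a single check on that column decides whether the record goes to the output or the reject link.

```python
from decimal import Decimal, InvalidOperation

def is_numeric(value: str) -> bool:
    """True if the string parses as a decimal number."""
    try:
        Decimal(value)
        return True
    except InvalidOperation:
        return False

# Hypothetical record: columns 2-99 are all derived from column 1,
# so validating column 1 alone decides the whole record's fate.
record = {"col1": "not-a-number", "col2": "...", "col99": "..."}

link = "output" if is_numeric(record["col1"]) else "reject"
print(link)  # reject
```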
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
The Sequential File input stage does support the use of a reject link, with a couple of options for when to reject a row. You can set a different tolerance level so that on some implicit conversion failures the row gets rejected rather than fudged.
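The behaviour being described can be sketched outside DataStage as a reader that sends unparseable rows down a reject stream instead of substituting defaults (a minimal Python illustration; the two-column layout and its types are assumptions, not the stage's actual options):

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical two-column layout: id (integer), price (decimal).
data = io.StringIO("1,9.99\n2,oops\nthree,1.50\n")

output, reject = [], []
for lineno, row in enumerate(csv.reader(data), start=1):
    try:
        # Fail loudly on bad data instead of defaulting to 0.
        output.append((int(row[0]), Decimal(row[1])))
    except (ValueError, InvalidOperation, IndexError):
        reject.append((lineno, row))  # row goes down the reject link

print(output)  # [(1, Decimal('9.99'))]
print(reject)  # rows 2 and 3
```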
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Premium Member
- Posts: 1255
- Joined: Wed Feb 02, 2005 11:54 am
- Location: United States of America
Re: Rejecting records with invalid data type
sri1dhar wrote: Hi,
I am designing a parallel job. Typically in parallel jobs, if the incoming column data is invalid for the column's type (for example, string data in a numeric column), the job warns "Numeric string expected of the appropriate decimal precision for input column 'xxx'. Use default value", substitutes '0' for this column, and writes it to the output. Instead, I want the record to be rejected and not pass through the output link. Is there a way to achieve this?
The OP never mentioned that the job has sequential files or datasets in it. Or am I missing something?
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
Metadata mismatches are often the result of human-entered metadata, which is usually the case with flat files and datasets. For databases, we import the metadata. Hence my hunch that it's either a dataset or a flat file.
No mention about it by the OP.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I assumed text file sources, but it doesn't make a big difference. For a text file source you can implicitly convert the data in a Sequential File input stage with a reject link, or you can import it as string fields and implicitly convert it in a Transformer stage with a reject link. For database source data you can also implicitly convert the data in a Transformer. (Implicit conversion is where you map a field to a data type without any conversion code.)
The more code-intensive method is to explicitly convert the data: you use conversion functions such as string_to_date and string_to_number, and you can also explicitly detect mismatches with isdate() and isnumber() functions wrapped around the convert function. I assume the requester wants to avoid this, as they have 100+ columns.
I suggest using explicit checks and conversions where the front-end GUI allows free-text entry into what should be a specific data-type field, such as free-text entry of dates, postcodes, and numbers. This should only be a small proportion of the fields.
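A sketch of that explicit-check approach in Python (the field names, date format, and helper functions are illustrative stand-ins, not the actual DataStage functions): wrap a validity test around each conversion and route failures to the reject link rather than letting them default.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def is_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Rough analogue of wrapping isdate() around the conversion."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_number(value: str) -> bool:
    """Rough analogue of wrapping isnumber() around the conversion."""
    try:
        Decimal(value)
        return True
    except InvalidOperation:
        return False

rows = [
    {"order_date": "2006-03-01", "amount": "19.99"},  # clean
    {"order_date": "01/03/2006", "amount": "19.99"},  # bad date format
    {"order_date": "2006-03-01", "amount": "N/A"},    # bad number
]

output, reject = [], []
for row in rows:
    if is_date(row["order_date"]) and is_number(row["amount"]):
        output.append(row)
    else:
        reject.append(row)  # rejected instead of defaulted

print(len(output), len(reject))  # 1 2
```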
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
I am the OP. Sorry for the delayed reply as I was away.
My input is a sequential file. As Vincent mentioned, I am importing the data as string type and implicitly converting it in the Transformer. I had to do this for other reasons.
Vincent, you mentioned "you can import it as string fields and implicitly convert it in a transformer stage with a reject link."
Although I attached a reject link to the Transformer, the data still passes through the stream output link, with the invalid data replaced by the default value instead of being sent to the reject link.