Page 1 of 1

How to reject records having Extra Fields in Seq file stage

Posted: Wed Mar 22, 2017 6:41 am
by Apy
Hi,

I am reading a Sample.txt file using Sequential File Stage. Below are the details for .txt file and metadata for Sequential File Stage-
  • 1) 3 Fields are defined in Sequential file stage Schema.
    2) Sample.txt is having below set of records-

Code: Select all

Field1|Field2|Field3
1|2|3
11|22
111|222|333|444
Output of the sequential file stage(Delimiter='|') is as mentioned below-

For Record-1 :
Field1 = 1
Field2 = 2
Field3 = 3

Record-2 is rejected as it is having less number of fields than what is defined in Sequential File Stage Schema.

For Record-3 :
Field1 = 111
Field2 = 222
Field3 = 333|444

As per the functionality, it should reject Record-3 as well.
Any suggestion/opinion on this?

Posted: Wed Mar 22, 2017 7:44 am
by qt_ky
I see the same behavior. I suppose you would need to detect and reject that last scenario (Field3 = 333|444) in a Transformer stage constraint following the Sequential File stage.

Posted: Thu Mar 23, 2017 4:04 am
by ray.wurlod
Read the line as a single VarChar field and effect the parsing in a Transformer stage. If the field does not exist, Field() returns "". The Field() function has a fourth argument, so Col3 would be derived as, for example,

Code: Select all

Field(InLine.TheString, "|", 3, 99)

Posted: Thu Mar 23, 2017 5:29 am
by Apy
Thanks for the response. I was just asking if can handle this at sequential file stage itself. As this is the expected functionality :)

Posted: Sun Mar 26, 2017 3:44 pm
by ray.wurlod
You can define additional fields and set the Drop on Import property, but I don't believe there's any generic solution like the one you seek.

Posted: Tue Apr 04, 2017 3:06 pm
by kris
If you configure Sequential File Stage with below properties and everything else to the defaults or however you need them to your file needs, any record with more (or less) than specified number of columns will be rejected. And you can actually capture these rejected records in a reject file with error message at the end of every error record.

Add a reject link to the input Sequential file stage and configure a file to capture rejects. Then configure the original input file (not reject file) as below:
In Output TAB/Format TAB
1. Record Level/Final Delimiter = end
2. Field Defaults/Delimiter = |
In Output TAB/Properties TAB
1. Options/Reject Mode = Output
2. Reject Reason Column = <column name for your error, perhaps as Error_Message>
This should allow you to capture all rejects with error messages.

Posted: Tue Apr 04, 2017 4:10 pm
by ray.wurlod
The Rejects link only captures rows that do not match the metadata for the row. Since you want to capture the case where there are more than the defined number of fields, rather than reject them, the Reject link is probably not your desired solution.