Sequential File Stage -- Flat file data load

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

josh.guffey
Participant
Posts: 40
Joined: Thu Apr 17, 2008 1:52 pm
Location: Huntsville, AL


Post by josh.guffey »

I must be overlooking something really obvious, but I ran into an issue with a flat file load job and wanted to see if my understanding of the sequential file stage is skewed.

One of the systems we receive data from sends us fixed-width files. This particular file is defined as 21 bytes wide, and I have a load job set up with a single column called "data", defined as VarChar(21). I read the entire contents of each row into this column, and in my Transformer I have three output columns defined and parse out the data based on the file specification. All of this works fine, but while building another test and examining the file contents in TextPad I noticed an issue with the data.
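Outside DataStage, the Transformer's substring parsing of the 21-byte record can be sketched in Python. The actual file specification isn't given in this thread, so the field names and offsets below are purely illustrative:

```python
# Illustrative only: the real field spec is not shown in the thread.
# Hypothetical (start, end) offsets within the 21-byte record.
FIELDS = {"account": (0, 9), "code": (11, 18), "filler": (18, 21)}

def parse_record(data):
    """Mimic the Transformer's substring parsing of the 21-byte 'data' column."""
    return {name: data[start:end].strip() for name, (start, end) in FIELDS.items()}

row = parse_record("015427960  AWAAQFB   ")
# row["account"] -> "015427960", row["code"] -> "AWAAQFB"
```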

The file that was transmitted accidentally had two records appended to the previous line during the client system's export. When the Sequential File stage reads this data it only takes the first 21 characters (as I have defined it) and simply ignores the remaining characters for those two records. I know that I can go in and "fix" this problem by placing a record delimiter at the end of these erroneous lines, but we receive several hundred of these files and that is too time-consuming a task.

Here is a sample of this data, with the first record being correct and the second record having the problem I describe above.

Code:

015427960  AWAAQFB   
015430156  ASAFABH   015431235  AYARD183  
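With a fixed 21-byte layout, the concatenation is easy to detect programmatically: any line longer than the record width must contain appended records. A minimal Python sketch of that check (the function name and sample data here are just for illustration):

```python
RECORD_WIDTH = 21  # fixed record width from the file spec

def check_lines(lines, width=RECORD_WIDTH):
    """Split lines into (good, bad); a line longer than `width` holds appended records."""
    good, bad = [], []
    for line in lines:
        stripped = line.rstrip("\n")
        (good if len(stripped) <= width else bad).append(stripped)
    return good, bad

sample = [
    "015427960  AWAAQFB   ",
    "015430156  ASAFABH   015431235  AYARD183  ",
]
good, bad = check_lines(sample)
# good -> ["015427960  AWAAQFB   "]; bad holds the 42-character concatenated line
```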

Originally the reject mode of the Sequential File stage was set to Continue, so I changed it to Fail/Abort (I can't remember the exact option and don't have DataStage in front of me right now) and re-ran the job, but it just loaded the 3,500 records and did not abort on the two "bad" records. I also put a reject link on the Sequential File stage and sent those contents to a Peek stage, but the stage does not reject any records.

I would like this job to abort because of the extra records that are accidentally appended, but I could not get that to happen. What would be the best way to get this job to abort (or write these records to the job log or a reject table) if a data file comes in like this in the future? I'm worried that many other files are like this and that we are missing data, because this was the first file I randomly picked and we process a few hundred of these each week. I'm assuming the Sequential File stage is behaving as designed, but I couldn't figure out how to get it to tell me that there is "bad" data. Any suggestions/comments are much appreciated.

Thanks for reading this post.

Josh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use Char(21) for the data type rather than VarChar(21). This should cause the reject behaviour you require. You could pass the reject link into a Column Import stage that re-parses the line as two Char(21) columns.
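The Column Import stage does this re-parse declaratively; the slicing below is only an illustration of the same idea in Python, applied to a rejected over-length line:

```python
def split_rejected(line, width=21):
    """Slice an over-length fixed-width line into width-sized records."""
    line = line.rstrip("\n")
    return [line[i:i + width] for i in range(0, len(line), width)]

records = split_rejected("015430156  ASAFABH   015431235  AYARD183  ")
# records -> ["015430156  ASAFABH   ", "015431235  AYARD183  "]
```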
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
josh.guffey
Participant
Posts: 40
Joined: Thu Apr 17, 2008 1:52 pm
Location: Huntsville, AL

Post by josh.guffey »

I modified the job and changed the column definition in the Sequential File stage to Char(21) as suggested. This works great and rejects the record perfectly. Is it best practice to use Char instead of VarChar in these scenarios?

I also incorporated the Column Import stage off of the sequential file reject link and I am able to capture the two records in separate columns and write them out to a reject file (or table if I choose to do so). Thanks for the tips/feedback.

Josh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I always prefer Char to VarChar in fixed-width files. For one thing, it's more efficient.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
josh.guffey
Participant
Posts: 40
Joined: Thu Apr 17, 2008 1:52 pm
Location: Huntsville, AL

Post by josh.guffey »

Thanks Ray -- I appreciate the feedback.

Josh