Sequential File Stage -- Flat file data load
Posted: Sun Dec 04, 2011 5:06 pm
I must be overlooking something really obvious, but I ran into an issue with a flat-file load job and wanted to check whether my understanding of the Sequential File stage is off.
One of the systems we receive data from sends us fixed-width files. This particular fixed-width file is defined as 21 bytes wide, and I have a load job set up with a single column called "data", defined as VarChar(21). I read the entire contents of each row into this column, and in my Transformer stage I have three output columns defined and parse out the data based on the file specification. All of this works fine, but while building another test and examining the file contents in TextPad I noticed an issue with the data.
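For illustration, the same read-whole-line-then-substring approach can be sketched outside DataStage. This is only an assumption about the layout: the 9/1/11 byte split below is guessed from the sample records, and the real widths come from the file specification.

```python
# Hypothetical sketch of the Transformer's substring parsing.
# The field widths (9 / 1 / 11) are assumptions inferred from the
# sample data, not the actual file specification.
def parse_record(line: str) -> tuple[str, str, str]:
    """Split one 21-character fixed-width record into three fields."""
    field1 = line[0:9].strip()    # assumed 9-character identifier
    field2 = line[9:10].strip()   # assumed 1-byte filler/separator
    field3 = line[10:21].strip()  # assumed trailing code field
    return field1, field2, field3

print(parse_record("015427960 AWAAQFB"))
# -> ('015427960', '', 'AWAAQFB')
```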
The transmitted file accidentally had two records concatenated onto the end of the previous line during the client system's export. When the Sequential File stage reads this data, it takes only the first 21 characters (as I have defined it) and simply discards the remaining characters of those two records. I know I can "fix" the problem by hand by inserting a record delimiter at the end of each erroneous line, but we receive several hundred of these files, so that is too time-consuming.
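One workaround (a sketch outside DataStage, not a feature of the Sequential File stage itself; the file name and width constant are assumptions) would be a pre-load check that scans each file and reports any line that is not exactly 21 bytes, so concatenated records are caught before the job runs:

```python
# Pre-load validation sketch: flag any line whose length is not the
# expected record width. RECORD_WIDTH and the file path are assumptions.
RECORD_WIDTH = 21

def find_bad_lines(path: str, width: int = RECORD_WIDTH) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that are not `width` bytes."""
    bad = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip(b"\r\n")
            if len(line) != width:
                bad.append((lineno, line.decode("ascii", errors="replace")))
    return bad
```

A wrapper script could run this over all incoming files and abort (or move the offenders to a reject directory) before the load job ever starts.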
Here is a sample of the data; the first record is correct and the second has the problem described above.
Code:
015427960 AWAAQFB
015430156 ASAFABH 015431235 AYARD183
Originally the reject mode of the Sequential File stage was set to Continue, so I changed it to Fail (or Abort; I can't remember the exact option and don't have DataStage in front of me right now) and re-ran the job, but it simply loaded the 3,500 records and did not abort on the two "bad" records. I also put a reject link on the Sequential File stage and sent its contents to a Peek stage, but the stage does not reject any records.
I would like this job to abort because of the extra records that were accidentally appended, but I could not get that to happen. What would be the best way to get the job to abort (or write these records to the job log or a reject table) if a data file arrives like this in the future? I'm worried that many other files are like this and that we are missing data, because this was the first file I randomly picked and we process a few hundred of them each week. I assume the Sequential File stage is behaving as designed, but I couldn't figure out how to get it to tell me there is "bad" data. Any suggestions or comments are much appreciated.
Thanks for reading this post.
Josh