Sequential File Stage -- Flat file data load

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

josh.guffey
Participant
Posts: 40
Joined: Thu Apr 17, 2008 1:52 pm
Location: Huntsville, AL


Post by josh.guffey »

I must be overlooking something really obvious, but I ran into an issue with a flat file load job and wanted to see if my understanding of the sequential file stage is skewed.

One of the systems we receive data from sends us fixed-width files. This particular file is defined as 21 bytes wide, and I have a load job set up with a single column called "data", defined as VarChar(21). I read the entire contents of each row into this column, and in my Transformer I have three output columns defined and parse out the data based on the file specification. All of this works fine, but while building another test and examining the file contents in TextPad I noticed an issue with the data.
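Outside DataStage, the Transformer's substring parsing of the 21-byte record can be sketched in Python. The actual file specification isn't given in this thread, so the field names and offsets below are purely illustrative:

```python
# Illustrative only: the real field spec is not shown in the thread.
# Hypothetical (start, end) offsets within the 21-byte record.
FIELDS = {"account": (0, 9), "code": (11, 18), "filler": (18, 21)}

def parse_record(data):
    """Mimic the Transformer's substring parsing of the 21-byte 'data' column."""
    return {name: data[start:end].strip() for name, (start, end) in FIELDS.items()}

row = parse_record("015427960  AWAAQFB   ")
# row["account"] -> "015427960", row["code"] -> "AWAAQFB"
```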

The file that was transmitted accidentally had two records appended to the previous line during the client system's export. When the Sequential File stage reads this data it only takes the first 21 characters (as I have defined it) and simply ignores the remaining characters for those two records. I know that I can go in and "fix" this problem by placing a record delimiter at the end of these erroneous lines, but we receive several hundred of these files and that is too time-consuming a task.

Here is a sample of this data, with the first record being correct and the second record having the problem I describe above.

Code:

015427960  AWAAQFB   
015430156  ASAFABH   015431235  AYARD183  
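With a fixed 21-byte layout, the concatenation is easy to detect programmatically: any line longer than the record width must contain appended records. A minimal Python sketch of that check (the function name and sample data here are just for illustration):

```python
RECORD_WIDTH = 21  # fixed record width from the file spec

def check_lines(lines, width=RECORD_WIDTH):
    """Split lines into (good, bad); a line longer than `width` holds appended records."""
    good, bad = [], []
    for line in lines:
        stripped = line.rstrip("\n")
        (good if len(stripped) <= width else bad).append(stripped)
    return good, bad

sample = [
    "015427960  AWAAQFB   ",
    "015430156  ASAFABH   015431235  AYARD183  ",
]
good, bad = check_lines(sample)
# good -> ["015427960  AWAAQFB   "]; bad holds the 42-character concatenated line
```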

Originally the reject mode of the Sequential File stage was set to Continue, so I changed it to Fail/Abort (I can't remember the exact option and don't have DataStage in front of me right now) and re-ran the job, but it just loaded the 3,500 records and did not abort on the two "bad" records. I also put a reject link on the Sequential File stage and sent those contents to a Peek stage, but the stage does not reject any records.

I would like this job to abort because of the extra records that are accidentally appended, but I could not get that to happen. What would be the best way to get this job to abort (or write these records to the job log or a reject table) if a data file comes in like this in the future? I'm worried that many other files are like this and that we are missing data, because this was the first file I randomly picked and we process a few hundred of these each week. I'm assuming the Sequential File stage is behaving as designed, but I couldn't figure out how to get it to tell me that there is "bad" data. Any suggestions/comments are much appreciated.

Thanks for reading this post.

Josh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use Char(21) for the data type rather than VarChar(21). This should cause the reject behaviour you require. You could pass the reject link into a Column Import stage that re-parses the line as two Char(21) columns.
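The Column Import stage does this re-parse declaratively; the slicing below is only an illustration of the same idea in Python, applied to a rejected over-length line:

```python
def split_rejected(line, width=21):
    """Slice an over-length fixed-width line into width-sized records."""
    line = line.rstrip("\n")
    return [line[i:i + width] for i in range(0, len(line), width)]

records = split_rejected("015430156  ASAFABH   015431235  AYARD183  ")
# records -> ["015430156  ASAFABH   ", "015431235  AYARD183  "]
```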
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
josh.guffey
Participant
Posts: 40
Joined: Thu Apr 17, 2008 1:52 pm
Location: Huntsville, AL

Post by josh.guffey »

I modified the job and changed the column definition in the Sequential File stage to Char(21) as suggested. This works great and rejects the record perfectly. Is it best practice to use Char instead of VarChar in these scenarios?

I also incorporated the Column Import stage off of the sequential file reject link and I am able to capture the two records in separate columns and write them out to a reject file (or table if I choose to do so). Thanks for the tips/feedback.

Josh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I always prefer Char to VarChar in fixed-width files. For one thing, it's more efficient.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
josh.guffey
Participant
Posts: 40
Joined: Thu Apr 17, 2008 1:52 pm
Location: Huntsville, AL

Post by josh.guffey »

Thanks Ray -- I appreciate the feedback.

Josh