How to read this file?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ds_user78
Participant
Posts: 23
Joined: Thu Nov 11, 2004 5:39 pm

How to read this file?

Post by ds_user78 »

Hi,
I have a data file which has the following layout. How do I read this in DS?
    10|ASBT115072 25C4|05|50001670|01|05192003|1|4|EA
    40|ASBT115072 25C4|05|50001670|01|SINGLE PHASE PAD TYPE 2 (line 1)
    20|ASBT115072 25C4|05|50001670|01|0002||02005|CP05|5RTST|Ratio Test|1.000|EA|X|1|1|1.500|H|5.650|H|3.300|H|0.00
    21|ASBT115072 25C4|05|50001670|01|0002||Ratio Test
    40|ASBT115072 25C4|05|50001670|01|Header Long Text line 2
The structure of each record varies based on its first two characters. There is no fixed number of records per record type, except that record type 10 always comes first and occurs only once per set of records containing the record types 10, 20, 21, and 40. I tried using the CFF stage, but I am not sure how I should go about it.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

How about declaring the record as containing just one column, then getting the record type in a Transformer using the function FIELD(InRecord,"|",1)? You can then split and parse the rest of the columns with the FIELD function as required. This will work in your case; sometimes these types of data rows contain over 50 columns, and then it becomes a bit tedious to program all the splits manually.
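The FIELD-based dispatch described above can be prototyped outside DataStage. This is a minimal sketch using awk on a few rows from the question's layout (the /tmp path and the trimmed sample are assumptions, not part of the original job):

```shell
# A few rows from the question's layout (trimmed; /tmp path assumed).
cat > /tmp/sample.dat <<'EOF'
10|ASBT115072 25C4|05|50001670|01|05192003|1|4|EA
20|ASBT115072 25C4|05|50001670|01|0002||Ratio Test
40|ASBT115072 25C4|05|50001670|01|Header Long Text line 2
EOF

# FIELD(InRecord,"|",1) in DS BASIC returns the first '|'-delimited
# field; awk's $1 with -F'|' does the same, so each record type can
# be routed to its own parsing logic in the Transformer.
awk -F'|' '{ print "record type " $1 " has " NF " fields" }' /tmp/sample.dat
```

In the job itself, the same idea becomes a constraint or derivation per output link, one per record type.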
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
To add to what was said, you could also use the one-column definition and a Transformer to split the file into several files, then read each one with its real table definition, if you can afford another write of the source file to disk.

By the way, for the creative mind: you could combine a filter command with OS-level grep or sed to read the file several times in parallel, each read using a different table definition, picking up only the rows relevant to that definition, and running in its own flow.
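The split-per-record-type idea can be sketched in shell (a minimal sketch; the tiny sample file and the output file names are assumptions for illustration): one grep pass per record type, so each output file can be read with its own table definition.

```shell
# Tiny stand-in for the source file (contents assumed).
cat > /tmp/source.dat <<'EOF'
10|ASBT115072 25C4|05|50001670|01|05192003|1|4|EA
20|ASBT115072 25C4|05|50001670|01|0002||02005|CP05
21|ASBT115072 25C4|05|50001670|01|0002||Ratio Test
40|ASBT115072 25C4|05|50001670|01|Header Long Text line 2
20|ASBT115072 25C4|05|50001670|01|0003||02006|CP06
EOF

# One filter pass per record type. In a job, each grep could instead
# be the filter command of a Sequential File stage carrying that
# type's table definition, with all four reads running in parallel.
for t in 10 20 21 40; do
  grep "^${t}|" /tmp/source.dat > "/tmp/type_${t}.dat"
done
wc -l /tmp/type_*.dat
```

Anchoring the pattern with `^` and including the trailing `|` keeps a type like 20 from also matching rows whose second column happens to start with 20.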

Usually a CFF stage is the natural choice, but if for some reason you can't use it in your case, you have several alternatives :)

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If the record types carry different meanings and must be processed differently, use 'grep' as a filter command and extract only the relevant rows.

Alternatively, use something like 'awk' as the filter to make the number of columns the same in every row, and define all columns as VARCHAR(9999) delimited by the '|' symbol.
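The awk normalization suggested above could look like this (a sketch: the sample rows and the maximum width of 5 are assumptions; in the real file the maximum would be the widest layout, i.e. the type-20 rows). Every shorter row is padded with empty fields so a single '|'-delimited, all-VarChar table definition fits every record type.

```shell
# Rows of different widths (sample data assumed).
cat > /tmp/raw.dat <<'EOF'
10|a|b
21|a|b|c
40|a|b|c|d
EOF

# Pad every row out to 'max' fields. Assigning $i for i > NF makes
# awk rebuild the record with OFS ('|'), appending empty fields.
awk -F'|' -v OFS='|' -v max=5 \
  'NF < max { for (i = NF + 1; i <= max; i++) $i = "" } 1' /tmp/raw.dat
```

Downstream, the job can then branch on column 1 and interpret the remaining VarChar columns according to that record type.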

You can define the processing logic in your job.
clshore
Charter Member
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

So, is this a translated COBOL file or what? Is the FD available?
Perhaps you can deal with it as a Complex Flat File.

Carter
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

Hi
Instead of juggling with DS logic, you could use the grep command to separate the records based on the value in the first column, writing each record type to its own file, and later develop jobs for these individual files. That is the better option when the number of values the first column can take is small; otherwise, go with the options the other gurus are suggesting.
Vignesh.

"A conclusion is simply the place where you got tired of thinking."
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Is the number of record types fixed? In other words, are there at most four record types: 10, 20, 21, and 40?

-Sumit