One complex mainframe flat file -- many columns -- read/split

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

nkreddy
Premium Member
Posts: 23
Joined: Mon Jun 21, 2004 7:12 am
Location: New York

One complex mainframe flat file -- many columns -- read/split

Post by nkreddy »

Hi,

I would appreciate some ideas on this.

Information I have:

1) One mainframe flat file in EBCDIC format with account and transaction information, binary FTPed from the mainframe to the ETL server.
2) Five COBOL copy books (including the header/trailer copy book).
3) Each copy book has a very large number of columns (approx. 800).

Challenges:

I need to design a PX job that reads this one huge EBCDIC file (approx. 2500 columns in total), converts it to ASCII, and splits it by record type.
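For the character fields, the ASCII conversion itself is just a code-page decode. Here is a minimal Java sketch, assuming the file uses the common Cp037 (IBM US/Canada EBCDIC) code page; note that any COMP/COMP-3 fields in the copy books must not be byte-translated this way, which is why the file is FTPed in binary in the first place:

[code]
import java.nio.charset.Charset;

// Minimal sketch: decode EBCDIC character bytes into a Java String.
// Cp037 is an assumption -- use whatever code page the mainframe wrote.
public class EbcdicToAscii {
    public static void main(String[] args) {
        byte[] ebcdicBytes = { (byte) 0xC8, (byte) 0xC9 }; // "HI" in Cp037
        String text = new String(ebcdicBytes, Charset.forName("Cp037"));
        System.out.println(text); // prints HI
    }
}
[/code]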

The copy books I have start at the 02 item level, not the 01 level. My understanding is that DataStage cannot import CFDs at the 02 level; once I changed the level to 01 in the copy books, I was able to import all of them.

I then dropped all of the individual CFDs onto the same output link from the CFF stage. Since there is only one file, I am not sure reading all 2500 columns is a good option.

Please advise if there is any other option.

Thank You
seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

We had to deal with multi-format files. In Hawk the Complex Flat File stage will support this construct; in 7.5 it does not, so we wrote a generic file splitter that splits the records into a separate file per record code. It was originally written in VB, but I converted it to Java for platform independence.

If you are interested in more information on this route, let me know.
nkreddy
Premium Member
Posts: 23
Joined: Mon Jun 21, 2004 7:12 am
Location: New York

Post by nkreddy »

Sure... does this file splitter split the EBCDIC file as per the requirements?

It seems the COBOL copy book we received is incorrect; the SAS folks gave us that information.

I would be interested in learning more about the option you suggested.

Thank You
seanc217
Premium Member
Posts: 188
Joined: Thu Sep 15, 2005 9:22 am

Post by seanc217 »

The file splitter program takes an offset parameter and a length so that it can identify the record code. The code creates two byte arrays: one to read the header information up to the offset and length passed in, and the other to read the rest of the record. If you are dealing with fixed-length records, there should be a 2 or 3 byte sequence that tells you how long the record is; you can use that to read the rest of the record minus the header piece you already read. My files are in EBCDIC too; a simple lookup table converts just the characters you need to ASCII.

The code also dynamically creates files based on the record type identifier. For example, if my file name is customer.dat and I have a record identifier of 10, it creates a file customer.dat_10.

Hopefully that gives you enough to get started. If you have any questions, feel free to post them.
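A minimal Java sketch of that idea, assuming fixed-length records and the Cp037 code page (both assumptions; adjust to your file). It decodes only the record-type bytes to pick the output file name and writes each record back out unchanged:

[code]
import java.io.*;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Splits a fixed-length EBCDIC file into one file per record-type code.
// Usage (hypothetical values): java FileSplitter customer.dat 2500 0 2
public class FileSplitter {
    public static void main(String[] args) throws IOException {
        String inFile  = args[0];                    // e.g. customer.dat
        int recLen     = Integer.parseInt(args[1]);  // fixed record length in bytes
        int typeOffset = Integer.parseInt(args[2]);  // offset of the record code
        int typeLen    = Integer.parseInt(args[3]);  // length of the record code

        Charset ebcdic = Charset.forName("Cp037");   // assumed EBCDIC code page
        Map<String, OutputStream> outs = new HashMap<String, OutputStream>();

        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(inFile)))) {
            byte[] rec = new byte[recLen];
            while (true) {
                try {
                    in.readFully(rec);               // read one whole record
                } catch (EOFException eof) {
                    break;                           // clean end of file
                }
                // Decode only the type bytes to name the output file;
                // the record itself stays in EBCDIC for downstream processing.
                String type = new String(rec, typeOffset, typeLen, ebcdic).trim();
                OutputStream out = outs.get(type);
                if (out == null) {
                    // e.g. customer.dat_10 for record code 10
                    out = new BufferedOutputStream(
                            new FileOutputStream(inFile + "_" + type));
                    outs.put(type, out);
                }
                out.write(rec);
            }
        } finally {
            for (OutputStream out : outs.values()) {
                out.close();
            }
        }
    }
}
[/code]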