COBOL flat file import

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

leathermana
Premium Member
Posts: 19
Joined: Wed Jul 14, 2010 1:10 pm

COBOL flat file import

Post by leathermana »

I have what I think is a simple question but have not found a simple answer yet. Please clue me in. I need to import a COBOL flat file with a couple hundred fields, 5 of which are decimal COMP-3, and two of which are COMP. I think I see that I may be able to import the COMP-3 fields into a sequential file but I don't see any way to handle the COMP fields. Is it possible or do I need to use the CFF stage, or ..... ? It will help me dive into this if I can quit dithering around about how to get started. I've done lots of work with sequential files but this is the first time I've dealt with COBOL flat files and I've never used the CFF stage.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Start by getting the COBOL copybook for the file. See if it can be imported successfully into DataStage for the CFF stage. If the copybook imports cleanly, you can use the CFF stage to read the EBCDIC flat file and then move the data into DataStage for processing.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

You may find some help in this FAQ: Using Mainframe Source Data
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
leathermana
Premium Member
Posts: 19
Joined: Wed Jul 14, 2010 1:10 pm

Post by leathermana »

Trying to access the FAQ, I am getting: "Sorry, but only users granted special access can read topics in this forum."
How do I get "special access"?
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Well, that's just bizarre, unless there's something wrong with your registration. Craig?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unfortunately, it has been witnessed and discussed before.

Not sure how / if these ever get resolved but it would need to come from the people behind the site.
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Oh, well. In the meantime:

First step is to have a text file with the Cobol copybook in it. There are specific syntax/edit requirements for this, and if you don't have a Cobol developer to help (assuming there isn't a copybook already out there for you to use), here are the basics.

01 record-level name.

This is required for all COBOL FD imports.

05 field-name pic [].

This is the line for each field definition. "Pic" is the picture clause, and this might cover your needs (examples).

Code: Select all

PIC X(15).  Alphanumeric field of 15 characters. Storage 15 bytes.

PIC S9(6).9(2).  Numeric (subset of alphanumeric) field of 6 digits, an explicit decimal point, and 2 digits. "S" means explicitly signed; unsigned omits the "S". Storage is 9 bytes: the decimal point takes a byte, but the sign modifies the hexadecimal value of the last (or first) digit rather than taking a byte of its own.

PIC S9(6)V9(2). The decimal is implied. The number is stored as an integer and the decimal position is supplied by the read process. Storage 8 bytes.

PIC S9(10) COMP. 10-digit binary-stored numeric, 8 bytes of storage here. COMP is short for COMPUTATIONAL; the physical storage is binary and defined by the platform, with the byte count stepping up with the digit count.

PIC S9(8)V9(2) COMP-3. This is packed decimal, which stores one digit in each half-byte and always reserves the last half-byte for the sign, even for unsigned numbers. Storage is therefore the digit count plus one, divided by 2 and rounded up: this one is 6 bytes.
Sometimes a lazy programmer will type out the character definition instead of using a number in parentheses, like PIC S9999V99 instead of PIC S9(4)V9(2).
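The CFF stage does all of this decoding for you, but the packed-decimal storage rules above can be sanity-checked in a few lines. This is a hypothetical Python sketch (names are mine, not any DataStage API) that unpacks a COMP-3 field:

```python
# Decode a COMP-3 (packed-decimal) field: one digit per half-byte (nibble),
# with the final nibble holding the sign (0xD = negative; 0xC or 0xF = positive).
# Hypothetical illustration only -- the CFF stage does this for you.
def unpack_comp3(raw: bytes, scale: int = 0):
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)     # high half-byte
        nibbles.append(byte & 0x0F)   # low half-byte
    sign = nibbles.pop()              # last half-byte is always the sign
    value = int("".join(str(d) for d in nibbles))
    if sign == 0x0D:
        value = -value
    # scale = digits after the implied decimal point (the V in the PIC clause)
    return value / (10 ** scale) if scale else value

# PIC S9(8)V9(2) COMP-3: 10 digits + sign nibble + a leading pad nibble = 6 bytes.
print(unpack_comp3(bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0x0C]), scale=2))  # 12345678.9
```

The same arithmetic explains the 6-byte figure above: 10 digits plus the sign nibble is 11 half-bytes, rounded up to 12, i.e. 6 bytes.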

Create a 05 for every field in the exact order it appears. Complex things like group-elementary items, OCCURS clauses, etc. require your understanding of them or an in-house resource to assist you.

All of them are handled automatically by DataStage during import. That's almost, but not quite, 100% true, so you need to inspect every imported table definition -- Layout tab, radio button for Cobol -- to be sure.

Good luck.
leathermana
Premium Member
Posts: 19
Joined: Wed Jul 14, 2010 1:10 pm

Post by leathermana »

FranklinE, Thanks VERY much for your responses. It looks like I may be creating the copybook file myself. Am I correct in my understanding that what is required here is just a text file containing the data definitions as you've described? I have such a file without the preceding 05's or the 01 record-level name entry. I don't know what the record-level name is. Fortunately I don't have any OCCURS clauses, but I do have the COMP and COMP-3. As long as I define them correctly I'll count on DataStage to know what to do with them. Radio button? ...?...
Again... Thanks.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

You're welcome.

Edit: forgot one thing. The DS table definition import looks for the default file extension ".cfd" for COBOL FD imports. You can create your copybook with your favorite plain-text editor. Do not use a word processor. It can get ugly. :wink:

Things to know:

Cobol code formatting is based on a six-position prefix; position 7 is reserved for a '*' to mark a comment line, and all code starts at position 8. Using the characters 'x' and '_' in place of spaces for your example:

Code: Select all

xxxxxx_01 GIVE-RECORD-NAME-OF-CHOICE.
xxxxxx_xxxx05 FIRST-FIELD-NAME PIC X(25).
xxxxxx_xxxx05 SECOND-FIELD-NAME PIC 9(8)V9(2) COMP.
xxxxxx_xxxx05 THIRD-FIELD-NAME PIC S9(6)V9(2) COMP-3.
xxxxxx* IMPORT IGNORES COMMENT LINES.
etc.
A space is a delimiter in Cobol syntax, so always use hyphens in field names. The COBOL FD import will convert them to underscores.

That's a crude illustration. You indent level numbers after 01 for ease of reading. 01 is the required group item for the entire record. Level numbers can be any two-digit value up to 49 (higher level numbers are reserved). 05 is the usual standard for individual field items. You can name the fields any way you wish, but I suggest at least a mnemonic link to how each field is used.
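Given a simple copybook like the one above, the byte offsets and record length follow mechanically from the storage rules. A rough Python sketch (names hypothetical; simplified rules only -- elementary fields, no OCCURS, no group items) of that arithmetic:

```python
import math
import re

# Approximate storage sizes for simplified PIC clauses.
# Assumption: only DISPLAY, COMP, and COMP-3 usages; no OCCURS or group items.
def field_bytes(pic: str, usage: str = "DISPLAY") -> int:
    # Total digit/character positions, expanding repeats like 9(4).
    positions = sum(int(n) for n in re.findall(r"[9X]\((\d+)\)", pic))
    positions += len(re.findall(r"[9X](?!\()", pic))
    if usage == "COMP-3":                       # packed decimal: ceil((digits+1)/2)
        return math.ceil((positions + 1) / 2)
    if usage == "COMP":                         # binary: 2, 4, or 8 bytes by digit count
        return 2 if positions <= 4 else 4 if positions <= 9 else 8
    return positions + pic.count(".")           # DISPLAY: one byte per position

# The three example fields from the copybook sketch above:
layout = [("FIRST-FIELD-NAME", "X(25)", "DISPLAY"),
          ("SECOND-FIELD-NAME", "9(8)V9(2)", "COMP"),
          ("THIRD-FIELD-NAME", "S9(6)V9(2)", "COMP-3")]

offset = 0
for name, pic, usage in layout:
    size = field_bytes(pic, usage)
    print(f"{name}: offset {offset}, {size} bytes")
    offset += size
print("record length:", offset)
```

Note that the V (implied decimal) and S (sign) take no storage in DISPLAY fields, while an explicit '.' does -- which is exactly the 8-byte vs. 9-byte difference in the PIC examples earlier.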

After import, open the table definition (double-click, or right-click and choose "Properties"). The "Layout" tab displays the table in three formats, which you choose by clicking the radio button next to the format name. Cobol format displays your copybook as the import "translates" it. You can display any table definition in Cobol format regardless of how you created it.

The import utility is pretty good at telling you why it fails (and lets you edit the copybook file right there). A successful import doesn't mean successful reads, but the warning and error messages will guide you to the tweaks you need.

This is a good learn-by-doing area. Don't hesitate to try it.
leathermana
Premium Member
Posts: 19
Joined: Wed Jul 14, 2010 1:10 pm

Post by leathermana »

Franklin,
Your help has been very valuable. Once I got the .cfd file created as you specified, it imported easily on the first try. I've been reading logs and trying all kinds of combinations since then to get the data to load. Clicking the "Print Fields" check box on the CFF "Record Options" tab on the "Stage" tab created some very helpful log files. I also don't know what I would have done without EditPadPro, where I can look at the text and the hex at the same time, mostly for counting bytes and finding CR/LF record delimiters etc. At this point I am dealing with "short" records, looking for a way to pad them to the proper length or some such thing to keep them from being rejected.

Still would like to know if anyone can tell me if there is a way to load a COBOL flat file with CHAR, COMP and COMP-3 data using a Sequential File stage. My DataStage resource folks here keep telling me NOT to use CFF and to use Sequential File. Haven't had much luck in those attempts though. I dunno.......

By the way, I emailed the editor@dsexchange.com and he got back to me fairly quickly, promising to fix the access issue to the link: viewtopic.php?t=143596
within 12 hours. Still not working, but I'm giving him the benefit of the doubt since I don't know what time zone he's in. :)

Gratefully, Alden
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Alden,

You're doing all the right things. The following is honestly written with a tone of incredulity.

CFF is designed specifically for Cobol/EBCDIC data sources, and any DataStage developer who warns you away from it is not just wrong...

Ahem. Sorry.

I don't have the system support to use CFF for direct access, and I miss it. I have to manually parse files with multiple record types (header, detail, trailer) in transformers. I use FTP Enterprise for direct access to my Cobol-source files, and I did prove a workaround I could use if I didn't consider doubling my I/O a bad choice:

FTP the file in native EBCDIC to a sequential file, then use whatever they force you to use from there. I can use CFF from a sequential file, I just can't justify it otherwise.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I believe that in the Sequential file, you can set the individual field properties for those COMP fields to "Packed" and have it unpack them for you. Don't recall if it will handle any EBCDIC translation for string fields as well, however. As noted, the whole point of the CFF stage is to handle files of this nature.
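For what it's worth, a COMP field (as opposed to COMP-3) is an ordinary big-endian binary integer on the mainframe, which is what any "unpack" option ultimately has to decode. A hypothetical Python sketch outside DataStage (the byte values are made up for illustration):

```python
import struct

# A COMP field is stored as a big-endian two's-complement binary integer.
# PIC S9(5) through S9(9) COMP occupies 4 bytes, so ">i" (signed
# big-endian 32-bit int) reads it. Illustration only, not a DataStage API.
raw = b"\x00\x00\x30\x39"
value, = struct.unpack(">i", raw)
print(value)  # 12345
```

The decimal scaling for a V in the PIC clause would then be applied on top of the integer, just as with COMP-3.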
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Craig, I have practical proof that your belief is correct. Also, character set handling is as transparent from Sequential File as it is from FTP Enterprise if the file in question was correctly written in native EBCDIC and has no control characters imposed by the operating system.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

leathermana wrote:My DataStage resource folks here keep telling me NOT to use CFF and to use Sequential File.
Resist stupid requirements.

Would you accede to a request to tighten a nut using a hammer? It's simply a case of using the right tool for the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
leathermana
Premium Member
Posts: 19
Joined: Wed Jul 14, 2010 1:10 pm

Post by leathermana »

I am getting records to load thanks to the help I've gotten here. The remaining problem is these "short" records. These are fixed length, 1000 byte records but some of them are coming in at 979 bytes and I am getting a "short input record" error in the log. When I look at these records in the hex editor in Edit Pad Pro, I can count the bytes and see that they are indeed short. However, when we look at these records on the server using IBM Rational Host On Demand, we see the hex 40 (for EBCDIC spaces) in these records. I am using different FTP clients to move the file to our system and no matter what client I use or whether I specify ASCII or BINARY, I am getting these missing spaces. Any ideas? Could the Host On Demand be padding the display of these records since they are defined as 1000 byte fixed length when in reality the spaces are missing in the file?
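If the trailing EBCDIC spaces really are missing from the transferred file, one hedged workaround is to pad the short records back to the fixed length before CFF reads them. A hypothetical Python sketch, assuming the CR/LF delimiters described above (caveat: a binary COMP field could itself contain a stray 0x0D0A pair, which would fool any delimiter-based split):

```python
EBCDIC_SPACE = b"\x40"  # EBCDIC space, the character Host On-Demand shows

# Pad each CR/LF-delimited record out to a fixed length with EBCDIC spaces.
# Assumption: records are genuinely CR/LF-delimited and no COMP field
# happens to contain the bytes 0x0D 0x0A.
def pad_records(data: bytes, reclen: int) -> bytes:
    out = []
    for rec in data.split(b"\r\n"):
        if rec:  # skip the empty element after the final delimiter
            out.append(rec.ljust(reclen, EBCDIC_SPACE))
    return b"\r\n".join(out) + b"\r\n"
```

Whether the spaces are truly absent from the file, or only absent from the transferred copy, is worth confirming first: a binary-mode FTP of the untouched dataset should preserve every byte, so comparing byte counts against records x 1000 would settle it.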
Alden

"All we need is here." -Wendell Berry
Post Reply