Flattening a fixed width file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

There are several approaches you can take. What you need to determine is how you differentiate between your 3 record types. Is the first field of each record type a single digit, "1", "2" or "3"?
If you have such a field, then this problem is easily solved using the CFF stage, since this is a typical COBOL-style data layout.
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

Hi,

The first field in each record type is RECORD_LENGTH, which is 4 digits, and the record type identifier is at position 155 in each record type.

What I have mentioned above is just an example. My actual file has 50+ record types.
Regards,
Kannan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In that case the CFF stage is your answer; it will automatically read each record type correctly. The documentation does a barely acceptable job of describing how to do this, but it should get you started.
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

I am trying this with a sample job, and below is my schema.

record
{record_delim='\n', delim=none, quote=none, binary, ascii, native_endian, charset='ISO-8859-1'}
(
complex_flat_file_1_record_id:string[2] {position=0,link,};
complex_flat_file_1_record_type:tagged {reference=complex_flat_file_1_record_id,position=0} (
RECORD_1:subrec {tagcase=01} (
RecordID1:string[2];
SSN:string[9];
Name:string[30];
);
NEWRECORD:subrec {tagcase=02} (
RecordID2:string[2];
AccountNumber:string[30];
Balance:string[20];
);
NEWRECORD2:subrec {tagcase=03} (
RecordID3:string[2];
Occupation:string[30];
Income:string[20];
);
);
)
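For reference, the tagged layout above amounts to the following dispatch logic, shown here as a hypothetical plain-Python sketch (field names and widths mirror the schema; this is not DataStage code):

```python
# Hypothetical sketch of the tagged-record logic in the schema above:
# peek at the 2-character record id, then slice the line according to
# the matching subrecord layout. Widths are taken from the schema.

LAYOUTS = {
    "01": [("RecordID1", 2), ("SSN", 9), ("Name", 30)],
    "02": [("RecordID2", 2), ("AccountNumber", 30), ("Balance", 20)],
    "03": [("RecordID3", 2), ("Occupation", 30), ("Income", 20)],
}

def parse_record(line):
    tag = line[:2]                      # complex_flat_file_1_record_id
    layout = LAYOUTS.get(tag)
    if layout is None:
        raise ValueError(f"Tag value {tag} does not match any tagcase")
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = line[pos:pos + width].rstrip()
        pos += width
    return tag, fields
```

A tag value that no layout claims takes the `ValueError` branch here, which is the analogue of CFF's "does not match any tagcase" warning.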


While running the job I am getting the below warning messages.

Complex_Flat_File_1,0: Tag value: 01 does not match any tagcase for tagged field "complex_flat_file_1_record_type"
Complex_Flat_File_1,0: Tag value: 02 does not match any tagcase for tagged field "complex_flat_file_1_record_type"

I have searched the forum and found some posts regarding this, but I am not able to understand the solution. Any help is highly appreciated.
Regards,
Kannan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Have you tried adding quotes to the column values in the CFF stage, and does that make a difference?
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

Hi,

Thank you. After adding quotes it worked fine when each record type is on a separate row.

Example:
<Row1>01123456789Joe Test 1
<Row2>02Account1 100.00 2
<Row3>03News Anchor 52000.00 3

When there is more than one record type in a row, it reads only the first record type and not the others.

Example:
<Row1>01123456789Joe Test 102Account1
<Row2> 100.00 203News Anchor
<Row3> 52000.00 3

Is it possible to read the above type of record?
Regards,
Kannan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

You need either to change your file to have only one record type per row, OR to remove the line breaks (with sed or similar) and change the CFF definition accordingly.
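If removing the line breaks is the route taken, the leading 4-digit RECORD_LENGTH mentioned earlier can drive the re-split. A hypothetical Python sketch (whether the length counts the 4 length digits themselves is an assumption here; verify against the real layout):

```python
# Hypothetical pre-processing sketch: strip the line breaks, then walk the
# resulting byte stream using each record's leading 4-digit RECORD_LENGTH.
# Assumes the length counts the whole record, including the 4 length digits.

def split_records(raw):
    data = raw.replace(b"\r", b"").replace(b"\n", b"")
    records, pos = [], 0
    while pos < len(data):
        length = int(data[pos:pos + 4])   # 4-digit display-numeric length
        records.append(data[pos:pos + length])
        pos += length
    return records
```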
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

Thanks ArndW..

I have one more question.

There are some fields defined as DECIMAL in the CFF stage, and when I try to view the data or run the job I get the below warning message.

Field "RECORD_LENGTH" has import error and no default value; data: {0 8 4}, at offset: 0
where RECORD_LENGTH is the first field and defined as DECIMAL(4).

If I change the field definition from DECIMAL to DISPLAY NUMERIC then I can view the data and run the job successfully.

Any suggestion?
Regards,
Kannan
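A likely explanation for the DECIMAL behaviour above, sketched in hypothetical Python (the decoding rules below are standard COBOL conventions, not taken from this job): CFF reads a DECIMAL field as packed (COMP-3) data, two digits per byte plus a sign nibble, while this file holds plain ASCII digit characters, which is what DISPLAY NUMERIC expects.

```python
# Hypothetical illustration of DECIMAL (packed / COMP-3) vs DISPLAY NUMERIC.
# The file's RECORD_LENGTH holds ASCII digit characters such as b"0084",
# which are valid display-numeric data but not valid packed decimal.

def decode_display_numeric(raw):
    # Display numeric: every byte is a digit character.
    return int(raw.decode("ascii"))

def decode_packed_decimal(raw):
    # Packed decimal: two digits per byte; the final nibble is the sign
    # (0xC/0xF positive, 0xB/0xD negative by common convention).
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    if sign < 0x0A or any(d > 9 for d in digits):
        raise ValueError("import error: not a valid packed-decimal field")
    value = int("".join(map(str, digits)))
    return -value if sign in (0x0B, 0x0D) else value
```

Under these assumed rules the ASCII bytes fail the packed decode but read cleanly as display numeric, which matches the behaviour observed above.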
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

Is there a way to remove the below warning message?

cff_1,0: Tag value: 27 does not match any tagcase for tagged field "cff_1_record_type"

Reason for the warning: record type 27 exists in the input data but is not needed in the output, so it is not mapped to an output.
Regards,
Kannan
nikhil_bhasin
Participant
Posts: 50
Joined: Tue Jan 19, 2010 4:14 am

Post by nikhil_bhasin »

You can define the record type and a record id constraint for '27', but when defining the output constraints, don't map record id = 27. That way you can get rid of this warning.
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

There are around 120 record types in the file and I need to extract only 11 of them, so it doesn't seem practical to define all 120 record types. Is there any other way around this?
Regards,
Kannan
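One way around defining all 120 types is to filter the file before the CFF stage so it only ever sees the 11 wanted types; then no unmapped tag values remain to trigger the warning. A hypothetical Python sketch, assuming the 2-character type identifier sits at position 155 (1-based, as described earlier), i.e. the slice `[154:156]`:

```python
# Hypothetical pre-filter: keep only the wanted record types so the CFF
# stage never sees the other ~109 and emits no "does not match any tagcase"
# warnings. Position 155 (1-based) maps to the Python slice [154:156];
# verify both the offset and the codes against the real layout.

WANTED_TYPES = {"01", "02", "03"}   # the 11 real type codes would go here

def filter_records(lines, wanted=WANTED_TYPES):
    return [line for line in lines if line[154:156] in wanted]
```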
Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Post by Jayakannan »

Job design is CFF stage -> 11 Dataset stages

The records are getting extracted and written into the datasets, and I am able to view the data in them, but the link count shows 0 for all 11 datasets.

Any idea why the link count shows 0?
Regards,
Kannan
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

There is probably a Copy stage in the stream that was removed at runtime by the parallel engine. When that happens, the link you see in the job design no longer exists, and therefore no record count can be provided.

FYI: when your job is compiled, the CFF stage is converted into multiple operators based upon the options and record layouts within it. You can see this by looking at the generated OSH after you compile your job.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.