Complex Flat File stage with Variable Length rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

D0n1117
Premium Member
Posts: 11
Joined: Sun Dec 19, 2010 1:49 pm
Location: VA

Complex Flat File stage with Variable Length rows

Post by D0n1117 »

I have an incoming file which I would like to read using the Complex Flat File stage. I do not have the COBOL file definitions.

Data looks like the following

AAA123sdg20101120(0D0A) --> Header
BBB2345ABCDE555(0D0A) --> Batch header
CCC123 asdasdfadf 20101019 v567(0D0A) --> Detail record (0D0A = record delimiter)
CCC124 aadsadsa 20100101 v5987 201000122 1234 v900(0D0A) --> Detail record
YYYY1231abdc(0D0A) --> Batch end
ZZZZ23424201010 --> Trailer

My table definitions consist of 20 columns.

My issue is that if the table definition defines more columns than a record actually contains before its record delimiter, the CFF stage rejects the detail records (input buffer overrun). Is there a way to make this process work with the CFF stage? Again, record lengths are fixed but the record delimiter is variable.

Thanks
Don
DataStage Developer
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

D0n1117 wrote: Again, record lengths are fixed but the record delimiter is variable.
Where did you mention fixed record lengths in the first place, and what exactly do you mean by this (i.e. "record delimiter is variable")?

So, are your records, as defined by the record delimiter, fixed-length or variable-length? Your example data indicates variable-length with CR/LF (windows-style) record delimiters.

You need to define the metadata for each record type appropriately...some probably have fewer than 20 columns, some have more than 20. Doing this without a layout of some sort (COBOL FDs or some other description) is going to be difficult, even with example data that contains full examples of each record type. You could possibly start off with a record-type column and a varchar column for each identified record type and go from there. I would seriously push to get layout info of SOME sort from the providers/clients.
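To make that "record-type column plus one raw varchar column" idea concrete, here is a rough sketch (plain Python, outside DataStage, purely illustrative) of splitting each record into a type code and a raw remainder. The 3-character type prefix, the CR/LF delimiter and the file name are assumptions taken from the example data, not from any real layout:

    # Hypothetical profiling sketch: first 3 characters = record type,
    # everything else = one raw "varchar" column.
    from collections import defaultdict

    records_by_type = defaultdict(list)

    with open("incoming_file.dat", "rb") as f:             # file name assumed
        for raw in f.read().split(b"\r\n"):                # DOS-style 0D0A delimiter
            if raw:
                line = raw.decode("ascii", errors="replace")
                records_by_type[line[:3]].append(line[3:])  # 3-char prefix assumed

    for rec_type, rows in sorted(records_by_type.items()):
        print(rec_type, len(rows), "record(s), e.g.:", rows[0][:40])

Something this simple at least tells you which record types exist before you commit to per-type column definitions.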

Regards,

- james wiles


All generalizations are false, including this one - Mark Twain.
D0n1117
Premium Member
Posts: 11
Joined: Sun Dec 19, 2010 1:49 pm
Location: VA

Post by D0n1117 »

Fixed length is defined in the file options. I have tried the variable option as well, but it still fails with an input buffer overflow.

The metadata is defined appropriately. The record delimiter I am using in this case is "\n" or UNIX newline; both produce the same result. I am not sure how to define CR/LF in the record delimiter drop-down on the Record Options tab.

What I meant by the record delimiter being variable is that each row can end at any point. Some rows end with 15 columns, some with 18, and some with 12. The maximum column count is 20.

If I use varchars, I get "column too short" errors. The data dictionary is correctly defined.

What works: if I define only 15 columns, the rows with 15 columns come through properly. Anything above or below 15 columns gets a buffer overflow error. That's where I'm hung up. Any help would be appreciated.
Don
DataStage Developer
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

OK. It is apparent that you are having some difficulty with the basics of how to read a data file appropriately, or perhaps we're just talking slightly different languages :)

1) Your record delimiter: CR/LF (0D0A) vs LF (0A). CR/LF is DOS Format, LF is Unix Format. Your records are delimited with DOS Format delimiters (0D0A)--2 bytes long. You need to use Record Delimiter String instead of Record Delimiter. See the Parallel Job Developer Guide for more information.

2) Your record length: Your records are NOT fixed length...this is obvious even from your own description of the record delimiter: "What I meant by the record delimiter being variable is that each row can end at any point." That is the definition of variable length. Using the Fixed Length definition in the metadata is shooting yourself in the foot...it won't work.

3) You have multiple record types with different metadata for each type. CFF will support that, but you have to define each record type correctly. Do you know which column indicates the record type? Is it consistently in the same location from record to record? It looks pretty obvious from your example data....AAA,BBB,CCC,YYY,ZZZ....this is the most likely candidate. Which record types have only 15 columns? Which types have 20? Do some analysis on the data and/or look at your layout information (you obviously had something from which to create 15-20 columns of metadata); a short profiling sketch follows at the end of this list. Then create appropriate record definitions within CFF for each type. As you have already discovered, 20 columns alone will not work for every record.

4) Read Chapter 10 of the Parallel Job Developer Guide, "Complex Flat File Stage".

5) Have patience and take your time understanding your data.
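Along the lines of the analysis suggested in point 3, a quick field-count profile per record type might look something like this. This is illustrative Python only; it assumes whitespace-separated fields, a 3-character type prefix and a placeholder file name, all inferred from the sample data rather than from any real layout:

    # Count how many whitespace-separated fields each record type carries,
    # to work out which types need 12, 15, 18 or 20 columns of metadata.
    from collections import Counter

    profile = Counter()
    with open("incoming_file.dat", "rb") as f:              # file name assumed
        for raw in f.read().split(b"\r\n"):                 # 0D0A record delimiter
            if raw:
                line = raw.decode("ascii", errors="replace")
                profile[(line[:3], len(line.split()))] += 1

    for (rec_type, n_fields), n in sorted(profile.items()):
        print(f"{rec_type}: {n_fields} fields in {n} record(s)")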

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
D0n1117
Premium Member
Posts: 11
Joined: Sun Dec 19, 2010 1:49 pm
Location: VA

Post by D0n1117 »

Perhaps a slight misunderstanding :)

Does the Complex Flat File stage support record_delim_string? If so, where and how do I define it? Version used: 8.5.

Thanks
Don
DataStage Developer
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

D0n1117 wrote: Does the Complex Flat File stage support record_delim_string?
No.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Don,

Out of curiosity, what was the workaround? I was going to suggest stripping the CR (0D) out of the file, either before processing or with a filter.
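For reference, the "strip the CR before processing" idea could be as simple as a tr -d '\r' (or equivalent sed) filter on Unix; an illustrative Python version, with placeholder file names, is below:

    # Convert DOS (CR/LF) record delimiters to plain LF so the file can be
    # read with an ordinary "\n" record delimiter. File names are placeholders.
    with open("incoming_file.dat", "rb") as src:
        data = src.read()
    with open("incoming_file_unix.dat", "wb") as dst:
        dst.write(data.replace(b"\r\n", b"\n"))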

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.