Complex Flat File stage with Variable Length rows

Gen1715 · Post by **Gen1715** » Sun Feb 10, 2013 12:01 am

I have same scenario as Don did in this post.

My file looks something like this -
CUS_INFO
05 First_Name PIX(5).
05 Last_Name PIX(5).
05 Age PIX(5).
05 Phone_Grp Occurs 3 Times PIX(10).

Sample File -
SunilBassi031012345678901234567890123456789
SunilBassi03101234567890123456789
SunilBassi0310123456789

and record delimiter will be \n - next line character.

Sunil being First Name
Bassi being Last Name
031 being Age
0123456789 being phone number

Now I may have scenario - when one customer may have 3 phone number, one may have only two, one may have one and one may have none. and data may come as mentioned in sample file.

Max number of columns - 6
Min number of columns - 3

and number of columns may be 3,4,5 or 6 and this is no delimiter variable length text file and need to read is through CFF stage.

I have used,
used .cfd metadata
file types(s) in file option-
Record type as variable length
data type as text, ascii
record delimiter as unix new line.

Now I may read records which are having all 6 columns, but if records having less columns.. CFF not able to read those records -

Can anyone provide solution.

chulett · Post by **chulett** » Sun Feb 10, 2013 9:46 am

Please don't jump on the end of older posts. I've split your post out on its own and linked it back.

ray.wurlod · Post by **ray.wurlod** » Sun Feb 10, 2013 1:01 pm

I assume that PIX(5) actually means PIC X(5).

Because there is no OCCURS DEPENDING ON clause Phone_Grp will always occur exactly three times. Presumably the unused fields are blank-filled. Since the data type is string, DataStage will happily read those strings; you can filter them out (replace with null) subsequently.

123gopal · Post by **123gopal** » Sun Feb 10, 2013 1:03 pm

can be use Trim(),Ltrim() and Rtrim() fuctions within the transformer
for example

first count the total length of character is 18

source -> transformer -> target
SunilBassi03101234 Ltrim(18,13) Sunil

Here total length is 18 but we need first 5 character only use that function and if else statement also

Gen1715 · Post by **Gen1715** » Mon Feb 11, 2013 12:09 am

chulett wrote: Please don't jump on the end of older posts. I've split your post out on its own and linked it back.

Sure.. and thanks for doing that.

Gen1715 · Post by **Gen1715** » Mon Feb 11, 2013 12:14 am

123gopal wrote:can be use Trim(),Ltrim() and Rtrim() fuctions within the transformer
for example

first count the total length of character is 18

source -> transformer -> target
SunilBassi03101234 Ltrim(18,13) Sunil

Here total length is 18 but we need first 5 character only use that function and if else statement also

Gopal,
This will work though, I can't use Trim or substring function in Transformer stage as this will work for less number of columns but I have a scenario where I have 900 columns, and this approach will cause maintainability issue, That's why I want to do that in CFF stage, in case you could help me.

Gen1715 · Post by **Gen1715** » Mon Feb 11, 2013 12:18 am

ray.wurlod wrote:I assume that PIX(5) actually means PIC X(5).

Because there is no OCCURS DEPENDING ON clause Phone_Grp will always occur exactly three times. Presumably the unused fields are blank-filled. Since ...

Yes your understanding is correct this is PIC X(5), Phone_Grp will occur 3 times and if field is unused source is not sending spaces for that, means less number of columns for that row, though I am not able to see the contents in here.

ray.wurlod · Post by **ray.wurlod** » Mon Feb 11, 2013 12:43 am

That should be handled OK, provided you've specified that "" is to be treated as NULL.

Gen1715 · Post by **Gen1715** » Wed Feb 13, 2013 4:40 am

I tried to explore Multiple record type option, but as it requires Record ID as "indicator" field for ALL TYPES of records, I checked with Source team and they are not able to provide any indicator field in extracts.

The CFF stage does a wonderful job of separating out multiple record formats, but only when there is an "indicator" field in ALL the records (and in the same spot) to indicate record type.

Does anyone have idea - if Parsing of records is possible in CFF stage without having any record type indicator, that means only one output link can have all record types?

FranklinE · Post by **FranklinE** » Wed Feb 13, 2013 11:15 am

Gen1715 wrote:I tried to explore Multiple record type option, but as it requires Record ID as "indicator" field for ALL TYPES of records, I checked with Source team and they are not able to provide any indicator field in extracts.

This is the critical obstacle for you. If your source team cannot provide you with consistently formatted data, you cannot use the convenient fucntionality provided by certain stages, particularly CFF.

The short answer for you here: You must analyze your data sources and come up with your own reliable method for identifying record types. Failing that, were I in your shoes, I'd go back to the source team and suggest that their design is defective (being polite about it, of course) and get management to spend the money to fix it.

If you find such a method, transformers with multiple links and a constraint on each link for the identified type is the simplest approach. I don't have CFF available (internal issue), and I use the record type to find the correct link.

In the meantime, you are faced with coding a job for each distinctive record type, and only reading those records in that job.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 13, 2013 1:58 pm

Why are you even using CFF stage for this? A COBOL FD such as you have given specifies fixed-width so, unless there's a good way of handling missing columns, you have a problem.

I'd be using a server Sequential File stage (in a server Shared Container if it must be in a parallel job) which has the best facilities for handling missing columns. You can use the same table definition (imported from FD).

ray.wurlod · Post by **ray.wurlod** » Wed Feb 13, 2013 2:02 pm

Do not send private emails for responses to DSXchange questions unless you are proposing a commercial arrangement.

Especially don't send five identical emails.

Gen1715 · Post by **Gen1715** » Thu Feb 14, 2013 3:41 am

Ooops!!! I was not aware it was sent 5 times.. Sorry fot that

And thanks for being generous enough to response.

Gen1715 · Post by **Gen1715** » Thu Feb 14, 2013 4:03 am

Thanks for your response FranklinE

,

Actually I do have a method to read this file. What I can do is - Use an UNIX script and convert this No delimiter variable length text file into a delimited file, and then can deal with delimited file in DS very easily. This approach is working beautifully.

But my architect is more interested in using DataStage tool only for reading this No delimiter variable length text file, I am open for any suggestion within scope of DataStage(Not resctricted to CFF stage).

ray.wurlod · Post by **ray.wurlod** » Thu Feb 14, 2013 3:51 pm

Server Sequential File stage can do this easily. You can use this stage in a server Shared Container in a parallel job if that's what you really need to do, but maybe try it first in a server job. Scroll right in the Columns grid to find the "missing columns" rules.

DSXchange

Complex Flat File stage with Variable Length rows

Complex Flat File stage with Variable Length rows

Re: Complex Flat File stage with Variable Length rows

Re: Complex Flat File stage with Variable Length rows