Page 1 of 1

Complex Flat File stage with Variable Length rows

Posted: Sun Feb 10, 2013 12:01 am
by Gen1715
I have same scenario as Don did in this post.

My file looks something like this -
CUS_INFO
05 First_Name PIX(5).
05 Last_Name PIX(5).
05 Age PIX(5).
05 Phone_Grp Occurs 3 Times PIX(10).

Sample File -
SunilBassi031012345678901234567890123456789
SunilBassi03101234567890123456789
SunilBassi0310123456789

and record delimiter will be \n - next line character.

Sunil being First Name
Bassi being Last Name
031 being Age
0123456789 being phone number

Now I may have scenario - when one customer may have 3 phone number, one may have only two, one may have one and one may have none. and data may come as mentioned in sample file.

Max number of columns - 6
Min number of columns - 3

and number of columns may be 3,4,5 or 6 and this is no delimiter variable length text file and need to read is through CFF stage.

I have used,
used .cfd metadata
file types(s) in file option-
Record type as variable length
data type as text, ascii
record delimiter as unix new line.

Now I may read records which are having all 6 columns, but if records having less columns.. CFF not able to read those records -

Can anyone provide solution.

Posted: Sun Feb 10, 2013 9:46 am
by chulett
:!: Please don't jump on the end of older posts. I've split your post out on its own and linked it back.

Posted: Sun Feb 10, 2013 1:01 pm
by ray.wurlod
I assume that PIX(5) actually means PIC X(5).

Because there is no OCCURS DEPENDING ON clause Phone_Grp will always occur exactly three times. Presumably the unused fields are blank-filled. Since the data type is string, DataStage will happily read those strings; you can filter them out (replace with null) subsequently.

Re: Complex Flat File stage with Variable Length rows

Posted: Sun Feb 10, 2013 1:03 pm
by 123gopal
can be use Trim(),Ltrim() and Rtrim() fuctions within the transformer
for example

first count the total length of character is 18

source -> transformer -> target
SunilBassi03101234 Ltrim(18,13) Sunil

Here total length is 18 but we need first 5 character only use that function and if else statement also

Posted: Mon Feb 11, 2013 12:09 am
by Gen1715
chulett wrote::!: Please don't jump on the end of older posts. I've split your post out on its own and linked it back.
Sure.. and thanks for doing that. :)

Re: Complex Flat File stage with Variable Length rows

Posted: Mon Feb 11, 2013 12:14 am
by Gen1715
123gopal wrote:can be use Trim(),Ltrim() and Rtrim() fuctions within the transformer
for example

first count the total length of character is 18

source -> transformer -> target
SunilBassi03101234 Ltrim(18,13) Sunil

Here total length is 18 but we need first 5 character only use that function and if else statement also
Gopal,
This will work though, I can't use Trim or substring function in Transformer stage as this will work for less number of columns but I have a scenario where I have 900 columns, and this approach will cause maintainability issue, That's why I want to do that in CFF stage, in case you could help me.

Posted: Mon Feb 11, 2013 12:18 am
by Gen1715
ray.wurlod wrote:I assume that PIX(5) actually means PIC X(5).

Because there is no OCCURS DEPENDING ON clause Phone_Grp will always occur exactly three times. Presumably the unused fields are blank-filled. Since ...
Yes your understanding is correct this is PIC X(5), Phone_Grp will occur 3 times and if field is unused source is not sending spaces for that, means less number of columns for that row, though I am not able to see the contents in here.

Posted: Mon Feb 11, 2013 12:43 am
by ray.wurlod
That should be handled OK, provided you've specified that "" is to be treated as NULL.

Posted: Wed Feb 13, 2013 4:40 am
by Gen1715
I tried to explore Multiple record type option, but as it requires Record ID as "indicator" field for ALL TYPES of records, I checked with Source team and they are not able to provide any indicator field in extracts.

The CFF stage does a wonderful job of separating out multiple record formats, but only when there is an "indicator" field in ALL the records (and in the same spot) to indicate record type.

Does anyone have idea - if Parsing of records is possible in CFF stage without having any record type indicator, that means only one output link can have all record types?

Posted: Wed Feb 13, 2013 11:15 am
by FranklinE
Gen1715 wrote:I tried to explore Multiple record type option, but as it requires Record ID as "indicator" field for ALL TYPES of records, I checked with Source team and they are not able to provide any indicator field in extracts.
This is the critical obstacle for you. If your source team cannot provide you with consistently formatted data, you cannot use the convenient fucntionality provided by certain stages, particularly CFF.

The short answer for you here: You must analyze your data sources and come up with your own reliable method for identifying record types. Failing that, were I in your shoes, I'd go back to the source team and suggest that their design is defective (being polite about it, of course) and get management to spend the money to fix it.

If you find such a method, transformers with multiple links and a constraint on each link for the identified type is the simplest approach. I don't have CFF available (internal issue), and I use the record type to find the correct link.

In the meantime, you are faced with coding a job for each distinctive record type, and only reading those records in that job.

Posted: Wed Feb 13, 2013 1:58 pm
by ray.wurlod
Why are you even using CFF stage for this? A COBOL FD such as you have given specifies fixed-width so, unless there's a good way of handling missing columns, you have a problem.

I'd be using a server Sequential File stage (in a server Shared Container if it must be in a parallel job) which has the best facilities for handling missing columns. You can use the same table definition (imported from FD).

Posted: Wed Feb 13, 2013 2:02 pm
by ray.wurlod
Do not send private emails for responses to DSXchange questions unless you are proposing a commercial arrangement.

Especially don't send five identical emails.

Posted: Thu Feb 14, 2013 3:41 am
by Gen1715
Ooops!!! I was not aware it was sent 5 times.. Sorry fot that :)
And thanks for being generous enough to response.

Posted: Thu Feb 14, 2013 4:03 am
by Gen1715
Thanks for your response FranklinE :) ,

Actually I do have a method to read this file. What I can do is - Use an UNIX script and convert this No delimiter variable length text file into a delimited file, and then can deal with delimited file in DS very easily. This approach is working beautifully.

But my architect is more interested in using DataStage tool only for reading this No delimiter variable length text file, I am open for any suggestion within scope of DataStage(Not resctricted to CFF stage).

Posted: Thu Feb 14, 2013 3:51 pm
by ray.wurlod
Server Sequential File stage can do this easily. You can use this stage in a server Shared Container in a parallel job if that's what you really need to do, but maybe try it first in a server job. Scroll right in the Columns grid to find the "missing columns" rules.