Complex Flat File stage with Variable Length rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Complex Flat File stage with Variable Length rows

Post by Gen1715 »

I have same scenario as Don did in this post.

My file looks something like this -
CUS_INFO
05 First_Name PIX(5).
05 Last_Name PIX(5).
05 Age PIX(5).
05 Phone_Grp Occurs 3 Times PIX(10).

Sample File -
SunilBassi031012345678901234567890123456789
SunilBassi03101234567890123456789
SunilBassi0310123456789

and record delimiter will be \n - next line character.

Sunil being First Name
Bassi being Last Name
031 being Age
0123456789 being phone number

Now I may have scenario - when one customer may have 3 phone number, one may have only two, one may have one and one may have none. and data may come as mentioned in sample file.

Max number of columns - 6
Min number of columns - 3

and number of columns may be 3,4,5 or 6 and this is no delimiter variable length text file and need to read is through CFF stage.

I have used,
used .cfd metadata
file types(s) in file option-
Record type as variable length
data type as text, ascii
record delimiter as unix new line.

Now I may read records which are having all 6 columns, but if records having less columns.. CFF not able to read those records -

Can anyone provide solution.
Gaurav Martha
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Please don't jump on the end of older posts. I've split your post out on its own and linked it back.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I assume that PIX(5) actually means PIC X(5).

Because there is no OCCURS DEPENDING ON clause Phone_Grp will always occur exactly three times. Presumably the unused fields are blank-filled. Since the data type is string, DataStage will happily read those strings; you can filter them out (replace with null) subsequently.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
123gopal
Participant
Posts: 16
Joined: Sat Feb 09, 2013 11:41 am
Location: chennai

Re: Complex Flat File stage with Variable Length rows

Post by 123gopal »

can be use Trim(),Ltrim() and Rtrim() fuctions within the transformer
for example

first count the total length of character is 18

source -> transformer -> target
SunilBassi03101234 Ltrim(18,13) Sunil

Here total length is 18 but we need first 5 character only use that function and if else statement also
mgplk
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Post by Gen1715 »

chulett wrote::!: Please don't jump on the end of older posts. I've split your post out on its own and linked it back.
Sure.. and thanks for doing that. :)
Gaurav Martha
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Re: Complex Flat File stage with Variable Length rows

Post by Gen1715 »

123gopal wrote:can be use Trim(),Ltrim() and Rtrim() fuctions within the transformer
for example

first count the total length of character is 18

source -> transformer -> target
SunilBassi03101234 Ltrim(18,13) Sunil

Here total length is 18 but we need first 5 character only use that function and if else statement also
Gopal,
This will work though, I can't use Trim or substring function in Transformer stage as this will work for less number of columns but I have a scenario where I have 900 columns, and this approach will cause maintainability issue, That's why I want to do that in CFF stage, in case you could help me.
Gaurav Martha
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Post by Gen1715 »

ray.wurlod wrote:I assume that PIX(5) actually means PIC X(5).

Because there is no OCCURS DEPENDING ON clause Phone_Grp will always occur exactly three times. Presumably the unused fields are blank-filled. Since ...
Yes your understanding is correct this is PIC X(5), Phone_Grp will occur 3 times and if field is unused source is not sending spaces for that, means less number of columns for that row, though I am not able to see the contents in here.
Gaurav Martha
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That should be handled OK, provided you've specified that "" is to be treated as NULL.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Post by Gen1715 »

I tried to explore Multiple record type option, but as it requires Record ID as "indicator" field for ALL TYPES of records, I checked with Source team and they are not able to provide any indicator field in extracts.

The CFF stage does a wonderful job of separating out multiple record formats, but only when there is an "indicator" field in ALL the records (and in the same spot) to indicate record type.

Does anyone have idea - if Parsing of records is possible in CFF stage without having any record type indicator, that means only one output link can have all record types?
Gaurav Martha
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Gen1715 wrote:I tried to explore Multiple record type option, but as it requires Record ID as "indicator" field for ALL TYPES of records, I checked with Source team and they are not able to provide any indicator field in extracts.
This is the critical obstacle for you. If your source team cannot provide you with consistently formatted data, you cannot use the convenient fucntionality provided by certain stages, particularly CFF.

The short answer for you here: You must analyze your data sources and come up with your own reliable method for identifying record types. Failing that, were I in your shoes, I'd go back to the source team and suggest that their design is defective (being polite about it, of course) and get management to spend the money to fix it.

If you find such a method, transformers with multiple links and a constraint on each link for the identified type is the simplest approach. I don't have CFF available (internal issue), and I use the record type to find the correct link.

In the meantime, you are faced with coding a job for each distinctive record type, and only reading those records in that job.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Why are you even using CFF stage for this? A COBOL FD such as you have given specifies fixed-width so, unless there's a good way of handling missing columns, you have a problem.

I'd be using a server Sequential File stage (in a server Shared Container if it must be in a parallel job) which has the best facilities for handling missing columns. You can use the same table definition (imported from FD).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do not send private emails for responses to DSXchange questions unless you are proposing a commercial arrangement.

Especially don't send five identical emails.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Post by Gen1715 »

Ooops!!! I was not aware it was sent 5 times.. Sorry fot that :)
And thanks for being generous enough to response.
Gaurav Martha
Gen1715
Participant
Posts: 25
Joined: Mon Nov 05, 2012 5:42 am
Location: India

Post by Gen1715 »

Thanks for your response FranklinE :) ,

Actually I do have a method to read this file. What I can do is - Use an UNIX script and convert this No delimiter variable length text file into a delimited file, and then can deal with delimited file in DS very easily. This approach is working beautifully.

But my architect is more interested in using DataStage tool only for reading this No delimiter variable length text file, I am open for any suggestion within scope of DataStage(Not resctricted to CFF stage).
Gaurav Martha
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Server Sequential File stage can do this easily. You can use this stage in a server Shared Container in a parallel job if that's what you really need to do, but maybe try it first in a server job. Scroll right in the Columns grid to find the "missing columns" rules.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply