Dealing with Occurs in .cfd files using CFF stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Dealing with Occurs in .cfd files using CFF stage

Post by vivekgadwal »

Hello,

I have COBOL Records with occurs in them. I have COBOL record like this:

05 RECORD-03 REDEFINES RECORD-BODY.
10 03-REPL-PARTS OCCURS 06 TIMES
INDEXED BY PART-INDEX.
15 03-REPL-PART PIC X(20).
15 03-REPL-PART-QTY PIC 9(02).
.......so on.

In this way, I have three more records with occurs in them. The goal is to load these records using a CFF stage(DataStage 7.0 Server Edition plug-in) into tables on DB2 by first loading them into sequential file. Can anyone suggest a way to load these occurs into the tables?
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

If you read the manual or on-line help, you will learn that the CFF stage at this release does not support OCCURS DEPENDING ON or OCCURS INDEXED BY.

I also note in your FD that the field PART-INDEX is not mentioned. Presumably it occurs later and you omitted to show it.

Can you have the generators of the data produce a fixed OCCURS 6 TIMES, filling the un-needed ones with anything? Then your job can use the value in PART-INDEX to parse the required fields out.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:Welcome aboard.

If you read the manual or on-line help, you will learn that the CFF stage at this release does not support OCCURS DEPENDING ON or OCCURS INDEXED BY.

I also note in your FD that the field PART-INDEX is not mentioned. Presumably it occurs later and you omitted to show it.

Can you have the generators of the data produce a fixed OCCURS 6 TIMES, filling the un-needed ones with anything? Then your job can use the value in PART-INDEX to parse the required fields out.
Thanks Mr. Wurlord,

As I was thinking yesterday, I came to a solution. I would appreciate you if you can tell me if I am going on the right track or not. The idea is to put those occurs columns/fields as rows in a seq. file and then load them on to the table. What I did is, I have created a separate job for this scenario and I have taken 6 output links from the transformer stage which gets it input from the CFF stage. The figure 6 is because there are 6 occurs. Afterwards, these 6 links were fed into a Link collector stage and from there onto the Sequential file stage. The question is, is this a better approach to take or is there a better one out there? The job would be crazier when there would be 30 occurs, which is the case in RECORD-05 which I didn't show you in the example code that I sent you.

Please do reply me at your convenience. Thanks again for your earlier post!
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

While your approach would work, there is a limit to the number of links that a Transformer stage can support - from memory I think it's 127. Plus you still have the problem about what to do to determine whether the Nth occurrence exists. Presumably you will have a constraint based on the value of PART-INDEX.

If you can get a fixed OCCURS supplied, you can do the whole thing in a CFF and Transformer stage without the need for a Link Collector.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:While your approach would work, there is a limit to the number of links that a Transformer stage can support - from memory I think it's 127. Plus you still have the problem about what to do to determine whether the Nth occurrence exists. Presumably you will have a constraint based on the value of PART-INDEX.

If you can get a fixed OCCURS supplied, you can do the whole thing in a CFF and Transformer stage without the need for a Link Collector.
Thanks for the post Mr. Wurlod,

I have to confess, I am a little less experienced in using/dealing with COBOL files, CFF stage etc. The un-needed OCCURS have zeroes coming with them and the instance which matters has legitimate values in it. So, I think they are fixed OCCURS. Could you please explain, how I can use the CFF stage itself to deal with it? (Oh...and by the way, there is no logic being implemented in the Transformer stage. It is straight pass to the Sequential file stage from there!)
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
crouse
Charter Member
Charter Member
Posts: 204
Joined: Sun Oct 05, 2003 12:59 pm
Contact:

Post by crouse »

Ray, are you sure INDEXED BY is not allowed in CFF on Server?

Also, I don't think "INDEXED BY" OCCURS are variable, like DEPENDING ON is. I believe INDEXED BY PART-INDEX allows you to access a particular occurance

Poster, try removing the INDEXED BY phrase from the CFD. Then, pick up the manual to see how to import the CFD and use it in the CFF stage.

As Ray states, and if you can remove the phrase from the CFD and all the occurances are populated, you can use the CFF to do all you need.

Here a snippet from IBM on indexing and occurs (not that it's useful for DataStage development, but informational nontheless):

Subscripting using index-names (indexing)

Indexing allows such operations as table searching and manipulating specific items. To use indexing, you associate one or more index-names with an item whose data description entry contains an OCCURS clause. An index associated with an index-name acts as a subscript, and its value corresponds to an occurrence number for the item to which the index-name is associated.

The INDEXED BY phrase, by which the index-name is identified and associated with its table, is an optional part of the OCCURS clause. There is no separate entry to describe the index associated with index-name. At run time, the contents of the index corresponds to an occurrence number for that specific dimension of the table with which the index is associated.

The initial value of an index at run time is undefined, and the index must be initialized before it is used as a subscript. An initial value is assigned to an index with one of the following:

* The PERFORM statement with the VARYING phrase
* The SEARCH statement with the ALL phrase
* The SET statement

The use of an integer or data-name as a subscript that references a table element or an item within a table element does not cause the alteration of any index associated with that table.

An index-name can be used to reference any table. However, the element length of the table being referenced and of the table that the index-name is associated with should match. Otherwise, the reference will not be to the same table element in each table, and you might get runtime errors.

Data that is arranged in the form of a table is often searched. The SEARCH statement provides facilities for producing serial and nonserial searches. It is used to search for a table element that satisfies a specific condition and to adjust the value of the associated index to indicate that table element.

To be valid during execution, an index value must correspond to a table element occurrence of neither less than one, nor greater than the highest permissible occurrence number.
Craig Rouse
Griffin Resouces, Inc
www.griffinresources.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Cool, thanks Craig. The CFF stage gives you the opportunity to specify to "flatten" (generate six separate columns) or to "normalize" (generate six separate rows). The documentation (manual or on-line help) explains how these are done.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
girish.rupkumar
Participant
Posts: 11
Joined: Thu Sep 13, 2007 12:56 am

Re: Dealing with Occurs in .cfd files using CFF stage

Post by girish.rupkumar »

Hi...

As Ray is telling, you have to flatten your input file. Means, you need to update your metadata by creating new columns. (As many times as it is occuring). I have faced the same challenge before and I got flattened input file and the new metadata.....
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

crouse wrote:Ray, are you sure INDEXED BY is not allowed in CFF on Server?

Also, I don't think "INDEXED BY" OCCURS are variable, like DEPENDING ON is. I believe INDEXED BY PART-INDEX allows you to access a particular occurance

Poster, try removing the INDEXED BY phrase from the CFD. Then, pick up the manual to see how to import the CFD and use it in the CFF stage.

As Ray states, and if you can remove the phrase from the CFD and all the occurances are populated, you can use the CFF to do all you need.

Here a snippet from IBM on indexing and occurs (not that it's useful for DataStage development, but informational nontheless):

Subscripting using index-names (indexing)

Indexing allows such operations as table searching and manipulating specific items. To use indexing, you associate one or more index-names with an item whose data description entry contains an OCCURS clause. An index associated with an index-name acts as a subscript, and its value corresponds to an occurrence number for the item to which the index-name is associated.

The INDEXED BY phrase, by which the index-name is identified and associated with its table, is an optional part of the OCCURS clause. There is no separate entry to describe the index associated with index-name. At run time, the contents of the index corresponds to an occurrence number for that specific dimension of the table with which the index is associated.

The initial value of an index at run time is undefined, and the index must be initialized before it is used as a subscript. An initial value is assigned to an index with one of the following:

* The PERFORM statement with the VARYING phrase
* The SEARCH statement with the ALL phrase
* The SET statement

The use of an integer or data-name as a subscript that references a table element or an item within a table element does not cause the alteration of any index associated with that table.

An index-name can be used to reference any table. However, the element length of the table being referenced and of the table that the index-name is associated with should match. Otherwise, the reference will not be to the same table element in each table, and you might get runtime errors.

Data that is arranged in the form of a table is often searched. The SEARCH statement provides facilities for producing serial and nonserial searches. It is used to search for a table element that satisfies a specific condition and to adjust the value of the associated index to indicate that table element.

To be valid during execution, an index value must correspond to a table element occurrence of neither less than one, nor greater than the highest permissible occurrence number.
Thank you Craig, Girish and Ray for the responses.
It has helped me immensely with the usage of the stage and I have flattened the OCCURS using the CFF stage. I have finished the design and it is running fine. As I was mentioning, I have flattened all the occurs and treated them as separate rows. Now, I have taken the data and put them in a sequential file(.dat format). I have used 6 output links from the transformer stage, as I mentioned earlier, due to the 6 occurs that are taking place. Is there a better way to take the data and put that into a flat file(fixed width) without using as many links from the transformer. The reason why I am asking you guys this question is, if there are more than 127 occurs in a record, then transformer cannot accomodate as many links. Even 30-40 links might cause performance issues! So, kindly, suggest a better way if you know it.

Thank yoy all once again.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Just have the CFF stage "normalize" the repeating group. The CFF stage itself can generate the multiple rows. Then run the single output through a Transformer stage to filter out those occurrences that have the "empty" value - did you say it was all zeroes?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:Just have the CFF stage "normalize" the repeating group. The CFF stage itself can generate the multiple rows. Then run the single output through a Transformer stage to filter out those occurrences that have the "empty" value - did you say it was all zeroes?
Yes, we have zeroes coming in...I am enclosing sample data and we have assigned rank no to identify the occurs in them. Please go through it.

12744200708294223442234900010 1DS 250.00COURTESY DELI
12744200708294223442234900010 2 0.00
12744200708294223442234900010 3 0.00
12745200708294223442802000010 1DS 250.00COURTESY DELIVERY
12745200708294223442802000010 2 0.00
12745200708294223442802000010 3 0.00

....so on....

In this above example data, you can notice repetitive 1, 2, 3 after the first string of data (12744...90010 "1"DS). The figure enclosed in quotes is the rank no I have assigned to let us know that there are three occurences of this data. And it is for this very reason, I have to use three separate links from the transformer.

NOTE: The data in the example is the output of one of the jobs I did which loads the data into a fixed width sequential file from a COBOL record.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is no repeating group (no OCCURS) in this data. By creating multiple outputs and running these into a Link Collector you are producing exactly the input file.

Please create a written (that is, English) specification of the true structure of the input file, and the desired output file structure.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:There is no repeating group (no OCCURS) in this data. By creating multiple outputs and running these into a Link Collector you are producing exactly the input file.

Please create a written (that is, English) specification of the true structure of the input file, and the desired output file structure.
Dear Mr. Wurlod,

Thank you for the response. Here is the written specification of the input file (.cfd file):

01 RECORDS.
05 Record-Key.
10 REPORT-NO PIC X(05).
10 REPORT-DATE PIC X(08).
10 SVC-DLR-CODE PIC X(05).
10 REGION-CODE PIC X(02).
10 DISTRICT-CODE PIC X(02).
10 EXTERNAL-REF-NO PIC X(10).
10 CLM-PRTN-CODE PIC X(05).
10 CLAIM-NO PIC X(07).
10 RECORD-TYPE PIC X(02).
05 RECORD-BODY PIC X(224).
...
I am enclosing one of the occurs (.cfd)
05 RECORD-03 REDEFINES RECORD-BODY.
10 03-REPL-PART OCCURS 06 TIMES
INDEXED BY PART-INDEX.
15 REPL-PART PIC X(20).
15 REPL-PART-QTY PIC 9(02).
15 REPL-PART-PRICE PIC 9(05)V99.
15 REPL-PART-EXT-COST PIC 9(05)V99.
10 03-FILLER
...
Specification of the input file: We receive the data file from Mainframe and it is a cobol file. As you have seen above, the file is called RECORDS and there are 10 records in the file (RECORD-01, RECORD-02..., RECORD-10) out of which there are four RECORDS which have occurs in them. Our objective is to load them into a DB2 tables for Analyzing purposes. There is a staging area in between and it is nothing but straight load. We are thinking of performing validations after staging the data into the tables. Therefore, it is straight load into the staging tables. The staging table structure looks like this for the occurs record I have shown above:

REPORT_NO(The first five are taken from RECORD-KEY)
REPORT_DATE
SVC-DLR-CODE
CLM_PRTN_CODE
CLM_NO
RANK_NO (Generated using ETL to denote the occurs in the file)
REPL_PART_NO (The bottom four are taken from RECORD-03 of the ex)
REPL_PART_QTY
REPL_PART_PRICE
REPL_PART_EXT_COST

This is an example of what we are trying to do. We need the occurs even if they are zeroes, in the table and that is the reason we have RANK_NO column. Example data:
12744200708294223460A 42234900002 034425001020 0100491130049113 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(NOTE: This is a single line in the data file)
This data should be loaded as it is into the table by flattening the occurs and converting them into rows instead of columns. As you can notice, there are zeroes in the rest of the array structure.

I am sorry to bother you with this and these posts have been a good learning experience for me with regards to the OCCURS and stuff. Thank you again.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Before we go on, have you read the manual for the CFF stage (cff.pdf)? There are examples and discussions there that may quench some of your thirst for knowledge, and save me transcribing slabs of that document!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:Before we go on, have you read the manual for the CFF stage (cff.pdf)? There are examples and discussions there that may quench some of your thirst for knowledge, and save me transcribing slabs of that document!
:) I have gone through the doc and I have understood what you guys were trying to tell me. In fact I have applied it that way and ran the job. It went on fine. I have posted the earlier post in haste and after implementing it the way it was suggested, I realized that it was a waste of time I did. I apologize for it.

Thanks for your time Mr. Wurlod.
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
Post Reply