Handling hierarchical file in DataStage 6.0

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kusuma
Participant
Posts: 13
Joined: Fri Sep 05, 2003 7:36 am
Location: India

Handling hierarchical file in DataStage 6.0

Post by kusuma »

We are using DataStage 6.0. We are currently facing a problem with a particular type of source file. These are hierarchical files on Unix. They are originally XML files, but the source system sending them converts them into pipe-delimited Unix files. I am attaching a sample file with a few records.

Record description: each record starts with TroubleHeaderInfo and has 1. TroubleActivityInfo, 2. LRDetailsInfo, 3. LRCustomerInfo, 4. LRCentralOfficeInfo, 5. LRFacilityInfo, 6. LRServiceInfo as subtitles. Each of these subtitles can repeat any number of times; as you can see in the sample file, TroubleActivityInfo is repeated many times in each record. Moreover, each subtitle has a different number of columns. For example: TroubleActivityInfo has 81 columns, LRDetailsInfo has 18 columns, LRCustomerInfo has 14 columns, and so on. Every subtitle starts with two asterisks and ends with a carriage return.


I would like to understand whether DataStage can handle such files, or whether we have to do some preprocessing on the source files. Can you also tell me how we would handle it if it were still an XML file, and which approach is best? Please let me know the possibilities as soon as possible, because we have to work on the design of the jobs.
For example:
>**HeadedRecord|2001|7329063231|POMS|20|NE
>**TroubleInfo|2001|7329063231|..............................................80columns
>**TroubleInfo|2001|7329063231|..............................................80columns
>**TroubleInfo|2001|7329063231|..............................................80columns
>**TroubleInfo|2001|7329063231|..............................................80columns
>**LR-CustInfo|2001|7329063231|..............................................20columns
>**LR-ServiceInfo|2001|7329063231|.......................................10columns
>**HeadedRecord|2002|9081063231|POMS|50|NY
>**TroubleInfo|2002|9081063231|..............................................80columns
>**TroubleInfo|2002|9081063231|..............................................80columns
>**LR-CustInfo|2001|9081063231|..............................................20columns
>**LR-ServiceInfo|2001|9081063231|.......................................10columns
>
>This file will be loading many tables. Is it possible to handle this kind
>of file with the CFF stage?
Klaus Schaefer
Participant
Posts: 94
Joined: Wed May 08, 2002 8:44 am
Location: Germany
Contact:

Post by Klaus Schaefer »

Kusuma,

let me suggest two options for this file:

1. You could use CFF. However, CFF expects fixed-length records. Since this is a delimited file you have to take care of that; at a minimum you need to define "fields" for the delimiters.

2. Use a regular flat file stage. Read all data into a single "data" field. In the next Transformer, split the records according to the record type. Then use Row Merger/Splitter to apply the metadata for further processing (or use the old substring method, which is a little laborious...).
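Outside of DataStage, the splitting half of option 2 can also be done as a preprocessing step. Here is a minimal sketch in Python, assuming the file layout from the sample above (the file name `sample.txt` and the per-type output names are hypothetical): each `**`-prefixed record is routed to a separate flat file for its record type, so that each output can then be read with its own column metadata.

```python
# Split a pipe-delimited hierarchical file into one flat file per record type,
# so each output file can be read with the correct column metadata.
# Assumes every record starts with "**" followed by the record-type name.

def split_by_record_type(path):
    handles = {}  # record type -> open output file
    try:
        with open(path) as src:
            for line in src:
                line = line.rstrip("\n")
                if not line.startswith("**"):
                    continue  # skip blank or continuation lines
                # Record type is the first pipe-delimited field after "**"
                rec_type = line[2:].split("|", 1)[0]
                if rec_type not in handles:
                    handles[rec_type] = open(rec_type + ".txt", "w")
                handles[rec_type].write(line[2:] + "\n")
    finally:
        for h in handles.values():
            h.close()
    return sorted(handles)  # list of record types seen
```

Each resulting file (`TroubleInfo.txt`, `LR-CustInfo.txt`, ...) is then a plain pipe-delimited flat file with a fixed layout per file.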

Klaus
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unfortunately, there is no Row Splitter/Merger in Version 6; it comes with 7 and (from what I understand) could indeed help with your situation.

You could emulate that behaviour with substringing, as Klaus mentions. Kinda laborious, yes, but it can be made to work.
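To illustrate the substring idea, here is a small sketch in Python (in a Server job you would do the equivalent with derivations in a Transformer; the layouts below are illustrative stand-ins, not the real 81/18/14-column metadata): the whole record arrives as one "data" field, and each output column is carved out by delimiter position according to the record type.

```python
# Emulate the substring approach: the whole record is one "data" field,
# and each named column is extracted by its delimiter position.
# Column layouts here are hypothetical, for illustration only.

LAYOUTS = {
    "TroubleInfo": ["ticket_year", "phone", "detail"],
    "LR-CustInfo": ["ticket_year", "phone", "customer"],
}

def parse_record(data):
    fields = data.split("|")
    rec_type = fields[0]
    names = LAYOUTS.get(rec_type)
    if names is None:
        return rec_type, None  # unknown record type: send to a reject link
    # Pair each layout name with its field, padding short records with ""
    values = fields[1:len(names) + 1]
    values += [""] * (len(names) - len(values))
    return rec_type, dict(zip(names, values))
```

In a real job, each record type would feed its own output link with these derived columns.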
-craig

"You can never have too many knives" -- Logan Nine Fingers
kusuma
Participant
Posts: 13
Joined: Fri Sep 05, 2003 7:36 am
Location: India

Re: Handling hierarchical file in DataStage 6.0

Post by kusuma »

Klaus and Craig,

Thank you very much.

We were exploring options for splitting the file through SyncSort or DataStage, whichever is faster. Can you please give me a little more detail about splitting the file with this substring method?

kusuma
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Simply put, you can read the file with a Sequential File stage set to read 80 columns, pipe delimited.
In the Columns grid of the Sequential File stage, scroll to the right and change the missing-column rule for columns 11 through 80 from Error to Replace (work from the bottom up; there is a hierarchy of error handling here).
Define all columns as VarChar; you may as well call them Field001 through Field080, the DataStage defaults, at this point.
In a Transformer stage fed by this Sequential File stage, segregate the header, detail, LR_ServiceInfo and LR_CustInfo records. Use meaningful column names on the output links and you've pretty much got it solved.
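The logic of this approach can be sketched outside DataStage as well. The following Python fragment is only an illustration of the idea (the record-type names come from the sample file above; in the actual job the routing happens via link constraints in the Transformer): every line is read as up to 80 pipe-delimited VarChar columns, missing trailing columns are replaced with empty strings, and rows are then routed on the first field.

```python
# Sketch of the Sequential File approach: read each line as up to 80
# pipe-delimited columns, padding missing trailing columns with ""
# (the "Replace" missing-column rule), then route rows on the first field.

NUM_COLS = 80

def read_rows(lines):
    rows = {"HeadedRecord": [], "TroubleInfo": [], "other": []}
    for line in lines:
        # Drop the leading "**" markers and the trailing newline
        fields = line.lstrip("*").rstrip("\n").split("|")
        fields = (fields + [""] * NUM_COLS)[:NUM_COLS]  # pad/truncate to 80
        key = fields[0] if fields[0] in rows else "other"
        rows[key].append(fields)
    return rows
```

Each dictionary key here plays the role of one Transformer output link with its own constraint.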
Last edited by ray.wurlod on Sat Feb 14, 2004 4:31 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kusuma
Participant
Posts: 13
Joined: Fri Sep 05, 2003 7:36 am
Location: India

Post by kusuma »

Thank you Ray,

Although it is simple, it is a brilliant solution. It is working.

kusuma
Post Reply