Handling hierarchical file in DataStage 6.0

Posted: Tue Feb 10, 2004 5:38 pm
by kusuma
We are using DataStage
6.0. Currently we are facing a problem with a particular type of source
file. These are hierarchical files on Unix. Originally they are XML files,
but the source system that sends these files converts them into
pipe-delimited Unix files. I am attaching a sample file with a few
records.

Record Description: Each record starts with TroubleHeaderInfo and has 1.
TroubleActivityInfo, 2. LRDetailsInfo, 3. LRCustomerInfo, 4.
LRCentralOfficeInfo, 5. LRFacilityInfo, 6. LRServiceInfo as subtitles.
Each of these subtitles can repeat any number of times; as you can see in
the sample file, TroubleActivityInfo is repeated many times in each
record. Moreover, each subtitle has a different number of columns. For
example: TroubleActivityInfo has 81 columns, LRDetailsInfo has 18 columns,
LRCustomerInfo has 14 columns, and so on. Every subtitle starts with two
asterisks and ends with a carriage return in Unix.


I would like to understand whether it is possible for DataStage to
handle such files, or whether we have to do some preprocessing on the
source files. Can you also tell me how we would handle it if it were an
XML file, and which is the best way to handle it? Please let me know the
possibilities ASAP because we have to work on the design of the jobs.

For example:
**HeadedRecord|2001|7329063231|POMS|20|NE
**TroubleInfo|2001|7329063231|..............................................80 columns
**TroubleInfo|2001|7329063231|..............................................80 columns
**TroubleInfo|2001|7329063231|..............................................80 columns
**TroubleInfo|2001|7329063231|..............................................80 columns
**LR-CustInfo|2001|7329063231|..............................................20 columns
**LR-ServiceInfo|2001|7329063231|.......................................10 columns
**HeadedRecord|2002|9081063231|POMS|50|NY
**TroubleInfo|2002|9081063231|..............................................80 columns
**TroubleInfo|2002|9081063231|..............................................80 columns
**LR-CustInfo|2001|9081063231|..............................................20 columns
**LR-ServiceInfo|2001|9081063231|.......................................10 columns

This file will load many tables. Is it possible to handle this kind of
file with the CFF stage?

Posted: Wed Feb 11, 2004 8:21 am
by Klaus Schaefer
Kusuma,

let me suggest two options for this file:

1. You could use CFF. However, CFF expects fixed-length records. Since this is a delimited file you have to take care of that; at the least you need to define "fields" for the delimiters.

2. Use a regular flat file stage. Read all the data into a single "data" field. In the next Transformer, split the records according to the record type. Then use Row Merger/Row Splitter to apply the metadata for further processing (or use the old substring method, which is a little laborious...)
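To make option 2 concrete, here is a minimal sketch in plain Python of the record-type split, done outside DataStage. The record-type tags come from the sample file; the helper name and the idea of collecting rows into in-memory buckets (rather than output links) are my own illustration:

```python
# Sketch: route each pipe-delimited line to a per-record-type bucket,
# keyed on its leading "**Subtitle" tag, like a Transformer with one
# output link per record type.
from collections import defaultdict

def split_by_record_type(lines):
    """Group raw lines by their leading record-type tag."""
    buckets = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # First pipe-delimited field, e.g. "**TroubleInfo"; drop the asterisks
        tag = line.split("|", 1)[0].lstrip("*")
        buckets[tag].append(line)
    return buckets

sample = [
    "**HeadedRecord|2001|7329063231|POMS|20|NE",
    "**TroubleInfo|2001|7329063231|...",
    "**TroubleInfo|2001|7329063231|...",
    "**LR-CustInfo|2001|7329063231|...",
]
buckets = split_by_record_type(sample)
print(sorted(buckets))              # ['HeadedRecord', 'LR-CustInfo', 'TroubleInfo']
print(len(buckets["TroubleInfo"]))  # 2
```

Each bucket can then be written to its own flat file and read back with the proper per-type metadata.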

Klaus

Posted: Wed Feb 11, 2004 8:32 am
by chulett
Unfortunately, in Version 6 there is no Row Splitter/Merger; that comes with 7 and (from what I understand) could indeed help with your situation.

You could emulate that behaviour with substringing as Klaus mentions. Kinda laborious, yes, but it can be made to work.

Re: Handling hierarchical file in DataStage 6.0

Posted: Wed Feb 11, 2004 12:11 pm
by kusuma
Klaus and Craig,

Thank you very much.

We were exploring the options for splitting the file through SyncSort or DataStage, whichever is faster. Can you please give me a little more detail about splitting the file with this substring method?

kusuma

Posted: Wed Feb 11, 2004 3:32 pm
by ray.wurlod
Simply put, you can read the file with a Sequential File stage set to read 80 columns, pipe delimited.
In the Columns grid of the Sequential File stage, scroll to the right and change the missing-columns rule for columns 11 through 80 from Error to Replace (work from the bottom up; there is a hierarchy of error handling here).
Define all columns as VarChar; you may as well call them Field001 through Field080, the DataStage defaults, at this point.
In a Transformer stage fed by this Sequential File stage, segregate the header, detail, LR_ServiceInfo and LR_CustInfo records. Use meaningful column names on the output links and you've pretty much got it solved.
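The read-then-segregate pattern above can be sketched outside DataStage in a few lines of Python. The fixed count of 80 fields and the Replace-with-empty behaviour follow Ray's description; the function names and the dictionary of output "links" are illustrative assumptions, not DataStage API:

```python
# Sketch of the approach: read every line as up to 80 pipe-delimited
# fields, pad missing trailing fields with '' (the "Replace" rule),
# then route each padded row by its record-type tag, as the
# Transformer's output links would.
NUM_FIELDS = 80

def read_row(line):
    """Split a line into exactly NUM_FIELDS fields, padding with ''."""
    fields = line.rstrip("\r\n").split("|")
    return (fields + [""] * NUM_FIELDS)[:NUM_FIELDS]

def segregate(lines):
    """Route padded rows onto per-record-type output 'links'."""
    outputs = {}
    for line in lines:
        row = read_row(line)
        tag = row[0].lstrip("*")  # e.g. "TroubleInfo"
        outputs.setdefault(tag, []).append(row)
    return outputs

out = segregate([
    "**HeadedRecord|2001|7329063231|POMS|20|NE",
    "**TroubleInfo|2001|7329063231|x|y",
])
# Every row now has the same fixed width, whatever its type had
assert all(len(r) == NUM_FIELDS for rows in out.values() for r in rows)
print(out["HeadedRecord"][0][:6])
# ['**HeadedRecord', '2001', '7329063231', 'POMS', '20', 'NE']
```

Renaming the generic Field001..Field080 slots to meaningful names per output link is then just metadata, exactly as Ray says.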

Posted: Fri Feb 13, 2004 6:55 pm
by kusuma
Thank you Ray,

Although it is simple, it is brilliant, and it is working.

kusuma