Extracting data from flat files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Extracting data from flat files

Post by pradkumar »

Hi

I am trying to extract the raw data from a flat file. This is a fixed width (column) file. I am using a sequential file stage to extract.

Now how do I specify that it is fixed width column (like in server edition we have this option in the format tab). I donot know how this can be accomplished in parallel edition.

Thanks for all the help
Pradeep Kumar
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Go to the Format tab in the Sequential file stage and right click on the record level and select format as and you will find it there. :wink:
Kris

Where's the "Any" key?-Homer Simpson
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi Kris

Thanks for the reply

Sorry to trouble you. I am a novice to data stage and learning and I worked on Server and now trying on parallel.
Actually I did use format tab and set the Record delimiter to DOS format. and also used Record length to be fixed. But it did not work out.


My raw data does not have any delimiters. I was able to do it in server setting fixed width column option.


Thanks
Pradeep Kumar
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

It's pretty easy to find and hard to miss. Here again. Once you go to format tab in the Sequential file stage you will properties window on the left hand side with following Properties ---Record Level,Field defaults, Type Defaults. Right click on any one of them. Select or place your mouse on Format as option so that a new context menu appears in which you will find the attribute "Fixed Width Records" which is what you would need to select.

In most cases, fixed width files don't have a delimiter option and you don't need that property in your case. So remove it from there.
Kris

Where's the "Any" key?-Homer Simpson
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Forgive me for my ignorance in this regard.

I did try the above mentioned method before I posted here and could not get it to work - I got an error stating unable to initialize importer - ""record_length fixed" (no length given) and record field is variable length""

First off: Is a fixed width record similar to a fixed-width record field (column) ?..........because in my case the file I am trying to extract from has Fixed-width columns i.e., record fields have fixed width

2. If there is no option to specify "fixed-width column" option like in the Server edition ; does this mean I should manually give each column a fixed width (by going to column tab>edit row>string type>field width) ?

I do apologize if this does not make much sense but whats normally a very simple procedure in the Server edition just does not seem to work for me in parallel. And this has made me try a lot of wierd methods (like question 2) :(
Pradeep Kumar
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

pradkumar wrote: First off: Is a fixed width record similar to a fixed-width record field (column) ?..........because in my case the file I am trying to extract from has Fixed-width columns i.e., record fields have fixed width
It's one and the same. A fixed width file has records whose length is fixed and in almost all cases followed by a LineTerminator. If you have set Record Lenght attribute to fixed and defined proper terminator, then I would put my money on the file not being a fixed width file. The error message which you have provided points in that direction. It says that the record is of variable length. I would suggest you check the input file again. I have seen similar cases where the last couple of columns in the input file are of variable length. If possible can you post couple of lines from your source here to confirm it?

P.S. Did your method(question2) work for you. I doubt it.
Kris

Where's the "Any" key?-Homer Simpson
pradkumar
Charter Member
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Thanks Kris

Here are few lines from my raw data file ---- there are 10 fields in each record -- given below are 3 records...

Code: Select all

DRACDDrachenblut Delikatessen                Sven Ottlieb                  Order Administrator           Walserweg 21                                      Aachen                                       52066          Germany             0241-039123         0241-059428   
      
DUMONDu monde entier                         Janine Labrune                Owner                         67, rue des Cinquante Otages                      Nantes                                       44000          France              40.67.88.88         40.67.89.89    
     
EASTCEastern Connection                      Ann Devon                     Sales Agent                   35 King George                                    London                                       WX3 6FW        UK                  (171) 555-0297      (171) 555-3373     
Pradeep Kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The easiest way to get this right is as follows. Import the table definition for the sequential file. When on the Format tab, check the "fixed width" check box, then fill in the ten column widths as a comma separated list in the indicated field. Complete the import then load the table definition into a job that needs to read the file. All will be well (provided you got the widths correct).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
splayer
Charter Member
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

ray, in EE, in sequential file stage, there is no Fixed Width checkbox. My version is 7.5.1A.

pradkumar, a fixed width record can never have variable length columns. You have 1 or more columns of variable length data type in your record. For example, you need to change a varchar to char.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You CAN use fixed width in parallel Sequential File stage. I did one today.

Set the Record Length property to "Fixed" and do not have VarChar data types.

If you import the table definition of a fixed-width text file, then Load onto the Format tab in the parallel Sequential File stage, this happens automatically.

Note that "first line is column names = true" is not permitted when fixed width format is used.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply