Why do we need fixed-width columns?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ady
Premium Member
Posts: 189
Joined: Thu Oct 12, 2006 12:08 am

Why do we need fixed-width columns?

Post by ady »

I have never used the fixed-width column format in any of my outputs. I recently took over some jobs which have fixed-width columns as both output and input.

I don't understand the concept of fixed width and was wondering why we need fixed-width columns? :?: Can anyone please explain the use?

Thanks
Krazykoolrohit
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

Some applications, like those on the mainframe, produce fixed-width columns as output. You don't have much choice.

Further, QualityStage only works with fixed-width columns.

These are the two places I have used fixed-width columns.
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

It depends on the requirement. One reason I can think of is that you might be passing that file on to some other division for their internal processing, or something along those lines. You would need to ask the people there to get a specific answer.
Kris

Where's the "Any" key?-Homer Simpson
ady
Premium Member
Posts: 189
Joined: Thu Oct 12, 2006 12:08 am

Post by ady »

I think this project is for creating an SAP BW
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: Why do we need fixed-width columns?

Post by kris »

beaditya wrote: I don't understand the concept of fixed width and was wondering why we need fixed-width columns?
Here are my ten cents:

There is not much to the concept itself. The name says it all: the fields are fixed in width. That makes coding against the layout easy, not only in DataStage but in many applications.
Fixed field lengths make files consistently readable and easy to validate. Since the layout specifies boundary lengths, the application doesn't need to worry about handling data longer than expected, and a simple length check over the whole file is enough to rule out a bad file.

Whether you need them depends on which application is going to consume the file, and how.
Example: a mainframe application.

Usually, once the file an application expects is fixed-width, it is much less likely to have data issues such as unexpected lengths and widths.

There is no need to write fixed-width files unless an application interface expects you to, because they consume more space on disk and they force time-consuming operations such as trimming leading or trailing spaces while reading and space-padding while writing.

Since they cost extra space and time, it is not advisable to write fixed-width intermediate files within a process.

There could be more advantages that I couldn't think of; I hope others will post their ideas.

~Kris
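The whole-file length check kris mentions can be sketched in a few lines. This is a hypothetical layout (the field names and widths are invented for illustration, not taken from any DataStage job):

```python
# Hypothetical fixed-width layout: name (10 bytes), city (8 bytes), amount (6 bytes).
FIELDS = [("name", 10), ("city", 8), ("amount", 6)]
RECORD_LEN = sum(width for _, width in FIELDS)  # 24 bytes per record

def validate_and_parse(lines):
    """Reject the whole file if any record breaks the fixed layout,
    then slice each record into fields at its declared offsets."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if len(line) != RECORD_LEN:
            raise ValueError(f"record {lineno}: length {len(line)} != {RECORD_LEN}")
        record, offset = {}, 0
        for name, width in FIELDS:
            # Each field is a slice at a known offset; strip the space padding.
            record[name] = line[offset:offset + width].strip()
            offset += width
        records.append(record)
    return records

rows = validate_and_parse(["John      Boston  001250",
                           "Mary      Chicago 000042"])
```

Because every record must be exactly RECORD_LEN bytes, one length comparison per record is the entire validation, which is the consistency kris is describing.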
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Substring is much faster than delimited field extraction. Therefore most bulk loaders recommend that you prefer fixed-width format to delimited format.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
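Ray's point comes down to this: with fixed offsets each field is a direct substring, while a delimiter has to be located by scanning the bytes of the record. A small illustrative sketch (the field names, widths, and sample data are invented):

```python
# Two encodings of the same record.
fixed = "John      Boston  001250"   # fixed offsets: 0-10, 10-18, 18-24
delim = "John|Boston|001250"         # separators must be scanned for

def parse_fixed(rec):
    # Pure substring extraction: each field is a slice at a known offset.
    return rec[0:10].rstrip(), rec[10:18].rstrip(), rec[18:24]

def parse_delim(rec):
    # Delimited extraction: every byte is examined to find the separators.
    return tuple(rec.split("|"))
```

Both return the same fields, but over millions of records the fixed-offset slices avoid the per-byte separator scan, which is the effect Ray describes.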
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

If DataStage is importing a fixed-width file you can also gain some performance benefits. With (and only with) fixed-width files you can specify multiple readers. That is, instead of one process reading the file you can have several processes.

If I have 8 CPUs, why have only 1 reading my million record fixed-width file? Even having 2 readers cuts the time in half.

The reason this is limited to fixed-width is that DataStage has to be able to calculate offsets for reading the file. If I have 100 records and 2 readers, then I want the first reader to deal with records 1 thru 50, and reader #2 to deal with records 51 thru 100. In a variable length file, DataStage would actually have to read the file twice to determine where each record is and then start reading the data. That isn't very efficient.

However, with a fixed-width file DataStage knows the overall size of the file and the number of bytes per record. Based on that, DataStage can calculate where each reader should start processing. So if there are 2 readers, then Reader #1 starts at byte 0 (obviously), and Reader #2 starts at byte (record count / 2) × record length, which is simply halfway through the file.

Bet that was more than you wanted. Oh well, suffice it to say that even a Unix guy can get some value out of a mainframe idea :)

Brad.
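Brad's offset arithmetic generalises to any number of readers. A minimal sketch, assuming the file is a whole number of fixed-length records (the function name is mine for illustration, not a DataStage API):

```python
def reader_offsets(file_size, record_len, n_readers):
    """Byte offset where each of n_readers should start reading a file
    of fixed-length records, splitting the records as evenly as possible."""
    if file_size % record_len:
        raise ValueError("file size is not a whole number of records")
    n_records = file_size // record_len
    offsets = []
    for i in range(n_readers):
        # Reader i takes records [i*n/r, (i+1)*n/r); convert its first
        # record number back into a byte offset.
        first_record = (i * n_records) // n_readers
        offsets.append(first_record * record_len)
    return offsets
```

For Brad's example of 100 records of 24 bytes and 2 readers, reader #1 starts at byte 0 and reader #2 at byte 1200, i.e. record 51, with no second pass over the file.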
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Have you tried multiple readers with delimited text files? Allegedly it became possible (version 7.5.1?).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

We have not tried that yet. I had heard rumors that it was coming, but I thought it was with Hawk.

I will have to give that a try. We don't have any delimited imports (considering 90% of our data comes from the mainframe/fixed-width world). But the delimited files we do have are relatively large, so a faster read is definitely worth it.

I'll let you know if we get it working... Do you know if there are any special rules associated with that functionality, or better yet, is there any updated documentation (hhhehehe) for it?

Brad.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Source: private conversation at Information on Demand 2006

Not aware of any documentation.

The mechanism to find the start points for each reader is to position to the, say, 25% point then scan forward until the next record terminator is found. That's the start point for this reader and the end point for the previous reader. Not sure if coordination is through the section leader process or by player processes communicating with each other.

It (obviously, given the above) does not work with data files that lack line terminators.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
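The scan-forward mechanism Ray describes can be sketched as follows. This is only a guess at the idea from his description, not the actual engine implementation:

```python
def split_points(data: bytes, n_readers: int) -> list[int]:
    """Start offsets for n_readers over newline-terminated records.
    Each reader after the first seeks to its proportional share of the
    file, then scans forward to the byte after the next record
    terminator; that byte is its start point and the previous reader's
    end point."""
    points = [0]
    for i in range(1, n_readers):
        guess = (len(data) * i) // n_readers   # e.g. the 25% point
        nl = data.find(b"\n", guess)
        if nl == -1:            # no terminator past this point:
            break               # the previous reader takes the rest
        points.append(nl + 1)
    return points
```

As Ray notes, this only works because the records carry line terminators; without them there is nothing to scan forward to.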