Delimited file or fixed-width or CSV file


Shruthi
Participant
Posts: 74
Joined: Sun Oct 05, 2008 10:59 pm
Location: Bangalore


Post by Shruthi »

Hi,

We have DB2 as the source and want to write the data to a temporary file, then read from that file in the next step. We will be processing more than 5 million records. Which is the better approach: storing it in a CSV file, a flat file, or a delimited file?

Thanks in advance,
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Delimited file or fixed-width or CSV file

Post by chulett »

Shruthi wrote: Which is the better approach? To store it in CSV file or Flat file or delimited file?
Everything you've mentioned is a "flat file", and a "csv" file is one flavor of a delimited flat file; it just means a specific delimiter, a comma. So... six of one.
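
To make the terminology concrete, here is a minimal Python sketch (not DataStage code) that renders the same records as a comma-delimited file, a pipe-delimited file, and a fixed-width file. The sample rows, the field widths, and the helper names `to_delimited` and `to_fixed_width` are all made up for illustration.

```python
import csv
import io

# Hypothetical sample rows; the values and widths are illustrative only.
rows = [("1001", "Alice", "250.00"), ("1002", "Bob", "75.50")]

def to_delimited(rows, delimiter):
    """Render rows as a delimited flat file; delimiter="," gives CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def to_fixed_width(rows, widths):
    """Render rows as fixed-width records: every field is padded to a set width."""
    lines = ["".join(field.ljust(w) for field, w in zip(row, widths)) for row in rows]
    return "\n".join(lines) + "\n"

csv_text = to_delimited(rows, ",")      # CSV: a delimited file whose delimiter is a comma
pipe_text = to_delimited(rows, "|")     # same layout, different delimiter
fixed_text = to_fixed_width(rows, widths=(6, 10, 8))  # every record is 24 characters
```

All three outputs are flat files. The only structural difference is how a reader finds the field boundaries: by scanning for a delimiter, or by counting characters.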

Your other option is a dataset and the pros/cons of that could depend on how much space you are willing to dedicate to this and what exactly this 'next step' is and needs.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vinothkumar
Participant
Posts: 342
Joined: Tue Nov 04, 2008 10:38 am
Location: Chennai, India

Post by vinothkumar »

If your next step is also in DataStage, Dataset will be a good option.
Shruthi

Post by Shruthi »

This file is needed by other teams for processing in other software, so the dataset option is ruled out.
The next step is to load the data into the data warehouse. Before loading, we have many stages as per business need.
I am quite new to DataStage. I read about the properties "Number of readers per node" and "Read from multiple nodes". These are available only for fixed-width files.
When DataStage PX is used, does it read in parallel from delimited files? Is there no difference between using delimited and fixed-width files as far as performance is concerned?
chulett

Post by chulett »

As you noted, fixed-width files can be read by multiple reader nodes while delimited ones cannot. Nature of the beast rather than a specific DataStage 'thing'.
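
A rough Python sketch of why this is the "nature of the beast": with fixed-width records, record i starts at byte i * record_length, so each reader can seek straight to its own slice without looking at any other data. This is only an illustration of the general idea, not how DataStage implements it. The record length, the file contents, and the `read_partition` helper are assumptions.

```python
import os
import tempfile

RECORD_LEN = 10  # assumed bytes per record, including the trailing newline

def read_partition(path, reader_index, num_readers, record_len=RECORD_LEN):
    """Return the records one reader would own, located purely by arithmetic."""
    total_records = os.path.getsize(path) // record_len
    per_reader = -(-total_records // num_readers)  # ceiling division
    first = reader_index * per_reader
    last = min(first + per_reader, total_records)
    records = []
    with open(path, "rb") as f:
        f.seek(first * record_len)  # jump straight to a record boundary
        for _ in range(max(0, last - first)):
            records.append(f.read(record_len).rstrip(b"\n").decode("ascii"))
    return records

# Build a small fixed-width file: 7 records of 9 characters plus a newline.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".dat") as tmp:
    for i in range(7):
        tmp.write(f"REC{i:06d}\n".encode("ascii"))
    path = tmp.name

# Three hypothetical readers, each taking its own slice of the file.
parts = [read_partition(path, r, 3) for r in range(3)]
os.remove(path)
```

In a delimited file the records have varying lengths, so there is no formula for where record i begins; a reader has to scan for record terminators to find it.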

In your shoes, I would ask them (your other teams) what they would prefer. Again, it will depend on the tool used to do the actual loading but fixed-width files tend to be more 'performant' there as well.
-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In version 8.1 delimited files CAN be read using multiple readers per node, though it remains true that this is more efficient with fixed-width files.
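
For illustration only, here is one general technique for letting several readers share a delimited file, sketched in Python. It is an assumption about how such a thing can be done, not a description of what DataStage 8.1 actually does internally. Each reader is given a byte range, skips forward to the first record that begins inside that range, and reads until it passes the end of the range. The `read_byte_range` helper and the sample data are hypothetical.

```python
import os
import tempfile

def read_byte_range(path, start, end):
    """Return the lines that begin within [start, end) of a newline-delimited file."""
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and read through the next newline, which lands
            # on the first record that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n").decode("ascii"))
    return lines

# Hypothetical delimited file with records of varying length.
data = b"1,Alice,250.00\n2,Bob,75.50\n3,Chandrasekhar,9.99\n4,Di,1.00\n"
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".csv") as tmp:
    tmp.write(data)
    path = tmp.name

size = os.path.getsize(path)
num_readers = 3
bounds = [size * i // num_readers for i in range(num_readers + 1)]
parts = [read_byte_range(path, bounds[i], bounds[i + 1]) for i in range(num_readers)]
os.remove(path)

all_lines = [line for part in parts for line in part]
```

The extra scan for a record boundary, plus the uneven record counts per reader, is one plausible reason this kind of split is less efficient than the pure offset arithmetic that fixed-width records allow.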
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett

Post by chulett »

Ah... that's good to know about 8.1.
-craig
Shruthi

Post by Shruthi »

Thanks Ray! That was of great help.