Who will use more memory Dataset or Seq. file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

lokesh_chopade
Participant
Posts: 16
Joined: Fri Oct 27, 2006 6:27 am

Who will use more memory Dataset or Seq. file

Post by lokesh_chopade »

If the same data is written to a sequential file and to a dataset, which will require more memory: the dataset or the sequential file?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If by memory you mean disk space, then both are going to use similar amounts of storage. If you mean memory usage while processing, the dataset will most likely use more (but run faster), since it will have at least one reader process per node.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are you using unbounded VarChar data types (where no maximum length is specified)? Are you using bounded VarChar data types (where a maximum length is specified)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
lokesh_chopade
Participant
Posts: 16
Joined: Fri Oct 27, 2006 6:27 am

Post by lokesh_chopade »

What will happen in each case? As it happens, I am using bounded data types.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Ray - thanks for catching that; I had forgotten that DataSets will pad out VarChar strings and thus can use significantly more disk storage. We had a case here recently where a VarChar(800) column was used to store 15 characters of data - but for millions of rows. Just changing the data type significantly reduced the size of the DataSet and therefore improved its speed.
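To put rough numbers on the VarChar(800) case, here is a back-of-the-envelope sketch in Python. It assumes each row's column occupies its full declared length when padded, ignores any per-row length prefix or other overhead, and uses a hypothetical row count of 5 million (the post only says "millions"), so treat the figures as illustrative rather than as measurements.

```python
# Rough estimate of padded vs. actual storage for one VarChar column.
# Assumptions (not from the thread): 1 byte per character, no per-row
# overhead, and a hypothetical row count of 5 million.

def column_bytes(rows: int, bytes_per_row: int) -> int:
    """Total bytes used by one column across all rows."""
    return rows * bytes_per_row

ROWS = 5_000_000          # hypothetical; the post only says "millions"
DECLARED_LEN = 800        # VarChar(800), padded to the declared maximum
ACTUAL_LEN = 15           # characters actually stored per row

padded = column_bytes(ROWS, DECLARED_LEN)
used = column_bytes(ROWS, ACTUAL_LEN)

print(f"padded: {padded / 1e9:.2f} GB")
print(f"actual: {used / 1e9:.3f} GB")
print(f"ratio:  {padded / used:.1f}x")
```

Under these assumptions the padded column is roughly 53 times larger than the data it holds, which is consistent with the size and speed difference described above.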
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

On the other hand, Data Sets store numbers in binary format, which can be much more compact than storing them as text.
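As a generic illustration of that point (not a description of the actual Data Set on-disk layout, which this thread does not specify), the following Python sketch compares a 32-bit binary integer with the same value written as ASCII text. The sample values are arbitrary.

```python
import struct

# Compare the size of a 32-bit binary integer with its text form.
# This is a generic illustration, not the real Data Set file format.

def binary_size(value: int) -> int:
    """Bytes needed to store value as a fixed-width signed 32-bit integer."""
    return len(struct.pack("<i", value))

def text_size(value: int) -> int:
    """Bytes needed to store value as ASCII digits (no delimiter)."""
    return len(str(value).encode("ascii"))

for value in (7, 1234567890):
    print(value, "binary:", binary_size(value), "text:", text_size(value))
```

Large values take fewer bytes in binary (4 versus 10 here), while a small value such as 7 is shorter as text, which is why the saving is "can be" rather than "always".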

How long is a piece of string?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

A piece of string is ALWAYS exactly 42 long. Of course, the units used for measuring are always different.
Ed Purcell
Premium Member
Posts: 23
Joined: Fri Mar 28, 2003 5:41 pm
Location: USA

Further...

Post by Ed Purcell »

ArndW wrote: Ray - thanks for catching that; I had forgotten that DataSets will pad out VarChar strings and thus can use significantly more disk storage. We had a case here recently where a VarChar(800) column was used to store 15 characters of data - but for millions of rows. Just changing the data type significantly reduced the size of the DataSet and therefore improved its speed.
So, do I have this correct? Unbounded VarChar strings are greatly discouraged by the manufacturer. Bounded VarChars too have their pitfalls. A dataset will allocate the maximum declared length for a VarChar. If you specify a max length that is too big, then you waste lots of space and slow things down. Right?
EPCCTX
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Right.