Difference between Data set and File set

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai


Post by bkumar103 »

Hi,

What is the main difference between the Data Set and File Set stages?
What should be considered before deciding between the Data Set and File Set stages?


Thanks in advance,
Birendra
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Re: Difference between Data set and File set

Post by priyadarshikunal »

Birendra,
A Data Set uses DataStage's internal format. The main points to consider before using one are:
1) It stores data in binary, in DataStage's internal format, so reading from and writing to a Data Set takes less time than with any other source or target.
2) It preserves the partitioning scheme, so you don't have to partition the data again.
3) You cannot view the data without DataStage.

Now, about File Sets:
1) It stores data in a format similar to a sequential file.
2) The only advantage of using a File Set over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but only in the order defined by the partitioning scheme.
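
To illustrate File Set points 2 and 3, here is a toy Python model. It is not DataStage's actual file layout: records are hash-partitioned on a key into one plain-text "file" per partition, and reading those files back in partition order returns the rows grouped by partition rather than in their original input order. The function names and the crc32-based hash are hypothetical choices made for illustration only.

```python
# Toy model of a file set: one plain-text "file" per partition.
# The hash scheme and layout are illustrative assumptions, not DataStage internals.
from zlib import crc32


def hash_partition(records, key, num_partitions):
    """Assign each record to a partition by hashing its key (hypothetical scheme)."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        idx = crc32(str(rec[key]).encode()) % num_partitions
        partitions[idx].append(rec)
    return partitions


def write_partitions_as_text(partitions):
    """Render each partition as comma-separated lines, like a sequential file."""
    return ["\n".join(f"{r['id']},{r['name']}" for r in part) for part in partitions]


def view_file_set(files):
    """Read the per-partition files back one after another, in partition order."""
    rows = []
    for text in files:
        rows.extend(line for line in text.splitlines() if line)
    return rows


records = [{"id": i, "name": n} for i, n in enumerate(["a", "b", "c", "d", "e", "f"])]
files = write_partitions_as_text(hash_partition(records, "id", 2))
for i, text in enumerate(files):
    print(f"partition {i}:", text.replace("\n", " | "))
print(view_file_set(files))
```

crc32 is used instead of Python's built-in hash() because string hashing is randomized per process, so the partition assignment would otherwise change from run to run.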

With these points in mind, which file type to use depends entirely on your requirements.

regards,
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
bkumar103
Participant
Posts: 214
Joined: Wed Jul 25, 2007 2:29 am
Location: Chennai

Post by bkumar103 »

What volume of data can a Data Set or File Set hold?
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

I've heard there is a limit of 2 GB per file on a UNIX box, so with a 1-node configuration a Data Set can hold 2 GB, with 2 nodes 4 GB, and so on (approximately number of nodes × 2 GB). I'm not sure how a File Set works, but the 2 GB limit holds for each file created.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Very few UNIX machines still have that limit these days, unless the Sys Admin chooses not to enable large file support.
A Data Set or File Set can employ multiple physical files per processing node, so even a 2 GB file size limit is not an obstacle. The largest (theoretical) File Set can have 10,000 files per partition and no more than 1,000,000 partitions. At 2 GB per file, you do the math! And that's a lower bound!
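The arithmetic can be sketched in Python, taking the figures in this thread at face value and assuming "2 GB" means 2 GiB (2**31 bytes; strictly, the classic 32-bit file size limit is one byte less). The `capacity_bytes` helper is a hypothetical name. With one file per node it reproduces the "nodes × 2 GB" estimate earlier in the thread; with the theoretical File Set limits it gives the lower bound described above.

```python
# Rough capacity arithmetic for a Data Set / File Set, using figures from this thread.
GIB = 1024**3
EIB = 1024**6
PER_FILE_LIMIT = 2 * GIB  # assumed ~2 GiB per-file cap (large file support disabled)


def capacity_bytes(partitions, files_per_partition, bytes_per_file):
    """Bytes storable: partitions x files per partition x bytes per file."""
    return partitions * files_per_partition * bytes_per_file


# One file per node, as in the "nodes * 2 GB" estimate.
two_node_naive = capacity_bytes(partitions=2, files_per_partition=1,
                                bytes_per_file=PER_FILE_LIMIT)

# Theoretical File Set maximum: 10,000 files per partition, 1,000,000 partitions.
file_set_max = capacity_bytes(partitions=1_000_000, files_per_partition=10_000,
                              bytes_per_file=PER_FILE_LIMIT)

print(f"2 nodes, 1 file each: {two_node_naive / GIB:.0f} GiB")  # 4 GiB
print(f"theoretical File Set: {file_set_max / EIB:.1f} EiB")    # 18.6 EiB
```

Because each partition can hold many files, the per-file cap multiplies out to roughly 18.6 EiB, far beyond what the single-file-per-node estimate suggests.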
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Dev_India
Premium Member
Posts: 9
Joined: Sun May 13, 2007 11:07 am

Post by Dev_India »

Mr Kumar,

Is this an interview question???