
Difference between File Set and Data Set

Posted: Wed Jul 07, 2004 11:18 pm
by bks_prasad
Hi All,

Can anybody explain a scenario where I would use the File Set stage rather than the Data Set stage? What exactly is the difference between a File Set and a Data Set? Both are operating system files, and both are proprietary to DataStage.

Thanks in advance

Regards
Prasad

Posted: Thu Jul 08, 2004 12:34 am
by mandyli
I believe these are the main differences between a File Set and a Data Set:

File Set stage: only executes in parallel mode. You can't manage a File Set independently of a job.

Data Set stage: can be configured to execute in parallel or sequential mode, and you can also manage Data Sets independently of a job using the Data Set Management utility, which is available from the DataStage Designer, Manager, or Director.


Posted: Thu Jul 08, 2004 1:14 am
by bks_prasad
Hi,

Thanks for your reply, but I need to know the exact scenario in which I would use a File Set rather than a Data Set.

Regards
Prasad

Posted: Mon Jul 19, 2004 3:05 am
by mandyli
You can choose between a File Set and a Data Set based on the volume of data.

Posted: Mon Jul 19, 2004 3:02 pm
by nivas
mandyli wrote: "Based on volume of data you can choose file or data set."
My assumption is that for high volumes we should use a File Set, and for lower volumes a Data Set. Am I correct?

Posted: Tue Oct 02, 2012 4:16 am
by atul9806
Yes. On older operating systems you could not create a Data Set file larger than 2 GB.
Current operating systems no longer impose that limit (though it also depends on your system administrator ;) ).

The File Set stage can handle a lot of data if you want it in a readable format while still preserving the partitioning.
The Data Set stage can do the same, but you cannot view the data except with the DataStage tools.

So, setting the file size question aside, it depends on your needs.

Posted: Tue Oct 02, 2012 5:36 am
by ray.wurlod
You are all on the wrong track entirely. Both Data Sets and File Sets are parallel structures for storing data on disk while retaining partitioning and sorted order. The difference between them is that Data Sets store data in the same internal format that the DataStage parallel engine uses (the operator for reading and writing Data Sets is copy), whereas writing a File Set uses the export operator and reading from a File Set uses the import operator. Data stored in File Sets is in the same format as that used within text files, and therefore the data in a File Set can be read by humans and by other applications.
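To make the format distinction above concrete, here is a loose Python analogy, not DataStage code. It assumes a hypothetical three-field record and uses `struct` packing as a stand-in for the engine's internal binary representation (the real Data Set layout is not described in this thread). The hypothetical helpers `export_text` and `import_text` only mimic conceptually what the export and import operators do: convert each field to delimited text on write and parse it back on read.

```python
import csv
import io
import struct

# Hypothetical record layout (id: int32, amount: float64, code: 4-byte string).
# This is NOT DataStage's actual Data Set format; struct packing is only a
# stand-in for "the engine's own binary representation".
RECORD = struct.Struct("<id4s")

rows = [(1, 19.99, b"ABCD"), (2, 5.0, b"WXYZ")]


def write_binary(rows):
    """Analogy for a Data Set segment: fixed-width binary records."""
    return b"".join(RECORD.pack(*r) for r in rows)


def read_binary(blob):
    """Read fixed-width records back without any text parsing."""
    return [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]


def export_text(rows):
    """Analogy for File Set 'export': each field converted to delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for rid, amount, code in rows:
        writer.writerow([rid, f"{amount:.2f}", code.decode("ascii")])
    return buf.getvalue()


def import_text(text):
    """Analogy for File Set 'import': parse delimited text back into typed fields."""
    return [
        (int(rid), float(amount), code.encode("ascii"))
        for rid, amount, code in csv.reader(io.StringIO(text))
    ]


if __name__ == "__main__":
    binary = write_binary(rows)
    text = export_text(rows)
    print(binary[:8])  # opaque bytes: not meaningful to a person or a text tool
    print(text)        # plain delimited text: readable by people and other programs
    assert read_binary(binary) == rows
    assert import_text(text) == rows
```

The text version pays a conversion cost on every write and read, which is the price of being readable outside DataStage; the binary version is opaque but avoids text formatting and parsing.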

Both the Data Set stage and the File Set stage can operate in either parallel or sequential mode. However, I can't think of a good reason to execute either in sequential mode.

Either can have up to 10,000 data ("segment") files per node. Even with a 2GB file size limit, that means that you can store up to 20,000 GB per node.
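The capacity figure above is simple multiplication. Here is a minimal Python sketch: the 10,000-file and 2 GB numbers come from the post, while `max_capacity_gb` with a node count is a hypothetical extension to multi-node configurations.

```python
# Figures from the post: up to 10,000 segment files per node,
# each assumed to be capped at 2 GB by an older OS file-size limit.
MAX_SEGMENT_FILES_PER_NODE = 10_000
FILE_SIZE_LIMIT_GB = 2


def max_capacity_per_node_gb(files=MAX_SEGMENT_FILES_PER_NODE, limit_gb=FILE_SIZE_LIMIT_GB):
    """Upper bound on storage per node: segment files x per-file size limit."""
    return files * limit_gb


def max_capacity_gb(nodes, **kwargs):
    """Hypothetical helper: upper bound across `nodes` processing nodes."""
    return nodes * max_capacity_per_node_gb(**kwargs)


if __name__ == "__main__":
    print(max_capacity_per_node_gb())  # 20000
    print(max_capacity_gb(4))          # 80000
```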