Data Set vs Sequential File

Post by **ggs** » Fri Oct 12, 2007 9:07 am

What is the difference between the Data Set stage and the Sequential File stage?
As far as I know, both of the stages can be loaded with the data formats like .CSV,.txt,and .dat.

So, can anyone help me with this?

Post by **Maveric** » Fri Oct 12, 2007 9:25 am

A Data Set stores the data in an internal format and retains the metadata and the sort order. The data itself is stored in the scratch space or on the resource disk (not sure which one) specified in the configuration file. What you see at the path is just the descriptor file. The extension you give a Data Set does not matter. And the best part: there is no need to handle nulls.

A sequential file is written in a human-readable format. Metadata and sort order are not retained. All the data is stored at the specified path. You can change the delimiter and write the data out as .CSV or .txt files. You also need to specify a default value to represent nulls.
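The null-handling contrast can be illustrated with Python's standard `csv` module. This is only an analogy, not the Sequential File stage itself. `NULL_FIELD_VALUE`, `write_delimited` and `read_delimited` are hypothetical names. A delimited file carries no schema, so every field comes back as a string, and nulls survive only if the writer and the reader agree on a text placeholder.

```python
import csv
import io

# Hypothetical placeholder written in place of nulls; writer and reader must agree on it.
NULL_FIELD_VALUE = "<NULL>"


def write_delimited(rows, delimiter=","):
    """Render rows as delimited text, replacing None with the null placeholder."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([NULL_FIELD_VALUE if v is None else v for v in row])
    return buf.getvalue()


def read_delimited(text, delimiter=","):
    """Parse delimited text; every field is a string, and only the placeholder maps back to None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[None if v == NULL_FIELD_VALUE else v for v in row] for row in reader]


text = write_delimited([(1, "a"), (2, None)])
print(repr(text))             # '1,a\n2,<NULL>\n'
print(read_delimited(text))   # [['1', 'a'], ['2', None]]  (the integers came back as strings)
print(repr(write_delimited([(1, "a")], delimiter="|")))  # '1|a\n'
```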

Post by **ray.wurlod** » Fri Oct 12, 2007 2:09 pm

It is the resource disk that is used to store the data files associated with Data Sets and File Sets.
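Maveric's description (a descriptor file at the path you specify) and the correction above (data files on the resource disks) can be pictured with a toy model. It is a sketch only. The real descriptor and data-file formats are internal to DataStage. `write_dataset`, `read_dataset`, the `.partNNNN` file naming, and the use of `pickle` and JSON are all hypothetical stand-ins. For simplicity the sketch also assumes one partition per resource disk.

```python
import json
import os
import pickle
import tempfile

# Toy model only: DataStage's real descriptor and segment formats are internal.
# write_dataset/read_dataset and this file layout are hypothetical.


def write_dataset(descriptor_path, records, schema, sort_keys, resource_disks):
    """Write one binary segment file per partition, then a small descriptor."""
    n = len(resource_disks)
    base = os.path.basename(descriptor_path)
    segments = []
    for i, disk in enumerate(resource_disks):
        segment = os.path.join(disk, f"{base}.part{i:04d}")
        with open(segment, "wb") as f:
            # Round-robin split; pickle keeps Python types and None as they are.
            pickle.dump(records[i::n], f)
        segments.append(segment)
    # The descriptor holds only metadata and pointers, not the data itself.
    with open(descriptor_path, "w") as f:
        json.dump({"schema": schema, "sort_keys": sort_keys, "segments": segments}, f)


def read_dataset(descriptor_path):
    """Follow the descriptor to its segment files; return metadata and partitions."""
    with open(descriptor_path) as f:
        descriptor = json.load(f)
    partitions = []
    for segment in descriptor["segments"]:
        with open(segment, "rb") as f:
            partitions.append(pickle.load(f))
    return descriptor["schema"], descriptor["sort_keys"], partitions


with tempfile.TemporaryDirectory() as root:
    disks = [os.path.join(root, "disk1"), os.path.join(root, "disk2")]
    for disk in disks:
        os.makedirs(disk)
    # The descriptor's extension is arbitrary; only its contents matter.
    descriptor_path = os.path.join(root, "customers.anything")
    rows = [(1, "a"), (2, None), (3, "c")]
    write_dataset(descriptor_path, rows, [["id", "int32"], ["name", "string"]], ["id"], disks)
    schema, sort_keys, partitions = read_dataset(descriptor_path)
    print(partitions)  # [[(1, 'a'), (3, 'c')], [(2, None)]]
```

The null in the second row comes back as `None` without any placeholder, because the data never passes through a text representation.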

ggs's claimed knowledge that "both of the stages can be loaded with the data formats like .CSV,.txt,and .dat" is horribly misplaced and, not to put too fine a point on it, just wrong. That is not only because Data Sets use an "internal" format (for example, binary numbers) but also because ".txt" and ".dat" are not data formats. They are merely file name extensions.
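The point that extensions are not formats can be seen at the byte level. This minimal Python sketch is an illustration, not DataStage's actual storage layout. It encodes the same number in two ways: as a 4-byte little-endian binary integer, the kind of representation an internal format may use, and as ASCII text. Either byte string could be saved under a .txt or .dat name, and only knowledge of the layout lets a reader decode it.

```python
import struct

value = 1234

# Binary: a 4-byte little-endian signed integer (format code "<i").
binary = struct.pack("<i", value)
# Text: the ASCII characters '1', '2', '3', '4'.
text = str(value).encode("ascii")

print(binary)  # b'\xd2\x04\x00\x00'
print(text)    # b'1234'

# Decoding needs the layout; a file name ending in .txt or .dat cannot supply it.
print(struct.unpack("<i", binary)[0])  # 1234
print(int(text.decode("ascii")))       # 1234
```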

Maveric's response is good as far as it goes. The biggest difference is only implied: the Data Set is the only data structure that osh operators can use natively, and anything else must be converted to a (virtual) Data Set in order to participate in a parallel job. Therefore the Data Set is the only really appropriate storage format for staging data between jobs. (Note, too, that the operator used to read or write a Data Set is the copy operator. That should tell you something!)
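The conversion step for non-Data Set input can be sketched as follows. This is a hypothetical Python illustration, not osh. `SCHEMA`, `import_records` and `total_amount` are made-up names. Text lines are parsed into typed records according to a schema before a downstream "operator" that expects typed values ever sees them. Data already stored in typed form would not need this parsing pass.

```python
from typing import Callable

# Hypothetical schema: column name paired with a converter from text to a typed value.
SCHEMA: list[tuple[str, Callable[[str], object]]] = [("id", int), ("amount", float), ("name", str)]


def import_records(lines, schema=SCHEMA, delimiter=","):
    """Convert delimited text lines into typed records (a stand-in for an import step)."""
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(fields)}: {line!r}")
        yield {name: convert(field) for (name, convert), field in zip(schema, fields)}


def total_amount(records):
    """A downstream 'operator' that only works on typed records."""
    return sum(r["amount"] for r in records)


lines = ["1,10.5,alice\n", "2,4.5,bob\n"]
print(total_amount(import_records(lines)))  # 15.0
```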

Post by **mcs_giri** » Sun Oct 14, 2007 8:04 am

A Data Set is managed by DataStage itself (through the Data Set Management utility).
It preserves partitioning.
No repartitioning is needed. These are some points I wanted to add. Thanks.
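The point about partitioning can be sketched with a toy Python model. All names are hypothetical, and this is not DataStage's partitioner. Records are hash-partitioned once on a key, and each partition is saved as-is, so a later reader gets the keyed layout back without extra work. A flat, single-stream read, by contrast, has to be hash-partitioned again before a keyed operation can run in parallel.

```python
import pickle
import zlib


def hash_partition(records, key, n):
    """Split records into n partitions by a stable hash of records[key]."""
    parts = [[] for _ in range(n)]
    for r in records:
        # zlib.crc32 gives the same result on every run, unlike Python's salted str hash.
        parts[zlib.crc32(str(r[key]).encode("utf-8")) % n].append(r)
    return parts


def keys_colocated(parts, key):
    """True if every key value lives in exactly one partition."""
    home = {}
    for i, part in enumerate(parts):
        for r in part:
            if home.setdefault(r[key], i) != i:
                return False
    return True


records = [{"cust": c, "amt": a} for c, a in [("A", 1), ("B", 2), ("A", 3), ("C", 4)]]

# Job 1 partitions once and saves each partition as-is (Data Set-like).
saved = [pickle.dumps(part) for part in hash_partition(records, "cust", 2)]

# Job 2 loads the partitions back: the keyed layout survives, with no repartitioning.
reloaded = [pickle.loads(blob) for blob in saved]
print(keys_colocated(reloaded, "cust"))  # True

# Reading a flat file instead yields one stream that must be repartitioned
# before a keyed operation can run in parallel.
flat = [r for part in reloaded for r in part]
repartitioned = hash_partition(flat, "cust", 2)
print(keys_colocated(repartitioned, "cust"))  # True
```

Both checks print True. The difference is that the second layout had to be recomputed with another `hash_partition` pass over every record.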