data set Vs sequential file

ggs
Participant
Posts: 7
Joined: Mon Oct 01, 2007 7:14 am
Location: Hyderabad

Post by ggs »

What is the difference between the Data Set stage and the Sequential File stage?
As far as I know, both stages can be loaded with data formats like .CSV, .txt, and .dat.

So, can anyone help me with this?
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

A Data Set stores data in an internal format and retains metadata and sort order. The data is stored in the scratch space or on the resource disk (I am not sure which one) named in the configuration file. What you see at the path is just the descriptor file. The extension given to a Data Set does not matter. And the best part: there is no need to handle nulls.

A Sequential File is written in a readable format. Metadata and sort order are not retained. All the data is stored at the path specified. You can change the delimiter and get the output in .CSV or .txt form. And you need to specify a default value for handling nulls.
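
To make the null and metadata point concrete, here is a rough Python analogy, not DataStage code. The `NULL_TOKEN` value and the helper names are made up for illustration, and `pickle` only stands in for "some internal binary format"; it is not the format a Data Set actually uses.

```python
import csv
import io
import pickle

# Two records; the second has null (None) values in two columns.
rows = [(1, "alice", 10.5), (2, None, None)]

# Hypothetical null representation chosen by the job designer.
NULL_TOKEN = "NULL"


def to_text(records):
    """Write records as delimited text, replacing nulls with a token."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for record in records:
        writer.writerow([NULL_TOKEN if v is None else v for v in record])
    return buf.getvalue()


def from_text(text):
    """Read delimited text back; every field comes back as a string."""
    return [tuple(fields) for fields in csv.reader(io.StringIO(text))]


# Text round trip: types are lost and nulls are now ordinary strings.
text_rows = from_text(to_text(rows))

# Binary round trip (stand-in for an internal format): types and nulls survive.
binary_rows = pickle.loads(pickle.dumps(rows))

print(text_rows)
print(binary_rows)
```

After the text round trip the reader has to be told again which columns are integers and which string means null. The binary round trip hands back the same typed values with no extra handling, which is the convenience described above.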
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It is the resource disk that is used to store the data files associated with Data Sets and File Sets.

The claim by ggs that "both of the stages can be loaded with the data formats like .CSV,.txt,and .dat" is horribly misplaced and, not to put too fine a point on it, simply wrong. Not just because Data Sets use "internal" format - for example binary numbers - but also because ".txt" and ".dat" are not data formats.

Maveric's response is good as far as it goes. The biggest difference is implied - Data Set is the only data structure that can be used natively by osh operators - anything else must be converted to a (virtual) Data Set in order to participate in a parallel job. Therefore Data Set is the only really appropriate storage format to use when staging data between jobs. (Note, too, that the operator used to read or write a Data Set is the copy operator. That should tell you something!)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mcs_giri
Participant
Posts: 14
Joined: Sat Sep 22, 2007 8:44 am
Location: chennai

Post by mcs_giri »

A Data Set is managed by DataStage itself (through the Data Set Management utility).
It preserves partitioning, so no repartitioning is needed when it is read back.
These are some points I wanted to add. Thanks.
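
The partitioning point, together with the descriptor file mentioned earlier in the thread, can be pictured with a small Python sketch. This is only an analogy under assumed names: the `.ds` and `.partN` file names, the JSON descriptor layout, and the modulo partitioning are all hypothetical and are not how DataStage lays out its files.

```python
import json
import os
import pickle
import tempfile


def write_partitioned(rows, key_index, n_parts, directory, name):
    """Split rows by an integer key, write one data file per partition,
    and write a small descriptor that lists those files."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[key_index] % n_parts].append(row)

    files = []
    for i, part in enumerate(parts):
        path = os.path.join(directory, f"{name}.part{i}")
        with open(path, "wb") as fh:
            pickle.dump(part, fh)
        files.append(path)

    # The descriptor holds no data, only pointers to the data files.
    descriptor = os.path.join(directory, f"{name}.ds")
    with open(descriptor, "w") as fh:
        json.dump({"key_index": key_index, "files": files}, fh)
    return descriptor


def read_partitioned(descriptor):
    """Read each partition file separately, keeping the original layout."""
    with open(descriptor) as fh:
        meta = json.load(fh)
    parts = []
    for path in meta["files"]:
        with open(path, "rb") as fh:
            parts.append(pickle.load(fh))
    return parts


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
with tempfile.TemporaryDirectory() as tmp:
    desc = write_partitioned(rows, key_index=0, n_parts=2, directory=tmp, name="staged")
    partitions = read_partitioned(desc)

print(partitions)
```

A downstream reader gets the rows back already grouped the way the writer partitioned them. A single flat text file would have to be split across the nodes again before parallel processing could start.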
GIRIDHARANJ