DataSet Advantages

senthil_tcs · Post by **senthil_tcs** » Mon Aug 03, 2009 4:02 am

Hi,

Can someone tell me the advantages of storing data in a dataset rather than in a database, beyond the ones listed below, and also the disadvantages?

Pros

1) Partition information is not lost when the data is stored in a dataset rather than in a database

2) Reduced I/O access

Cons

1) Can be read only with DataStage utilities (Data Set Management)

2) A dataset may sometimes get corrupted, resulting in loss of data

Thanks,
senthi

ArndW · Post by **ArndW** » Mon Aug 03, 2009 4:35 am

I have to disagree with Con #2. Both databases and dataset files are equally susceptible to data corruption.

DataSets are very fast when staying within the DataStage PX framework, much faster than database operations.
Database tables allow random access and updates/deletions. Datasets do not.

senthil_tcs · Post by **senthil_tcs** » Mon Aug 03, 2009 5:16 am

ArndW wrote: I have to disagree with Con #2. Both databases and dataset files are equally susceptible to data corruption.

DataSets are very fast when staying within the DataStage PX framework, much faster than database operations.
Database tables allow random access and updates/deletions. Datasets do not.

Thanks for your reply.

The reason I listed Con #2 is that databases are generally backed up (RAID, mirroring, etc.), so if the data gets corrupted it can be restored easily. A dataset, by contrast, is physically stored in UNIX file directories on the host running the DataStage server, so recovering from dataset corruption is harder. Let me know your thoughts on this.

ArndW · Post by **ArndW** » Mon Aug 03, 2009 5:19 am

DataSets can be backed up as well, and most development databases I know of are not backed up. DataSets can usually be re-created quickly, too.

ray.wurlod · Post by **ray.wurlod** » Mon Aug 03, 2009 4:55 pm

Data Sets preserve partitioning.

Data Sets preserve sorting.

Data are moved between persistent Data Sets and virtual Data Sets using the copy operator, which is hugely efficient compared to any other mechanism.

Those alone argue the case for using Data Sets for staging data between parallel jobs.
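A toy Python model of the partitioning and sorting point, purely for illustration: it is not how DataStage stores Data Sets internally, and the record layout, the hash partitioner and all function names are hypothetical. It shows that a staged copy which keeps one sorted segment per partition can be handed straight to a downstream key-partitioned, sorted stage, while a flat table load has to be repartitioned and re-sorted to get the same layout back.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (key, payload)
Record = Tuple[int, str]


def hash_partition(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Assign each record to a partition by hashing its key."""
    parts: List[List[Record]] = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[hash(rec[0]) % num_partitions].append(rec)
    return parts


def stage_as_dataset(parts: List[List[Record]]) -> Dict[int, List[Record]]:
    """Toy 'Data Set': one segment per partition, each kept in key order."""
    return {i: sorted(p) for i, p in enumerate(parts)}


def stage_as_table(parts: List[List[Record]]) -> List[Record]:
    """Toy 'table load': partition boundaries are dropped and row order is not guaranteed."""
    rows = [rec for p in parts for rec in p]
    rows.reverse()  # stand-in for arbitrary row order on read-back
    return rows


def read_dataset(staged: Dict[int, List[Record]]) -> Dict[int, List[Record]]:
    """Downstream read: segments are already partitioned and sorted, so no extra work."""
    return staged


def read_table(rows: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Downstream read: must repartition and re-sort to rebuild the same layout."""
    return {i: sorted(p) for i, p in enumerate(hash_partition(rows, num_partitions))}


if __name__ == "__main__":
    data = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c"), (6, "f")]
    parts = hash_partition(data, 2)
    print(read_dataset(stage_as_dataset(parts)))
    print(read_table(stage_as_table(parts), 2))
```

Both reads end up with the same per-partition, sorted layout; the table path simply pays for a repartition and a sort on every downstream read, which is the kind of work that preserved partitioning and sorting saves.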

If you need staged data to be read by other applications, prefer File Sets, though you then introduce import/export operators to convert the data to or from human-readable format.

Sreenivasulu · Post by **Sreenivasulu** » Wed Aug 05, 2009 12:59 am

The thing I like best about datasets is that accessing them is very fast and no one can change the files from outside DataStage.

Regards
Sreeni

senthil_tcs · Post by **senthil_tcs** » Wed Aug 12, 2009 3:04 am

Thanks for all your help and inputs on this.