DataSet Advantages

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


senthil_tcs
Premium Member
Posts: 40
Joined: Tue Oct 14, 2008 3:30 pm
Location: London


Post by senthil_tcs »

Hi,

Can someone advise me on the advantages of datasets over storing information in a database, other than the ones listed below, and also on their disadvantages?

Pros

1) Partitioning information is not lost if the data is stored in a dataset rather than in a database

2) Reduced I/O access

Cons

1) Can be read only through DataStage utilities (Data Set Management)

2) A dataset may sometimes get corrupted, resulting in the loss of data

Thanks,
senthi
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I have to disagree with Con #2. Both databases and dataset files are equally susceptible to data corruption.

DataSets are very fast when staying within the DataStage PX framework, much faster than database operations.
Database tables allow random access and updates/deletions. Datasets do not.
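To make the last point concrete, here is a minimal Python sketch. It uses SQLite (from the standard library) as a stand-in for a database table and a newline-delimited JSON file as a hypothetical stand-in for a write-once staged file; neither is how DataStage actually stores a Data Set. Changing one record is a single keyed `UPDATE` on the table, whereas the sequential file has to be read and rewritten in full.

```python
# Toy contrast between a keyed database table and a write-once sequential file.
# Assumption: the JSON-lines file is a hypothetical stand-in for a staged file;
# it is NOT the real Data Set format, which is managed by the parallel engine.
import json
import os
import sqlite3
import tempfile


def db_update(conn, key, value):
    # A database can change one row in place, located by its key.
    cur = conn.execute("UPDATE t SET value = ? WHERE id = ?", (value, key))
    conn.commit()
    return cur.rowcount  # rows changed by the statement


def file_update(path, key, value):
    # A write-once file offers no in-place update: read every record,
    # change the matching one, then rewrite the whole file.
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    for rec in records:
        if rec["id"] == key:
            rec["value"] = value
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return len(records)  # records rewritten


rows = [{"id": i, "value": f"v{i}"} for i in range(1, 1001)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO t VALUES (:id, :value)", rows)

path = os.path.join(tempfile.mkdtemp(), "staged.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for rec in rows:
        f.write(json.dumps(rec) + "\n")

print("database rows changed:", db_update(conn, 500, "changed"))     # 1
print("file records rewritten:", file_update(path, 500, "changed"))  # 1000
```

The end result is the same record change, but the cost differs: the table touches one keyed row, while the sequential file rewrites all 1000 records.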
senthil_tcs
Premium Member
Posts: 40
Joined: Tue Oct 14, 2008 3:30 pm
Location: London

Post by senthil_tcs »

Thanks for your reply.

The reason I mentioned Con #2 is that databases are generally backed up (RAID, mirroring, etc.), so in case of data corruption the information can be restored easily. A corrupted dataset is harder to recover, because it is physically stored in UNIX file directories on the host where the DataStage server runs. Let me know your thoughts on this.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

DataSets can be backed up as well, and most development databases I know of are not backed up. DataSets can also usually be re-created quickly.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Data Sets preserve partitioning.

Data Sets preserve sorting.

Data are moved between persistent Data Sets and virtual Data Sets using the copy operator, which is hugely efficient compared to any other mechanism.

Those alone argue the case for using Data Sets for staging data between parallel jobs.
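The partitioning and sorting points can be illustrated with a toy Python model. Everything here is a hypothetical simplification: rows are (key, value) pairs, a modulus-style partitioner (key mod 4) stands in for the job's partitioning, and Python lists stand in for the per-partition segment files. A staged copy that keeps one segment per partition can be reused as-is, while a flat table loses both the partition assignment and the order, so the next job has to re-partition and re-sort.

```python
# Toy model of partition- and sort-preservation between two jobs.
# Assumptions (hypothetical, for illustration only): rows are (key, value)
# pairs, partitioning is "key modulo 4", and Python lists stand in for the
# per-partition segment files. This is not the real Data Set format.
import random
from typing import List, Tuple

Row = Tuple[int, str]
NUM_PARTITIONS = 4


def hash_partition(rows: List[Row], n: int) -> List[List[Row]]:
    # Assign each row to a partition by its key.
    parts: List[List[Row]] = [[] for _ in range(n)]
    for key, value in rows:
        parts[key % n].append((key, value))
    return parts


def is_partitioned_and_sorted(parts: List[List[Row]], n: int) -> bool:
    # True if every row sits in its key's partition and each partition is sorted.
    return all(
        all(key % n == i for key, _ in part) and part == sorted(part)
        for i, part in enumerate(parts)
    )


def stage_partitioned(parts: List[List[Row]]) -> List[List[Row]]:
    # Job 1 persists each partition as its own segment, keeping its order.
    return [list(part) for part in parts]


def stage_flat(parts: List[List[Row]]) -> List[Row]:
    # A single flat table keeps the rows but neither the partition each came
    # from nor any guaranteed order (simulated here with a seeded shuffle).
    rows = [row for part in parts for row in part]
    random.Random(0).shuffle(rows)
    return rows


def read_flat(table: List[Row], n: int) -> List[List[Row]]:
    # Job 2 must re-partition and re-sort before key-based processing.
    return [sorted(part) for part in hash_partition(table, n)]


source = [(k, f"v{k}") for k in random.Random(1).sample(range(100), 20)]
job1_output = [sorted(p) for p in hash_partition(source, NUM_PARTITIONS)]

reused = stage_partitioned(job1_output)  # read back as-is, no extra work
flat = stage_flat(job1_output)

print(is_partitioned_and_sorted(reused, NUM_PARTITIONS))  # True
print(read_flat(flat, NUM_PARTITIONS) == reused)          # True, but only after re-partition + sort
```

Both paths end with the same rows in the same partitions; the difference is that the flat path pays for a full re-partition and sort in the second job.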

If you need staged data to be read by other applications, prefer File Sets, though you then introduce import/export operators to convert the data from or to a human-readable format.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

The thing I like best about datasets is that accessing them is very fast and no one can change the files from outside DataStage.

Regards
Sreeni
senthil_tcs
Premium Member
Posts: 40
Joined: Tue Oct 14, 2008 3:30 pm
Location: London

Post by senthil_tcs »

Thanks for all your help and inputs on this.