Hi,
Can someone advise me on the advantages of storing information in a dataset rather than a database, other than the ones listed below, and also on the disadvantages?
Pros
1) Partition information will not be lost if the data is stored in a dataset rather than a database
2) Reduced I/O access
Cons
1) Can be read only using the DataStage utility (Data Set Management)
2) Sometimes a dataset may get corrupted, resulting in loss of data
Thanks,
senthi
DataSet Advantages
-
- Premium Member
- Posts: 40
- Joined: Tue Oct 14, 2008 3:30 pm
- Location: London
Thanks for your reply.
ArndW wrote: I have to disagree with Con #2. Both databases and dataset files are equally susceptible to data corruption.
DataSets are very fast when staying within the DataStage PX framework, much faster than database operations.
Database tables allow random access and updates/deletions. Datasets do not.
The reason I mentioned Con #2 is that databases are generally backed up or protected (RAID, mirroring, etc.), so after data corruption the information can be restored easily. That is harder for a corrupted dataset, whose data is physically stored in UNIX file directories on the machine hosting the DataStage server. Let me know your thoughts on this.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Data Sets preserve partitioning.
Data Sets preserve sorting.
Data are moved between persistent Data Sets and virtual Data Sets using the copy operator, which is hugely efficient compared to any other mechanism.
Those alone argue the case for using Data Sets for staging data between parallel jobs.
If you need staged data to be read by other applications, prefer File Sets, though you are then introducing import/export operators to convert the data from or to human-readable format.
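To make the partitioning and sorting point concrete, here is a minimal Python sketch. It is not DataStage code: the function names (`hash_partition`, `stage_partitioned`, `stage_flat`) and the sample rows are hypothetical, and in-memory lists stand in for the per-partition data. It only illustrates the idea that a store which keeps partition boundaries lets the next job pick up where the last one left off, while a flat store forces the work to be redone.

```python
from zlib import crc32


def hash_partition(rows, key, n_parts):
    """Assign each row to a partition by hashing its key column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        idx = crc32(str(row[key]).encode("utf-8")) % n_parts
        parts[idx].append(row)
    return parts


def sort_partitions(parts, key):
    """Sort every partition independently on the key column."""
    return [sorted(p, key=lambda r: r[key]) for p in parts]


def stage_partitioned(parts):
    """Hypothetical stand-in for a persistent Data Set: boundaries are kept."""
    return [list(p) for p in parts]


def stage_flat(parts):
    """Hypothetical stand-in for a flat store: boundaries are discarded."""
    return [row for p in parts for row in p]


# Sample data (made up for illustration only).
rows = [
    {"cust": c, "amt": a}
    for c, a in [("b", 5), ("a", 1), ("c", 7), ("a", 3), ("b", 2), ("d", 4)]
]

# First job: partition on the key, then sort within each partition.
parts = sort_partitions(hash_partition(rows, "cust", 3), "cust")

# Staged with boundaries intact: the next job can use the data as-is.
reloaded = stage_partitioned(parts)

# Staged flat: the next job has to repartition and re-sort to get back.
flat = stage_flat(parts)
restored = sort_partitions(hash_partition(flat, "cust", 3), "cust")
```

In the sketch, `reloaded` needs no further work, whereas `restored` repeats the hashing and sorting that the first job already did.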
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.