Hi All,
Could anybody let me know whether using the DataSet stage in place of the Sequential File stage (in PX jobs) has any advantages in terms of performance or any other criteria? Also, please let me know if there are any drawbacks to using the DataSet stage.
Thanks in advance,
Regards,
Pinkesh
Data Set vs Sequential Stage
DataSet file: - It preserves the parallelism (partitioning), so if a first job creates a DataSet file, a second job that reads it will run faster.
- You cannot read these files with Server jobs.
- Only PX can read this kind of file.
- You can use View Data in the stage.
Sequential file: - You lose your parallelism, so you lose performance.
- You cannot use View Data in the stage, but you can open the file directly outside DataStage (under Unix or Windows if you wish). You can also modify the data for testing, for example.
- You can archive them.
Regards,
Pey
A dataset is an internal data staging file format for Parallel jobs. If Parallel jobs are going to have a common/repeated dataset for merge or lookup operations, landing it to a dataset is beneficial. It preserves the data so that many jobs could benefit by using the exact same data during their operations.
A sequential file has no referencing capability. It has to reside on a specific server file system.
The dataset format is proprietary and the only way to inspect it is via DataStage. From a staging standpoint, it is not useful as a means of preparing a ready-to-load "file", because of its proprietary nature and tendency to be non-persistent. A sequential file is easy to audit, as it can be inspected with just about any text tool (more, cat, grep, vi, etc).
A sequential file is easy to manipulate, and Ralph Kimball (oooohhmmm) recommends it as the preferred format for milestone/recovery/restart staging because of its auditability, transportability and ease of use. Use datasets if you need reference capabilities and the data is not persistent, that is, temporary work files.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
You can also read from and write to a file set using PX. A file set is a set of partitioned sequential files, similar to a dataset, yet viewable without PX.
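To make the "set of partitioned sequential files" idea concrete, here is a rough Python sketch. It is only an illustration of the concept, not DataStage's actual file set layout: the function name `write_partitioned`, the `part_N.txt` file names and the round-robin distribution are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def write_partitioned(rows, directory, partitions):
    """Distribute rows round-robin across plain-text partition files.

    Hypothetical helper for illustration only; it does not reproduce
    any DataStage on-disk format.
    """
    paths = [Path(directory) / f"part_{i}.txt" for i in range(partitions)]
    handles = [p.open("w", encoding="utf-8") for p in paths]
    try:
        for n, row in enumerate(rows):
            handles[n % partitions].write(row + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


rows = [f"{i},customer_{i}" for i in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_partitioned(rows, tmp, partitions=4)
    # Each partition is an ordinary text file, so any tool can read it.
    contents = [p.read_text(encoding="utf-8").splitlines() for p in paths]

for index, lines in enumerate(contents):
    print(index, lines)
```

Because every partition is ordinary text, each one can be checked with cat or grep, while a parallel reader could still take one partition per node.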
A PX trick when using datasets is to keep your data byte-aligned. I got this tip a while ago, and from what I understand, byte-aligned data is easier and faster for PX to process.
- BP
Can you expand on this; in particular how can data NOT be byte-aligned? Do you mean word-aligned?
What are the implications for NLS, where the number of bytes used to store any particular character may be one, two, three or even four?
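For anyone unsure what "one, two, three or even four" bytes looks like in practice, a small Python sketch follows. It assumes UTF-8 as the encoding, which is only one possible NLS map, and the sample characters are arbitrary choices for the example.

```python
# Four sample characters, written as escapes to avoid any ambiguity:
# LATIN CAPITAL A, e-acute, EURO SIGN, and an emoji outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, size in zip(samples, byte_lengths):
    print(f"U+{ord(ch):04X} takes {size} byte(s) in UTF-8")
```

So a column of ten characters does not have a fixed byte width under such an encoding, which is why the alignment question matters here.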
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray,
You're right. Word-aligned, I believe. I don't know much more than what I already posted. I don't know about the NLS stuff.
- BP
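As a general illustration of what word alignment means (this says nothing about how PX lays out dataset records internally, which this thread does not establish), Python's `struct` module can show the padding a platform inserts to align fields. The record layout, one 1-byte field followed by one 4-byte integer, is an assumption picked for the example.

```python
import struct

# "=" requests standard sizes with no padding between fields.
packed_size = struct.calcsize("=bi")

# "@" requests native sizes and native alignment, so the compiler-style
# padding needed to align the integer is included.
aligned_size = struct.calcsize("@bi")

print("packed record size:", packed_size)
print("natively aligned record size:", aligned_size)
```

The packed layout is 5 bytes. On most common platforms the aligned layout is larger (typically 8 bytes) because padding follows the 1-byte field so that the integer starts on its natural boundary; the exact figure is platform-dependent.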