Hi DSGurus,
I am new to DataStage.
Can anyone tell me the size limits of a Sequential File and a File Set in PX, and which one performs better?
In what situations should each be used to achieve better performance?
Thanks & Regards
Chandu
Size Limit
Moderators: chulett, rschirm, roy
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Welcome aboard. :D
The size can be as large as the operating system will permit in the case of a Sequential File, and very large indeed (if needed) for a File Set. A File Set is spread over the processing nodes defined in the configuration file and consists of one or more logical files (each no larger than 2 GB) on each node. So, theoretically, if you had 1000 processing nodes and 1000 logical files on each, that would be a File Set of 2 PB in size.
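To make that arithmetic explicit, here is a minimal sketch of the calculation. It is not DataStage code; the function name max_file_set_bytes is made up for illustration, and it assumes decimal units (1 GB = 10^9 bytes), which the figures above do not specify.

```python
# Theoretical upper bound on File Set size, using the figures quoted above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
GB = 10**9
PB = 10**15


def max_file_set_bytes(nodes: int, files_per_node: int,
                       max_file_bytes: int = 2 * GB) -> int:
    """Hypothetical helper: nodes x files per node x per-file size limit."""
    return nodes * files_per_node * max_file_bytes


total = max_file_set_bytes(nodes=1000, files_per_node=1000)
print(total / PB, "PB")  # 2.0 PB
```

In practice the number of nodes and files per node will be far smaller, so the real ceiling is set by your configuration file and available disk rather than by this bound.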
Sequential files can be created by any application. However, data can only be put into a File Set by a DataStage job or an Orchestrate script. On the other hand, you need to take great care setting up a Sequential File stage to use any form of non-sequential operation, whereas a File Set operates in parallel mode automatically.
Performance is largely a matter of expectation. If you define performance (in an ETL context) we may be happier to address that question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.