xli wrote:
They have the same number of records, 24 million, with the same key fields key1, key2, key3. Each file has an average of 50 fields.
This is going to be brutal. 4 files x 50 columns merged together = ~200 columns. A row that wide means you are going to take serious performance penalties just reading all of those characters.
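Rough back-of-the-envelope, assuming (purely for illustration) an average of 10 bytes per field:

4 files x ~50 fields = ~200 columns in the merged row
~200 fields x ~10 bytes = ~2 KB per row
~2 KB x 24,000,000 rows = ~48 GB to read and write on every pass

Adjust the per-field size to your actual layout, but the order of magnitude is the point.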
xli wrote:
These files are required to be sorted by key1, key2 and key3 first, then merged together, and the output passed to further complex transformation functions in a DataStage job.
Why sort? Merging requires sorted input, but if you take the hash file route sorting is a waste of time: hash files don't care about sort order when loading, they put the data wherever they want. If you're not using hash files, then you're using a database for reference purposes.
xli wrote:
I tried loading these files into Oracle tables and applying a simple inner join on an indexed key, ordered by key1, key2 and key3, but the performance is too bad. It is also very slow even without the "order by" applied.
Define "too bad". Did you load the tables, analyze them, and use a parallel query to do the work? Did you enforce a minimum degree of parallelism? How about partitioning the work tables to assist the query parser in dividing the work?
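To make that concrete, here is roughly the kind of setup I mean. Table, column and partition/degree numbers below are made up for illustration, so treat this as a sketch to adapt to your own layout and hardware, not a recipe:

-- Hash-partition each work table on the leading key so parallel query
-- slaves can each take their own slice (repeat for file2..file4).
CREATE TABLE file1 (
  key1 VARCHAR2(20),
  key2 VARCHAR2(20),
  key3 VARCHAR2(20),
  col4 VARCHAR2(50)          -- ...and the other ~46 data columns
)
PARTITION BY HASH (key1) PARTITIONS 16
PARALLEL 8;

-- Gather statistics so the optimizer knows how big these tables really are.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FILE1', degree => 8);

-- Enforce a minimum degree of parallelism for the session doing the work.
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8;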
xli wrote:
My questions are :
1. Is there any Before-job routine to sort and merge these files, and how is the performance?
None from Ascential, and if you use Unix commands (sort, sed, etc.) you are single-threading and not using all of your resources.
xli wrote:
2. What is the best solution for sorting and merging such huge files?
A parallel database, such as Oracle 9i. Or DataStage PX. Since you have Oracle already....
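If you stay in Oracle, the merge itself then becomes a single parallel hash join over the four work tables. A sketch along these lines (same made-up table and column names as above; the ORDER BY only belongs there if the downstream job really needs sorted input):

SELECT /*+ FULL(f1) FULL(f2) FULL(f3) FULL(f4)
           PARALLEL(f1, 8) PARALLEL(f2, 8) PARALLEL(f3, 8) PARALLEL(f4, 8)
           USE_HASH(f2 f3 f4) */
       f1.key1, f1.key2, f1.key3,
       f1.col4 AS f1_col4,   -- ...and the rest of the file1 columns
       f2.col4 AS f2_col4,   -- ...file2 columns, and so on for f3 and f4
       f3.col4 AS f3_col4,
       f4.col4 AS f4_col4
  FROM file1 f1, file2 f2, file3 f3, file4 f4
 WHERE f2.key1 = f1.key1 AND f2.key2 = f1.key2 AND f2.key3 = f1.key3
   AND f3.key1 = f1.key1 AND f3.key2 = f1.key2 AND f3.key3 = f1.key3
   AND f4.key1 = f1.key1 AND f4.key2 = f1.key2 AND f4.key3 = f1.key3
 ORDER BY f1.key1, f1.key2, f1.key3;

At 24 million rows a side, parallel full scans feeding hash joins will usually beat an index-driven nested-loop plan; the index only earns its keep when you are picking out a small subset of the rows.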