Performance of Merge Stage vs Hashed Lkup to combine files
Posted: Fri Jan 19, 2007 11:40 am
I need to merge 3 fixed width sequential files with about 15 currency fields each and have approximately 1.5 million records which may or may not be sorted on the key values. The number of records may slowly grow over time. I need to untimately combine the 3 files to be loaded into a staging table. Before choosing the Merge Stage over the Hashed Lookup stage or vice versa, I wanted to find out if there are any performace issues or pros/cons of either stage that should be kept in mind.
1. Is there a known limit to the size or record counts of files that need to be merged?
2. When two files are being merged, not that there is an issue of temporary space, but is it enough if the temporary directory has atleast twice the size of both files put together available? i.e if the files are 1GB each would it be enough if the temporary directory has atleast 4GB available?
3. If we manually created 64-bit hashed files to avoid the 2GB limit, would it perform better than the merge stage?
4. If the Merge stage were picked, would any time spent in pre-sorting the 3 files speed up the merge process significantly?
The above example may not correclty depict the actual dataset but all three files will have atleast 90-95% of the Key values in common and I need the extra rows (outer join) from each file as well.
Thanks in Advance.
1. Is there a known limit to the size or record counts of files that need to be merged?
2. When two files are being merged, not that there is an issue of temporary space, but is it enough if the temporary directory has atleast twice the size of both files put together available? i.e if the files are 1GB each would it be enough if the temporary directory has atleast 4GB available?
3. If we manually created 64-bit hashed files to avoid the 2GB limit, would it perform better than the merge stage?
4. If the Merge stage were picked, would any time spent in pre-sorting the 3 files speed up the merge process significantly?
Code: Select all
Example:
File A
KEY A1 A2 ... A15
V 0 1 ... 2
X 1 3 ... 5
Y 4 5 ... 7
.
.
File B
KEY B1 B2 .... B15
Y 1 7 ... 14
Z 3 4 ... 2
V 1 5 ... 2
File C
KEY C1 C2 .... C15
X 3 5 ... 6
W 3 8 ... 6
V 3 6 ... 7
Result Required:
KEY A1 ... A15 B1 ... B15 C1 ... C15
X 1 ... 5 3 ... 6
Y 4 ... 7 1 ... 14
W 3 ... 6
Z 3 ... 2
V 1 ... 2 1 ... 2 3 ... 7
Thanks in Advance.