How many columns does the file have, and how long does it take to read when the only other stage in your job is a Copy stage (or a Peek stage)?
That will show how long the file reader stage itself takes to execute. If it really is 35 min, then you know for sure that the sff-stage is what causes your performance problem.
ArndW: I've heard that in versions 7.5.3 and up it is indeed possible to read variable field length files with multiple nodes if your schema is correct. I haven't tried it myself and I am gladly corrected if you know the truth....
------------------------------------- http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
stefanfrost1 wrote: ArndW: I've heard that in versions 7.5.3 and up it is indeed possible to read variable field length files with multiple nodes if your schema is correct. I haven't tried it myself and I am gladly corrected if you know the truth....
Ray corrected me recently and said that our 'fixed-width only' statement was 'no longer true' but I didn't get a response to my follow-up query of 'no longer true since when?'. The answer could very well be since 7.5.3, however.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I tested it after the post a couple of weeks ago (running 8) and the restriction is still there. But perhaps I didn't get it quite right:
(A) One can define multiple readers on a single node with variable length records. I played around today and saw that one can increase read speed by specifying multiple readers (assuming the other stages are fast as well).
(B) One cannot define multiple nodes on a variable length file, i.e. one is restricted to a single node with n-readers. If one tries to change that, the following error message is displayed at runtime:
Error executing View Data command:
##E IIS-DSEE-TOIX-00172 14:43:37(007) <Sequential_File_0> The multinode option requires fixed length records.
Last edited by ArndW on Mon Aug 17, 2009 7:12 am, edited 1 time in total.
Ah... perhaps that's the distinction being made. Multiple readers on a single node are allowed for variable length records but multiple nodes requires a fixed-width file.
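To illustrate why fixed-width records are the easy case, here is a minimal Python sketch of the general idea of handing byte ranges to several readers. This is not how DataStage implements either option; the function names `fixed_length_ranges` and `delimited_ranges` are made up for the example, and it assumes newline-terminated records. With fixed-length records every boundary is plain arithmetic, while with variable-length records each reader has to scan forward from a guessed offset to find where the next record starts.

```python
import io


def fixed_length_ranges(file_size, record_len, readers):
    """Split a fixed-width file into byte ranges using arithmetic only."""
    total_records = file_size // record_len
    per_reader, extra = divmod(total_records, readers)
    ranges, start = [], 0
    for i in range(readers):
        # The first `extra` readers take one additional record each.
        count = per_reader + (1 if i < extra else 0)
        end = start + count * record_len
        ranges.append((start, end))
        start = end
    return ranges


def delimited_ranges(stream, file_size, readers):
    """Split a newline-delimited file: guess an offset, then scan forward
    to the next record boundary so no record is cut in half."""
    boundaries = [0]
    for i in range(1, readers):
        guess = file_size * i // readers
        stream.seek(guess)
        stream.readline()  # consume the rest of the record we landed in
        # Never move backwards if two guesses fall inside the same record.
        boundaries.append(max(boundaries[-1], stream.tell()))
    boundaries.append(file_size)
    return list(zip(boundaries[:-1], boundaries[1:]))


if __name__ == "__main__":
    sample = b"a;1\nbbbb;22\ncc;333\ndddddd;4\n"
    print(fixed_length_ranges(100, 10, 4))
    print(delimited_ranges(io.BytesIO(sample), len(sample), 2))
```

Note that the delimited version needs to open the file and read some of it before any reader can start, whereas the fixed-length version only needs the file size and the record length.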
I've been playing around with a variable-length, semicolon-separated file in 7.5.3 on AIX... I found that I need to use Number of Readers Per Node, and I set it to 10. According to the monitor, my data is partitioned across 10 nodes and I can preserve that partitioning throughout the flow. My (small) test showed a read 6 times faster using 10 nodes than using 1 node...
My file only had 22M rows at a total size of 3GB.
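For a rough feel of what those numbers mean, here is a small back-of-the-envelope calculation in Python. It assumes "3GB" means 3 * 10^9 bytes and treats the 10 readers as the degree of parallelism; both are assumptions for illustration, not measurements.

```python
rows = 22_000_000
size_bytes = 3 * 10**9  # assumption: "3GB" taken as 3 * 10^9 bytes
readers = 10            # Number of Readers Per Node used in the test
speedup = 6.0           # observed: 6 times faster than a single reader

# Average record length in bytes, delimiters included.
avg_record_bytes = size_bytes / rows

# Fraction of the ideal 10x speedup that was actually achieved.
efficiency = speedup / readers

print(f"average record length: {avg_record_bytes:.0f} bytes")
print(f"parallel efficiency:   {efficiency:.0%}")
```

So the records are fairly short on average, and the readers deliver a bit more than half of the ideal linear speedup in this small test.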
Furthermore, the size limitation that you, Rajee, are experiencing could be at your lookup if you're not partitioning it properly, since each node (at least in 7.5.x) has an OP limit of 2GB of memory.