Posted: Sun Aug 24, 2003 8:28 pm
datastagedummy is really doing a cartesian product. This is a relational activity, so it makes sense to do it in a relational database. What is happening is that each source row causes a lookup statement to be executed against the hash file, so if you have 100K rows you get 100K statements, all in a single-threaded job — single-threaded lookups, one row at a time.
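To see why that hurts, here's a minimal sketch (in Python with an in-memory SQLite table standing in for the hash file — the table and column names are made up for illustration). The loop issues one statement per source row, which is exactly the pattern described above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A stand-in for the hash file: each top-level SKU maps to its parts.
cur.execute("CREATE TABLE parts (sku TEXT, part TEXT)")
cur.executemany("INSERT INTO parts VALUES (?, ?)",
                [("A", "a1"), ("A", "a2"), ("B", "b1")])

source_rows = ["A", "B", "A"]

# Row-by-row lookup: one statement executed per source row,
# serially -- 100K rows means 100K round trips.
results = []
for sku in source_rows:
    cur.execute("SELECT part FROM parts WHERE sku = ?", (sku,))
    results.extend(part for (part,) in cur.fetchall())

print(len(results))  # prints 5 (A->2 parts, B->1, A->2)
```

Each iteration pays the full statement overhead, and nothing runs in parallel — that per-row cost is what the set-based approach below eliminates.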
Here's how I would approach this. Take the source data file and bulk load it, with truncate, into a work table. Then do the cartesian product join in the database and land the results back into a sequential text file. You can then use N CPUs via a parallel query (if you're using something like Oracle or Informix). You'll find the query runs multi-threaded and very fast.
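The same technique can be sketched end to end (again in Python/SQLite purely as an illustration — SQLite has no parallel query, and the `src`/`parts` table names are hypothetical, but the shape of the truncate, bulk load, join, and unload steps is the same):

```python
import csv
import os
import sqlite3
import tempfile

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE src (sku TEXT)")
cur.execute("CREATE TABLE parts (sku TEXT, part TEXT)")

# Step 1: truncate and bulk load the source file into a work table.
cur.execute("DELETE FROM src")
cur.executemany("INSERT INTO src VALUES (?)", [("A",), ("B",), ("A",)])
cur.executemany("INSERT INTO parts VALUES (?, ?)",
                [("A", "a1"), ("A", "a2"), ("B", "b1")])

# Step 2: one set-based join replaces N single-row lookups.  On Oracle
# or Informix a parallel query would spread this across CPUs.
rows = cur.execute(
    "SELECT s.sku, p.part FROM src s JOIN parts p ON p.sku = s.sku"
).fetchall()

# Step 3: land the exploded result back into a sequential text file.
path = os.path.join(tempfile.gettempdir(), "exploded.txt")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(len(rows))  # prints 5
```

The downstream job then reads the exploded file instead of doing the lookup, which is exactly the substitution described in the next paragraph.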
It looks like you're doing something with product/inventory matrices, where you have to explode a top-level product into its parts. I showed a client this technique, where a matrix or portfolio item was exploded into its variant parts (a Retek system). Blasting the file into a table, doing the cartesian product, and yanking it back out into a file took less than 2 minutes for 300K rows. Then the file was transformed as before, except that the cartesian product lookup was removed. The job was then instantiated for a linear improvement.
Good luck!
Kenneth Bland