Making ETL refer to a rule-based engine


datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

kcbland wrote:
1) The trade-off is flexibility versus performance, and right now performance is a bigger driving sales factor than flexibility.

2) Personally, I LOVE the Server product and believe that Parallel technology should have blended into it for the E and L and some T.

3) Right now, my ETL solutions are mixed bag combinations of Server and Parallel jobs.

4) Introducing a Rules Based Transformation stage would pretty much chuck the whole way I'd use DS.

1) Really? I haven't stayed in touch with what is driving sales; I would have guessed that flexibility was a stronger factor than performance. Have people finally stopped asking the stupid question of 'how many rows per second can it process?' and started focusing on the methodologies inherent to the product to improve performance?

2) Anyone think these will be blended in the future? My guess is no.

3) Ken, how does the mixture of Server and Parallel jobs fit into your theory of best practices? Is this something you think works fine or is it something you don't like but feel you have to do in order to deliver the best results?

4) We like challenges, right? :roll:



Byron.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Read everything from Ascential for the last two years - it's all about a scalable framework to handle huge volumes. The Torrent acquisition was for the parallel technology product Orchestrate, which Ascential has stated is the future path for its products. Everything is riding on that framework except Server jobs.

Server jobs are built on the Universe code branch. Its flexibility, power, and stability make a great base product. However, in the era of grid computing and distributed models that harness multiple disparate servers into a single large logical server, it doesn't cut it. The PX technology, if you have a lot of servers lying around, lets you harness them as a single logical server. For super-high volumes, that can be ideal.

However, in dump-n-pump data warehousing, with little or no finesse transformation, you don't need a nimble product. You need a product that can do high-performance extraction and high-performance loading. That's PX and Ab Initio, because they have intelligence built into them to understand parallel query, parallel DML, query partitioning, and table partitioning of the source and target RDBMSs. They stumble on nimble transformations because they're built for speed.
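
To make the partitioning point concrete: what PX negotiates with the RDBMS for free, you typically hand-roll in Server with a multi-instance extract job, where each instance keeps only the rows whose key falls in its partition. A minimal sketch of that kind of constraint routine, purely illustrative (the routine name and arguments are my own, not a product API):

* KeyInPartition(KeyValue, PartitionCount, InstanceNumber)
* Returns 1 if the key belongs to this job instance's partition,
* 0 otherwise, so the Transformer constraint keeps only 1/Nth of the rows.

Key = Trim(KeyValue)
If Not(Num(Key)) Then
   Key = 0               ;* non-numeric keys all fall back to partition 0
End

If Mod(Key, PartitionCount) = InstanceNumber Then
   Ans = 1
End Else
   Ans = 0
End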

I'm a HUGE proponent of using Kimball architecture. I also like to focus on DATA WAREHOUSE SOLUTIONS, so my commentary is for such. I firmly believe that all staged source data should be fully transformed and re-staged in ready-to-load form (sequential load files). That conflicts with need-for-speed paradigms, but in a world of audit, control, restart/resurrect, maintenance, etc., you have to make concessions in your architecture.

My mixed bag design ideally uses PX for E and L, with a mixture of Server and PX during T. SCDs, complex conditional rollups, and some of the oddball transformations you have to do are, IMO, best handled by Server jobs. Then take some of that staged hash file data that's been massaged and throw it into PX datasets for high-performance fact processing using PX merge operators. That's the kind of mixed bag approach I'm talking about.
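
For what it's worth, the Server-side SCD handling I'm describing usually comes down to a hash file lookup plus a change-detection routine feeding the Transformer constraints. A minimal sketch of such a routine, just to illustrate the shape of it (the name, the pipe-delimited argument convention, and the flag values are my own, not anything shipped with the product):

* DetectDimChange(IncomingAttrs, CurrentAttrs)
* IncomingAttrs = pipe-delimited type-2 attributes built from the source row
* CurrentAttrs  = the same attributes returned by the hash file lookup
*                 (empty string when the lookup finds no current row)
* Returns an action flag for the Transformer constraints to route on.

If Trim(CurrentAttrs) = "" Then
   Ans = "INSERT"        ;* no current row: brand new dimension member
End Else
   If IncomingAttrs = CurrentAttrs Then
      Ans = "NOCHANGE"   ;* attributes identical: drop the row
   End Else
      Ans = "UPDATE"     ;* attributes drifted: expire the old version, insert the new
   End
End

Each output link then constrains on the flag and writes its own ready-to-load sequential file.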
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle