
Performance: Using interprocess, intraprocess, row buffers

Posted: Thu Feb 14, 2008 4:18 pm
by RodBarnes
Though I've been using DS for about five years, I've never needed to change the default performance settings for any of our ETL. Recently, though, I've begun exploring what is available and have read lots of threads on DSXchange about these three topics. :)

But one question remains unanswered. Based on the manuals, it seems I would want to use the "Enable row buffer" option by default, since it generally improves performance. However, it is disabled by default, which implies there is some kind of trade-off, or that it should only be used under specific circumstances.

Are there reasons one would not use the "Enable row buffer" option? One consideration, I think, is when rows are long enough to exceed the row buffer size. Are there cases where enabling the option would actually be a hindrance?
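To convince myself of the supposed benefit, I put together a toy model in plain Python (not DataStage BASIC; the row count and per-row delays are made-up numbers). Two "stages" hand rows through a bounded queue playing the part of the row buffer; without it, each row is read and then written before the next row starts, so the two stages' waits never overlap:

```python
import queue
import threading
import time

N_ROWS = 200
READ_COST = 0.001   # simulated per-row source latency (made up)
WRITE_COST = 0.001  # simulated per-row target latency (made up)

def unbuffered():
    # No row buffer: like a synchronous call chain, each row is fully
    # read and written before the next row is touched.
    start = time.perf_counter()
    for _ in range(N_ROWS):
        time.sleep(READ_COST)   # stage 1 work
        time.sleep(WRITE_COST)  # stage 2 work
    return time.perf_counter() - start

def buffered(buffer_size=128):
    # Row buffer enabled: the two stages run concurrently and hand rows
    # through a bounded queue, so their per-row waits overlap.
    buf = queue.Queue(maxsize=buffer_size)

    def extract():
        for i in range(N_ROWS):
            time.sleep(READ_COST)
            buf.put(i)          # blocks if the buffer fills up
        buf.put(None)           # end-of-data marker

    start = time.perf_counter()
    t = threading.Thread(target=extract)
    t.start()
    while buf.get() is not None:
        time.sleep(WRITE_COST)
    t.join()
    return time.perf_counter() - start

print("no row buffer:   %.2fs" % unbuffered())
print("with row buffer: %.2fs" % buffered())
```

On my machine the buffered version runs in roughly half the time, which matches what the manuals suggest; it just doesn't tell me anything about the downsides.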

I plan on testing these options as best I can, but thought someone here might be able to weigh in with some experience.

Posted: Thu Feb 14, 2008 9:53 pm
by kcbland
One consideration is when you have a series of Transformers and an early Transformer references a hashed file that is updated in a subsequent Transformer. If you want the rows just written to be available for reference, you can't use any read or write caching, row buffering, or inter-process communication.
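Here's a rough sketch of that failure mode, in plain Python rather than DataStage (the "hashed file" is just a dict and the buffer size is arbitrary). Row-at-a-time, each row passes through both stages before the next one starts, so the reference lookup sees every earlier write; with a buffer, the lookup stage races ahead of the writing stage and misses recent rows:

```python
def run(rows, buffered, buffer_size=3):
    seen = {}                                   # stands in for the hashed file
    results = []
    if not buffered:
        for key in rows:
            results.append((key, key in seen))  # stage 1: reference lookup
            seen[key] = True                    # stage 2: write back
    else:
        pending = []
        for key in rows:
            results.append((key, key in seen))  # stage 1 runs ahead...
            pending.append(key)
            if len(pending) == buffer_size:     # ...stage 2 drains in batches
                for k in pending:
                    seen[k] = True
                pending.clear()
        for k in pending:
            seen[k] = True
    return results

rows = ["a", "a", "b", "a"]
print(run(rows, buffered=False))  # second "a" IS found in the lookup
print(run(rows, buffered=True))   # second "a" is MISSED: lookups ran ahead
```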

You also get into COMMON memory issues: rows updating COMMON variables in one Transformer end up out of step with the rows flowing through other Transformers. If your logic depends on row order, you get into tricky situations.
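In inter-process mode it's even starker, because each Transformer runs in its own process with its own copy of the COMMON area. A Python stand-in (a module-level global playing the part of a COMMON variable, with made-up row values) shows the effect:

```python
import multiprocessing as mp

COMMON_TOTAL = 0                      # stands in for a COMMON variable

def transformer(rows, out):
    global COMMON_TOTAL
    for r in rows:
        COMMON_TOTAL += r             # updates this process's private copy
    out.put(COMMON_TOTAL)

if __name__ == "__main__":
    out = mp.Queue()
    p1 = mp.Process(target=transformer, args=([1, 2, 3], out))
    p2 = mp.Process(target=transformer, args=([10, 20], out))
    p1.start(); p2.start(); p1.join(); p2.join()
    print(sorted([out.get(), out.get()]))  # [6, 30]: two divergent totals
    print(COMMON_TOTAL)                    # 0: the parent's copy never moved
```

Neither "Transformer" ever sees a consistent running total, which is exactly what happens to COMMON-based logic once the stages stop sharing one process.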

Also consider that row buffering and inter-process communication mainly assist overly complex jobs with lots of lookups, especially database lookups, because the buffering masks much of the delay of processing one row at a time. They can be a quick band-aid for very large pre-existing jobs, as opposed to breaking those DeathStar(TM)-sized jobs down into smaller ones.