Transformer Tuning
Hi folks, I've got a 15-column Transformer stage, about 400 bytes per row, and I am doing a NullToValue() replacement on each column (nothing else). Adding this stage drops my throughput from 108K rows/sec to 58K rows/sec.
This seems like a big hit. Are there tuning options, or is this pretty much the expected overhead?
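A quick worked version of the numbers above, as a sketch that treats the quoted rates as end-to-end wall-clock figures for the whole job (an assumption; the post doesn't say how they were measured). It converts rows/sec into an average per-row cost and shows roughly what each column's NullToValue() adds:

```python
# Figures quoted in the post. The per-row costs below are wall-clock
# averages for the whole job, not CPU time per partition.
ROWS_PER_SEC_BEFORE = 108_000   # without the Transformer stage
ROWS_PER_SEC_AFTER = 58_000     # with 15 x NullToValue()
COLUMNS = 15
BYTES_PER_ROW = 400


def micros_per_row(rows_per_sec: float) -> float:
    """Average elapsed time per row, in microseconds."""
    return 1_000_000 / rows_per_sec


before_us = micros_per_row(ROWS_PER_SEC_BEFORE)
after_us = micros_per_row(ROWS_PER_SEC_AFTER)
added_us = after_us - before_us                 # extra time per row
added_per_column_us = added_us / COLUMNS        # spread evenly over 15 columns
drop_pct = 100 * (1 - ROWS_PER_SEC_AFTER / ROWS_PER_SEC_BEFORE)
mb_per_sec_before = ROWS_PER_SEC_BEFORE * BYTES_PER_ROW / 1e6
mb_per_sec_after = ROWS_PER_SEC_AFTER * BYTES_PER_ROW / 1e6

if __name__ == "__main__":
    print(f"per row: {before_us:.2f} us -> {after_us:.2f} us (+{added_us:.2f} us)")
    print(f"per column: +{added_per_column_us:.2f} us")
    print(f"throughput drop: {drop_pct:.1f}%")
    print(f"data rate: {mb_per_sec_before:.1f} MB/s -> {mb_per_sec_after:.1f} MB/s")
```

That works out to about 8 microseconds of extra elapsed time per row (about half a microsecond per column), a 46% drop in rows/sec, and a fall from 43.2 MB/s to 23.2 MB/s. A small absolute cost showing up as a large relative drop is consistent with the CPU-bound explanation in the reply below.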
Thanks,
Doug
-
It is very hard to generalize about performance impact. If your system were I/O bound, then adding complexity to Transformer or Modify stages would make little or no difference to throughput. In this case the system is CPU bound, which is why adding NullToValue() had an impact. Perhaps reducing the number of processing nodes might bring the rows/sec back up (I'm not recommending this, but in some cases it might improve throughput).
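The I/O-bound versus CPU-bound argument can be illustrated with a deliberately simple model, a sketch rather than how the engine actually schedules work: throughput is capped by whichever resource is slower, and CPU capacity is assumed to grow linearly with the number of nodes. The function name and all rates other than 108K and 58K are hypothetical.

```python
def pipeline_rows_per_sec(io_limit: float, cpu_per_node: float, nodes: int) -> float:
    """Throughput of a toy two-resource model: the slower of I/O and CPU wins.

    cpu_per_node is how many rows/sec one node can push through the job's
    transformations; total CPU capacity is assumed to scale linearly with
    nodes, which real jobs rarely achieve.
    """
    cpu_capacity = cpu_per_node * nodes
    return min(io_limit, cpu_capacity)


# I/O bound (hypothetical rates): extra per-row work lowers CPU capacity
# from 200K to 140K rows/sec, but I/O still caps the job at 120K.
io_bound_before = pipeline_rows_per_sec(120_000, 100_000, 2)
io_bound_after = pipeline_rows_per_sec(120_000, 70_000, 2)

# CPU bound: with plenty of I/O headroom, the same kind of change shows up
# directly. Per-node rates are chosen so two nodes reproduce 108K -> 58K.
cpu_bound_before = pipeline_rows_per_sec(500_000, 54_000, 2)
cpu_bound_after = pipeline_rows_per_sec(500_000, 29_000, 2)

if __name__ == "__main__":
    print("I/O bound:", io_bound_before, "->", io_bound_after)
    print("CPU bound:", cpu_bound_before, "->", cpu_bound_after)
```

Because the model assumes linear scaling, more nodes always help while CPU is the bottleneck; it cannot capture the cases mentioned above where fewer nodes do better, which is one more reason to measure rather than predict.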
-
Hi folks, ramping up the number of partitions (two to sixteen) did indeed help considerably. I am now back up to 96K rows/sec.
Unfortunately, running this many partitions affects Lookup stages that use the Entire partitioning method, since Entire sends a full copy of the reference data to every partition. That required some rework in this job. Since all of our transformation jobs will likely require Transformer stages (!), this approach compromises the use of Lookup stages, I think, at least where they use Entire partitioning.
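To see why more partitions hurt Entire-partitioned lookups, here is a back-of-the-envelope sketch: with Entire partitioning every partition receives every reference row, so the reference data volume grows in proportion to the partition count. The function name and the reference table size are hypothetical, and this is a worst case that ignores any memory sharing the engine may do between partitions on the same host.

```python
def entire_reference_bytes(ref_rows: int, bytes_per_row: int, partitions: int) -> int:
    """Worst-case bytes of reference data held across all partitions when a
    Lookup's reference link uses Entire partitioning (one full copy each)."""
    return ref_rows * bytes_per_row * partitions


# Hypothetical reference table: 2 million rows of 250 bytes (500 MB per copy).
REF_ROWS = 2_000_000
REF_BYTES_PER_ROW = 250

if __name__ == "__main__":
    for parts in (2, 16):
        total = entire_reference_bytes(REF_ROWS, REF_BYTES_PER_ROW, parts)
        print(f"{parts:>2} partitions: {total / 1e9:.1f} GB of reference data")
```

Going from two to sixteen partitions multiplies the worst case by eight (1.0 GB to 8.0 GB here). One common alternative is to hash-partition both the stream and reference links on the lookup key, so each partition holds only its share of the reference data.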
In general, though, does splitting a single Transformer stage into a Transformer plus a Modify stage (as suggested above) improve performance? Is this worth trying? In other words, should I reserve for the Modify stage the transformations it can handle, and leave only the more involved logic to the Transformer stage? I'm not sure why this would help, since the Transformer has to read every row anyway. Usually picking something up once is better than picking it up twice.
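The "pick it up once" intuition can be made concrete with a toy model in which each stage is a separate pass over the rows. The helpers below are hypothetical and only loosely mirror NullToValue() and a Modify stage's null handling; they show that splitting the work gives the same result with twice the row visits. In a pipelined engine the two stages may run concurrently on different CPUs, so twice the visits need not mean twice the elapsed time; only measurement will tell.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Optional[Any]]


def replace_nulls(row: Row, defaults: Dict[str, Any]) -> Row:
    """Return a copy of row with None replaced by the per-column default
    for every column named in defaults (other columns are untouched)."""
    out = dict(row)
    for col, default in defaults.items():
        if col in out and out[col] is None:
            out[col] = default
    return out


def one_stage(rows: List[Row], defaults: Dict[str, Any]) -> Tuple[List[Row], int]:
    """All null handling in a single pass; returns (rows, row visits)."""
    return [replace_nulls(r, defaults) for r in rows], len(rows)


def two_stages(rows: List[Row], first: Dict[str, Any],
               second: Dict[str, Any]) -> Tuple[List[Row], int]:
    """Null handling split across two passes (e.g. Modify, then Transformer)."""
    stage1 = [replace_nulls(r, first) for r in rows]
    stage2 = [replace_nulls(r, second) for r in stage1]
    return stage2, 2 * len(rows)


rows = [{"a": None, "b": 1}, {"a": "x", "b": None}]
single, single_visits = one_stage(rows, {"a": "", "b": 0})
split, split_visits = two_stages(rows, {"a": ""}, {"b": 0})
```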
I can imagine building some involved parallel Transformer paths, each Transformer operating only on the rows it needs, followed by a Funnel stage, with a Modify stage handling the bulk of the work.
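The split-and-funnel layout can be sketched as follows: route each row to the first path whose condition it matches, apply only that path's transformation, and concatenate the outputs the way a Funnel does. The function, routing conditions, and transformations are hypothetical placeholders. A real Funnel stage may not preserve input order unless it is set up to (for example, Sequence mode), so this sketch simply concatenates path by path.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Path = Tuple[Callable[[Row], bool], Callable[[Row], Row]]


def split_and_funnel(rows: List[Row], paths: List[Path],
                     passthrough: Callable[[Row], Row]) -> List[Row]:
    """Send each row down the first path whose predicate matches (rows that
    match none go to passthrough), then concatenate path outputs in path
    order, like a Funnel in Sequence mode."""
    buckets: List[List[Row]] = [[] for _ in range(len(paths) + 1)]
    for row in rows:
        for i, (matches, transform) in enumerate(paths):
            if matches(row):
                buckets[i].append(transform(row))
                break
        else:
            buckets[-1].append(passthrough(row))
    return [row for bucket in buckets for row in bucket]


# Hypothetical example: only rows with a null 'code' need the "involved" work;
# everything else goes through a cheap path (standing in for a Modify stage).
rows = [{"id": 1, "code": None}, {"id": 2, "code": "A"}, {"id": 3, "code": None}]
paths: List[Path] = [
    (lambda r: r["code"] is None, lambda r: {**r, "code": "UNKNOWN"}),
]
out = split_and_funnel(rows, paths, passthrough=lambda r: r)
```

The routing and funnelling are themselves per-row work, so this layout presumably only pays off when the expensive path sees a small fraction of the rows.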
Doug