Switch v/s filter stage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
Hi,
I have to split the records from a dataset into 14 file sets, depending on the 14 different values in one column.
The dataset holds 82,460,040 records. They are not evenly distributed across the output links, so the links do not all carry the same number of records.
Could you please advise which stage is preferable in this case, Switch or Filter, from a performance point of view?
The job runs once a day.
Thanks and Regards
Avik Dasgupta
In this case the switch stage is exactly tailored to what you want to do and will be more efficient. If you have a large percentage of records that are dropped then the filter stage will be more appropriate, but if each row that comes in to the stage is passed on depending upon the values in one column then there is nothing better than the switch stage to do it.
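To make the difference concrete, here is a rough Python sketch of the two routing styles. It is only an analogy, not how DataStage implements either stage: the function names, the `region` column, the link names and the sample rows are all made up for illustration. The switch-style router does one lookup on the selector value per row, while the filter-style router tests a separate predicate for every output.

```python
from collections import defaultdict

# Hypothetical sample rows; "region" stands in for the selector column.
rows = [
    {"id": 1, "region": "A"},
    {"id": 2, "region": "B"},
    {"id": 3, "region": "A"},
    {"id": 4, "region": "Z"},  # value with no matching output
]


def switch_route(rows, selector, cases):
    """Switch-style: a single lookup on the selector value picks the output."""
    outputs = defaultdict(list)
    for row in rows:
        link = cases.get(row[selector])  # one lookup per row
        if link is not None:             # unmatched rows are dropped in this sketch
            outputs[link].append(row)
    return dict(outputs)


def filter_route(rows, predicates):
    """Filter-style: every output has its own predicate, tested for each row."""
    outputs = defaultdict(list)
    for row in rows:
        for link, predicate in predicates.items():
            if predicate(row):
                outputs[link].append(row)
    return dict(outputs)


# Hypothetical mappings: value -> link for the switch, link -> test for the filter.
cases = {"A": "out_a", "B": "out_b"}
predicates = {
    "out_a": lambda r: r["region"] == "A",
    "out_b": lambda r: r["region"] == "B",
}

switched = switch_route(rows, "region", cases)
filtered = filter_route(rows, predicates)
```

Both calls send the rows to the same outputs here. The point is only that with 14 distinct values the filter-style loop may evaluate up to 14 predicates per row, where the switch-style loop needs one lookup.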
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You might also consider using a parallel Transformer stage if you are using version 7.5.1 or later. That way your constraint expressions may be easier to construct.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 437
- Joined: Fri Oct 15, 2004 6:13 am
- Location: Pune, India
The PX transformer stage will most likely use more CPU than the switch or filter stages. But chances are that the program will not be bottlenecked by CPU so it won't make that much of a difference.
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am
ArndW wrote: In this case the switch stage is exactly tailored to what you want to do and will be more efficient. If you have a large percentage of records that are dropped then the filter stage will be more appro ...
Hi,
Thank you very much for your advice.
Could you please explain why the switch stage is preferable?
Regards
Avik
Avik,
What you are doing is exactly what the switch stage was designed for: using one column's values to direct output. As noted earlier, you can use both a filter and a transformer to do the same thing. Barring bad design or implementation issues, it makes sense to use a purpose-written stage rather than a more generic one. The number of CPU cycles should be lowest in the switch stage. If you have doubts about this, it is easy enough to test.
-
- Participant
- Posts: 42
- Joined: Fri Oct 20, 2006 1:58 am