Posted: Fri Sep 04, 2009 1:28 pm
Please read the chapter on configuration files for an SMP in the Parallel Job Developer's Guide; it will answer your doubt.
Code: Select all
                      DataSet
                      [5.5 mil records,
                       250 fields]
                         |
                         |
Sequential file --> Lookup --> Transformer --> Filter ----> Funnel --> Dataset
[1 record,                                     [9 links out
 2 fields]                                      from filter]
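
The node count debated in this thread is set in the parallel configuration file pointed to by $APT_CONFIG_FILE. As a rough sketch (the hostname and paths here are placeholders, not taken from this thread), a two-node configuration for an SMP box looks something like this:
Code: Select all
{
	node "node1"
	{
		fastname "myhost"	/* placeholder hostname of the SMP machine */
		pools ""
		resource disk "/data/node1" {pools ""}
		resource scratchdisk "/scratch/node1" {pools ""}
	}
	node "node2"
	{
		fastname "myhost"	/* same host: on an SMP all logical nodes run on the one machine */
		pools ""
		resource disk "/data/node2" {pools ""}
		resource scratchdisk "/scratch/node2" {pools ""}
	}
}
Both logical nodes use the same fastname because on an SMP they are processes on a single machine; giving each node its own disk and scratchdisk paths keeps their I/O separated.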
ray.wurlod wrote:
2) Your result is an artifact of using a small data volume. For large data volumes, you will get a quicker completion time using two nodes versus using one.

I would say that one input record counts as a 'small data volume'. And what kind of parallel processing do you think would be going on in a job that processes a single record? How many rows come from the lookup to the target? I'm wondering if the answer is 1 or 5.5 million.
> How many rows come from the lookup to the target? I'm wondering if the answer is 1 or 5.5 million.

5.5 million records get populated in the target.
> Out of curiosity, is that just a testing volume and it will be a great deal larger in reality, or is that all it will ever do?

It's running in production, so the number of records may be between 5 and 6 million.
> when it runs, consumes 98 to 100% of both the CPUs.

Since you have 2 CPUs and resource utilization is high, increasing the number of nodes will not give you better results.
> You get the same benefits as on MPP.

Ray is correct on this point, but it should be extended to include resource availability.