performance

sundar · Post by **sundar** » Mon May 15, 2006 5:31 am

               lookup_fileset
                     |
dataset_1 -------- lookup -------- transformer -------- dynamicrdbms
                     |
                transformer
                     |
                 dataset_2

Hi,
I have used the above job design in my project, with ENTIRE partitioning on the lookup fileset and AUTO partitioning on dataset_1. Will this affect performance?

Thanks
sundar

ashwin141 · Post by **ashwin141** » Mon May 15, 2006 6:55 am

Hi Sundar

Can you please explain your job design more clearly, and why you chose two different partitioning methods rather than the same one for the fileset and the dataset?

If you go with Auto, it will ensure that the records are key-partitioned and sorted.

Let us know the details.

Regards
Ashwin

sundar · Post by **sundar** » Mon May 15, 2006 7:40 am

Hi Ashwin,

Thanks for your reply.

               lookup_fileset
                     |
dataset_1 -------- lookup -------- transformer -------- dynamicrdbms
                     | (reject)
                transformer
                     |
                 dataset_2

When I use a lookup fileset for the lookup, the Partitioning tab shows:
"The currently selected link is a reference input from either a lookup fileset stage or a stage that is using a sparse."

For the dataset link it is AUTO.

Can you help me choose the partitioning methods for better performance?

Thanks
sundar

ashwin141 · Post by **ashwin141** » Mon May 15, 2006 7:54 am

Hi Sundar

Selecting partitions depends a lot on your requirements and job design.

I would suggest that you go through the User Guide to understand in detail how partitioning works and which method you should use in a particular case.

I hope that helps you.

Regards
Ashwin

ashwin141 · Post by **ashwin141** » Mon May 15, 2006 8:48 am

Hi Sundar

To answer your specific question: whenever you use a lookup, ensure that the reference data you are looking up and the matching records in the primary (stream) input end up in the same partition.

To ensure this, the reference input should use either Entire partitioning or the same partitioning method, on the same keys, as the source.
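A minimal Python sketch of why co-location matters, assuming a simplified model in which each partition performs its lookup only against the reference rows it received. The helper names (`hash_partition`, `round_robin_partition`, `partitioned_lookup`) and the CRC32-based hash are illustrative assumptions, not DataStage internals.

```python
import zlib


def hash_partition(rows, key, n_parts):
    """Hypothetical Hash partitioner: CRC32 of the key modulo the partition count."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode()) % n_parts
        parts[idx].append(row)
    return parts


def round_robin_partition(rows, n_parts):
    """Hypothetical Round Robin partitioner: deals rows out in turn, ignoring the key."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def partitioned_lookup(stream_parts, ref_parts, key):
    """Each partition can only see the reference rows that landed in it."""
    matched, missed = [], []
    for s_part, r_part in zip(stream_parts, ref_parts):
        table = {r[key]: r for r in r_part}
        for row in s_part:
            if row[key] in table:
                matched.append({**row, **table[row[key]]})
            else:
                missed.append(row)
    return matched, missed


N = 4
stream = [{"cust_id": i, "amount": i * 10} for i in range(20)]
reference = [{"cust_id": i, "name": f"cust{i}"} for i in range(20)]

# Same method on the same key for both links: every matching pair is co-located.
same_ok, same_miss = partitioned_lookup(
    hash_partition(stream, "cust_id", N), hash_partition(reference, "cust_id", N), "cust_id")
print(len(same_ok), len(same_miss))  # 20 0

# Different methods: stream rows can miss reference rows that sit in another partition.
diff_ok, diff_miss = partitioned_lookup(
    hash_partition(stream, "cust_id", N), round_robin_partition(reference, N), "cust_id")
print(len(diff_ok), len(diff_miss))
```

The second call illustrates the failure mode: a row whose reference record was dealt to a different partition is reported as a lookup failure even though the key exists.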

Ashwin

thompsonp · Post by **thompsonp** » Tue May 16, 2006 6:40 am

Using Entire for the reference dataset causes all of the data to be loaded into a single partition in memory. However all the stream partitions can see this data.

Therefore it does not matter (as far as the lookup is concerned) how your data is partitioned on the stream input as every partition will be able to access all the reference data. If there is a matching record it will be found.

The advantage of using Entire is that you don't have to perform costly repartitioning of the stream input on the lookup keys. If the stream data is already evenly spread across the partitions, you can leave it be (use SAME to be certain of this).
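A minimal sketch of the Entire case under the same simplified model (the helper names are hypothetical, not DataStage internals): the stream is dealt out round-robin with no regard to the key, yet every row still finds its match because each partition holds a full copy of the reference data.

```python
def round_robin_partition(rows, n_parts):
    """Hypothetical Round Robin partitioner: ignores the key entirely."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def entire_partition(rows, n_parts):
    """Hypothetical Entire partitioner: every partition gets all reference rows."""
    return [list(rows) for _ in range(n_parts)]


def partitioned_lookup(stream_parts, ref_parts, key):
    """Each partition looks up only against the reference rows it holds."""
    matched, missed = [], []
    for s_part, r_part in zip(stream_parts, ref_parts):
        table = {r[key]: r for r in r_part}
        for row in s_part:
            (matched if row[key] in table else missed).append(row)
    return matched, missed


stream = [{"cust_id": i % 7, "amount": i} for i in range(30)]
reference = [{"cust_id": k, "name": f"cust{k}"} for k in range(7)]

# No key-based repartitioning of the stream is needed.
matched, missed = partitioned_lookup(
    round_robin_partition(stream, 4), entire_partition(reference, 4), "cust_id")
print(len(matched), len(missed))  # 30 0
```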

ray.wurlod · Post by **ray.wurlod** » Tue May 16, 2006 2:39 pm

> Using Entire for the reference dataset causes all of the data to be loaded into a single partition in memory. However all the stream partitions can see this data.

That's not quite correct. Entire causes the entire reference Data Set (or File Set) to be loaded onto each partition. Every partition. So it's more costly in memory, but does guarantee that every valid lookup will work.
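A back-of-the-envelope sketch of the memory trade-off described above, assuming all reference rows are the same size and, for Hash, that keys spread evenly across partitions. `reference_rows_held` is a hypothetical helper, not part of any DataStage API.

```python
def reference_rows_held(ref_rows, n_parts, method):
    """Return (rows per partition, rows across all partitions) for the reference input."""
    if method == "entire":
        per_part = ref_rows  # a full copy on every partition
    elif method == "hash":
        per_part = -(-ref_rows // n_parts)  # ceiling of ref_rows / n_parts, assuming an even key spread
    else:
        raise ValueError(f"unknown method: {method}")
    return per_part, per_part * n_parts


print(reference_rows_held(1_000_000, 8, "entire"))  # (1000000, 8000000)
print(reference_rows_held(1_000_000, 8, "hash"))    # (125000, 1000000)
```

In exchange for the smaller footprint, Hash requires the stream input to be partitioned on the same key, which is the repartitioning cost that Entire avoids.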

kumar_s · Post by **kumar_s** » Tue May 16, 2006 11:07 pm

Hi,
If performance is a major concern, hash-partition on the key on both the reference link and the data link (even on an MPP system). If the partitioning cannot be made clear and exact on the key, it is always better to use Entire partitioning on the reference link and the required key partitioning on the data link.