lookup_fileset
|
dataset_1----- lookup-----transformer ---dynamicrdbms
|
transformer
|
dataset_2
Hi,
I have used the above job design in my project, with ENTIRE partitioning on the lookup file set and AUTO partitioning on dataset_1. Will this affect performance?
Thanks
sundar
performance
Hi Sundar
Can you please be clearer about your job design, and explain why you chose two different partitioning methods rather than the same one for the file set and the dataset?
If you go for Auto, it will ensure that the records are key partitioned and sorted.
Let us know the details.
Regards
Ashwin
Hi Ashwin,
Thanks for your reply.
lookup_fileset
|
dataset_1----- lookup-----transformer ---dynamicrdbms
| (reject)
transformer
|
dataset_2
When I use the lookup file set for the lookup, the partitioning tab shows:
The currently selected link is a reference input from either a lookup fileset stage or a stage that is using a sparse.
For the dataset link it is AUTO.
Can you help me choose the partitioning for better performance?
thanks
sundar
Hi Sundar
Selecting a partitioning method depends a lot on your requirements and job design.
I would suggest that you go through the user guide to understand in detail how partitioning works and which method you should use in a particular case.
I hope that helps you.
Regards
Ashwin
Hi Sundar
To answer your specific question:
Whenever you use a lookup, ensure that each stream record and the reference record it needs to match are in the same partition.
To ensure this, your reference file should either use Entire partitioning or use the same partitioning method (on the same keys) as the source.
Ashwin
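The rule above can be illustrated with a small simulation. This is only a sketch in plain Python, not DataStage code: the partition count, the toy hash function, the sample rows and all function names are assumptions made up for the illustration. Each partition looks up only in the reference rows it holds, so the result depends on how the two links were partitioned.

```python
# Illustrative simulation only; nothing here is a DataStage API.
NUM_PARTITIONS = 4  # assumed degree of parallelism


def toy_hash_partition(rows, key, n=NUM_PARTITIONS):
    """Key-based partitioning: the key alone decides the partition (toy hash)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[(row[key] * 7 + 3) % n].append(row)
    return parts


def round_robin_partition(rows, n=NUM_PARTITIONS):
    """Keyless partitioning: rows are dealt out in arrival order."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def entire_partition(rows, n=NUM_PARTITIONS):
    """Entire: every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(n)]


def lookup(stream_parts, ref_parts, key):
    """Each partition looks up only in its own share of the reference data."""
    matched, rejected = [], []
    for stream, ref in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref}
        for row in stream:
            if row[key] in index:
                matched.append({**row, **index[row[key]]})
            else:
                rejected.append(row)
    return matched, rejected


stream = [{"id": i, "amount": i * 10} for i in range(20)]
reference = [{"id": i, "name": f"cust_{i}"} for i in range(20)]

# Entire on the reference link: the stream can be partitioned any way.
m_entire, r_entire = lookup(
    round_robin_partition(stream), entire_partition(reference), "id")

# Same key partitioning on both links: matching rows meet in one partition.
m_same, r_same = lookup(
    toy_hash_partition(stream, "id"), toy_hash_partition(reference, "id"), "id")

# Mismatched partitioning: valid lookups are rejected.
m_mixed, r_mixed = lookup(
    round_robin_partition(stream), toy_hash_partition(reference, "id"), "id")

print(len(m_entire), len(m_same), len(m_mixed), len(r_mixed))
```

In the first two cases every stream row finds its reference row. In the mismatched case rows are rejected even though a matching reference row exists somewhere, which is the failure the advice above is meant to prevent.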
Using Entire for the reference dataset causes all of the data to be loaded into a single partition in memory. However all the stream partitions can see this data.
Therefore it does not matter (as far as the lookup is concerned) how your data is partitioned on the stream input as every partition will be able to access all the reference data. If there is a matching record it will be found.
The advantage of using Entire is that you don't have to perform costly repartitioning of the stream input on the lookup keys. If the stream data is already evenly spread across the partitions you can leave it be (use SAME to be certain of this).
Using Entire for the reference dataset causes all of the data to be loaded into a single partition in memory. However all the stream partitions can see this data.
That's not quite correct. Entire causes the entire reference Data Set (or File Set) to be loaded onto each partition. Every partition. So it's more costly in memory, but does guarantee that every valid lookup will work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
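The memory trade-off described in the reply above can be put into rough numbers. The following is a minimal sketch in plain Python, assuming four partitions and a 1000-row reference set (both figures are made up), and taking the description of Entire above at face value: one full copy per partition. It only counts rows; it does not measure real DataStage memory use.

```python
# Illustrative row counting only; nothing here is a DataStage API.
NUM_PARTITIONS = 4  # assumed degree of parallelism


def entire_copy(rows, n):
    """Entire: each partition holds its own full copy of the reference rows."""
    return [list(rows) for _ in range(n)]


def modulus_split(rows, key, n):
    """Key-based split: each reference row lands in exactly one partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


reference = [{"id": i} for i in range(1000)]

rows_entire = sum(len(p) for p in entire_copy(reference, NUM_PARTITIONS))
rows_keyed = sum(len(p) for p in modulus_split(reference, "id", NUM_PARTITIONS))

print("rows held with Entire:", rows_entire)
print("rows held with key partitioning:", rows_keyed)
```

Under these assumptions Entire holds the reference set once per partition, while key partitioning holds it once in total but requires the stream link to be partitioned on the same key.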
Hi,
If performance is a major concern, hash partition on the key on both the reference link and the data link (even on an MPP system). If the data cannot be partitioned cleanly and exactly on the key, it is better to use Entire partitioning on the reference link and the required key partitioning on the data link.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'