performance

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

sundar
Participant
Posts: 30
Joined: Thu Sep 01, 2005 10:34 am

Post by sundar »

           lookup_fileset
                  |
dataset_1 ----- lookup ----- transformer ----- dynamicrdbms
                  |
             transformer
                  |
              dataset_2

Hi,
I have used the above job design in my project, with ENTIRE partitioning on the lookup fileset and AUTO partitioning on dataset_1. Will this affect performance?


Thanks
sundar
ashwin141
Participant
Posts: 95
Joined: Wed Aug 24, 2005 2:26 am
Location: London, UK

Post by ashwin141 »

Hi Sundar

Can you please be clearer about your job design and the reason why you chose two different partitioning methods, rather than the same one for the fileset and dataset.

If you go for Auto, it will ensure that the records are key partitioned and sorted.

Let us know the details.

Regards
Ashwin
sundar
Participant
Posts: 30
Joined: Thu Sep 01, 2005 10:34 am

Post by sundar »

Hi Ashwin,

Thanks for your reply.

           lookup_fileset
                  |
dataset_1 ----- lookup ----- transformer ----- dynamicrdbms
                  | (reject)
             transformer
                  |
              dataset_2

When I use the lookup fileset for the lookup, the Partitioning tab shows:
"The currently selected link is a reference input from either a lookup fileset stage or a stage that is using a sparse."

For the dataset link it is AUTO.

Can you help me choose the partitioning methods for better performance?

Thanks
sundar
ashwin141
Participant
Posts: 95
Joined: Wed Aug 24, 2005 2:26 am
Location: London, UK

Post by ashwin141 »

Hi Sundar

Selecting partitions depends a lot on your requirements and job design.

I would suggest that you go through the user guide to understand in detail how partitioning works and which method you should use in a particular case.

I hope that helps you.

Regards
Ashwin
ashwin141
Participant
Posts: 95
Joined: Wed Aug 24, 2005 2:26 am
Location: London, UK

Post by ashwin141 »

Hi Sundar

To answer your specific question: whenever you use a lookup, make sure that each stream (primary) record and the reference records it has to match end up in the same partition.

To ensure this, the reference link should use either Entire partitioning or the same key-based partitioning method, on the same keys, as the stream input.
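To make the point concrete, here is a toy Python model (not DataStage code; all function names are hypothetical, and modulus on an integer key stands in for hash partitioning). A stream row can only be matched against the reference rows held in its own partition, so lookups succeed when both links are key-partitioned on the same key or the reference is Entire, and fall through to the reject path when the reference is partitioned differently.

```python
# Toy model of a partitioned lookup (hypothetical, not DataStage code):
# each partition can only match stream rows against the reference rows it holds.
N = 4  # assumed number of partitions


def key_partition(rows, n=N):
    """Key-based partitioning; modulus on an integer key stands in for hash."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[key % n].append((key, value))
    return parts


def round_robin_partition(rows, n=N):
    """Row i goes to partition i % n, regardless of its key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def entire_partition(rows, n=N):
    """Every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(n)]


def partitioned_lookup(stream_parts, ref_parts):
    """Look up each stream row only in the reference rows of its own partition."""
    matched, rejected = 0, 0
    for stream, ref in zip(stream_parts, ref_parts):
        table = dict(ref)
        for key, _ in stream:
            if key in table:
                matched += 1
            else:
                rejected += 1
    return matched, rejected


stream = [(k, f"s{k}") for k in range(20)]
reference = [(k, f"r{k}") for k in reversed(range(20))]  # arrives in a different order

same_keys = partitioned_lookup(key_partition(stream), key_partition(reference))
entire_ref = partitioned_lookup(round_robin_partition(stream), entire_partition(reference))
mismatch = partitioned_lookup(key_partition(stream), round_robin_partition(reference))
print(same_keys, entire_ref, mismatch)  # (20, 0) (20, 0) (0, 20)
```

In the mismatched case every row misses only because of how this toy data lines up; with real data you would more likely see some lookups fail rather than all of them, which is harder to spot.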


Ashwin
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Using Entire for the reference dataset causes all of the data to be loaded into a single partition in memory. However all the stream partitions can see this data.

Therefore it does not matter (as far as the lookup is concerned) how your data is partitioned on the stream input as every partition will be able to access all the reference data. If there is a matching record it will be found.

The advantage of using Entire is that you don't have to perform costly repartitioning of the stream input on the lookup keys. If the stream data is already evenly spread across the partitions you can leave it be (use SAME to be certain of this).
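A rough sketch of the repartitioning cost, assuming a round-robin stream, four partitions, and modulus on an integer key standing in for hash (all names are hypothetical): it counts how many stream rows would have to move to another partition to become key-partitioned, versus none when the stream is left as SAME because the reference is Entire.

```python
# Hypothetical illustration: how many stream rows change partition when a
# round-robin stream is repartitioned on the lookup key, versus keeping SAME.
N = 4  # assumed number of partitions


def round_robin(i, key, n=N):
    """Current placement: row i sits in partition i % n."""
    return i % n


def key_based(i, key, n=N):
    """Target placement on the key; modulus stands in for hash partitioning."""
    return key % n


def same(i, key, n=N):
    """SAME keeps each row in the partition it already occupies."""
    return round_robin(i, key, n)


def rows_moved(keys, current, target):
    """Number of rows whose partition under `target` differs from `current`."""
    return sum(1 for i, key in enumerate(keys) if current(i, key) != target(i, key))


keys = [7 * k for k in range(100)]  # stream rows in arrival order
print(rows_moved(keys, round_robin, key_based))  # 50
print(rows_moved(keys, round_robin, same))       # 0
```

This only counts rows; the real cost of moving them between partitions depends on the system and is not modelled here.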
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

thompsonp wrote:
"Using Entire for the reference dataset causes all of the data to be loaded into a single partition in memory. However all the stream partitions can see this data."

That's not quite correct. Entire causes the entire reference Data Set (or File Set) to be loaded onto each partition. Every partition. So it's more costly in memory, but does guarantee that every valid lookup will work.
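A sketch of that memory trade-off, using reference row counts as a rough proxy for memory (the function is hypothetical, and modulus stands in for hash): with Entire every partition holds the whole reference set, so the total held grows with the number of partitions, while key partitioning spreads a single copy across them.

```python
# Hypothetical illustration: reference rows held in memory per partition
# under Entire versus key-based partitioning of the reference link.
def resident_reference_rows(ref_keys, n, method):
    if method == "entire":
        # Every partition loads the whole reference data set.
        return [len(ref_keys)] * n
    if method == "key":
        # Each partition loads only the keys that map to it (modulus here).
        counts = [0] * n
        for key in ref_keys:
            counts[key % n] += 1
        return counts
    raise ValueError(f"unknown method: {method}")


ref_keys = list(range(1000))
entire = resident_reference_rows(ref_keys, 4, "entire")
keyed = resident_reference_rows(ref_keys, 4, "key")
print(entire, sum(entire))  # [1000, 1000, 1000, 1000] 4000
print(keyed, sum(keyed))    # [250, 250, 250, 250] 1000
```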
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
If performance is a major concern, hash-partition on the lookup key on both the reference link and the data link (even on an MPP system). If the partitioning cannot be made clear and exact on the key, it is better to use Entire partitioning on the reference link and the required key partitioning on the data link.
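The rule of thumb above, written as a small hypothetical helper (the function name and the boolean flag are illustrative only, not a DataStage setting):

```python
from typing import Tuple


def choose_lookup_partitioning(key_partitioning_is_exact: bool) -> Tuple[str, str]:
    """Return (reference_link, data_link) partitioning per the rule of thumb.

    key_partitioning_is_exact is a hypothetical flag: True when both links can
    be partitioned on exactly the lookup key columns.
    """
    if key_partitioning_is_exact:
        # Each partition then holds only its own slice of the reference data.
        return ("Hash on lookup key", "Hash on lookup key")
    # Otherwise give every partition a full reference copy and keep the
    # required key partitioning on the data link.
    return ("Entire", "Hash on required key")


print(choose_lookup_partitioning(True))
print(choose_lookup_partitioning(False))
```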
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'