Look up and look up file set

ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Look up and look up file set

Post by ketanshah123 »

If we use a Lookup stage, the reference data is loaded into memory for the lookup. My question: if we use a Lookup File Set as the reference for a Lookup stage, would it still impose a memory overhead?

Thanks in advance.
devidotcom
Participant
Posts: 247
Joined: Thu Apr 27, 2006 6:38 am
Location: Hyderabad

Post by devidotcom »

Yes it will.
ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

devidotcom wrote:Yes it will.
Thanks for the reply, but can you explain how?
devidotcom
Participant
Posts: 247
Joined: Thu Apr 27, 2006 6:38 am
Location: Hyderabad

Post by devidotcom »

From one of Ray's posts:

--------------------------------------------------------------------------------

Warning - Technical Content
The reference input to a Lookup stage for a normal (not sparse) lookup causes a composite operator to be generated to perform two tasks, for which the operator names are LUT_CreateOp and LUT_ProcessOp.

LUT_ProcessOp loads the virtual data set associated with the reference link into memory and builds an index (a hash table) through which that data set can be accessed by key.

If, however, the reference link is fed by a Lookup File Set stage, the index has already been created when the Lookup File Set was populated, so it can be moved into memory rather than built at run time. This ought to be faster.

Parallelism of a Lookup File Set is handled in the same way as for all other stage types: by the partitioning (when written) and execution mode properties, and possibly by the preserve partitioning setting of the upstream stage. However, if it is too small, it will be created on only one node. "Too small" may mean less than 32KB or less than 128KB (or some other threshold, depending upon certain environment variables). Orchestrate does not move data in units smaller than 32KB.

LUT = lookup table
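
To make the "load into memory and build a hash table" step concrete, here is a minimal Python sketch of a normal (non-sparse) lookup. It is only an illustration of the general technique, not DataStage's actual implementation; the function names (build_index, lookup) and the sample columns are made up for the example.

```python
from collections import defaultdict

def build_index(reference_rows, key):
    """Build an in-memory hash table: key value -> list of reference rows."""
    index = defaultdict(list)
    for row in reference_rows:
        index[row[key]].append(row)
    return index

def lookup(stream_rows, index, key):
    """Probe the index once per stream row; unmatched rows are paired with None."""
    for row in stream_rows:
        matches = index.get(row[key])  # .get() does not create missing keys
        yield row, (matches[0] if matches else None)

# Hypothetical reference data (the reference link) and stream data.
reference = [
    {"cust_id": 1, "name": "Alpha"},
    {"cust_id": 2, "name": "Beta"},
]
stream = [{"cust_id": 2, "amount": 10}, {"cust_id": 3, "amount": 5}]

index = build_index(reference, "cust_id")   # whole reference set now sits in memory
results = list(lookup(stream, index, "cust_id"))
```

The whole reference set is held in memory for the life of the lookup, which is why the memory question arises regardless of what feeds the reference link.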


So the Lookup File Set is moved into memory!
:)
ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

devidotcom wrote:From one of Ray's post..
Thank you very much.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It is loaded into memory, but I take issue with the word "overhead".

As I originally posted, every non-sparse lookup reference link involves a virtual Data Set, which is therefore loaded into memory. So using a Lookup File Set as the source does not impose any additional overhead compared to other stage types. Indeed, since its index (hash table) has already been created, it is likely to be more efficient than most other stage types when servicing a reference input link to a Lookup stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Ray, if a Lookup File Set is replaced by a Data Set, and hash partitioning was used to write to the Data Set, wouldn't lookup performance be the same in both cases, with the added advantage that you can view the data in a Data Set whereas you cannot in a Lookup File Set?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Marginal. With a Data Set the index (hash table) has to be built; with a Lookup File Set the index already exists and only needs to be moved into memory. For large reference sets the difference is negligible; for smaller reference sets it will be noticeable.

The ability to view data is purely cosmetic.
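
The difference between the two cases can be sketched in Python as "index built at run time" versus "index built once when the file was written, then just loaded". This is a rough analogy under stated assumptions, not how DataStage stores a Lookup File Set: pickle stands in for the on-disk format, and write_lookup_file / load_lookup_file are hypothetical helper names.

```python
import os
import pickle
import tempfile

def build_index(rows, key):
    """Hash-table build: this is the work a plain data source needs at run time."""
    return {row[key]: row for row in rows}

def write_lookup_file(rows, key, path):
    """Do the indexing once, at write time, and persist the finished index."""
    with open(path, "wb") as f:
        pickle.dump(build_index(rows, key), f)

def load_lookup_file(path):
    """At run time the prebuilt index is only read back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical reference data.
reference = [
    {"code": "AU", "country": "Australia"},
    {"code": "IN", "country": "India"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ref.idx")
    write_lookup_file(reference, "code", path)
    prebuilt = load_lookup_file(path)            # analogous to the Lookup File Set case

built_at_runtime = build_index(reference, "code")  # analogous to the Data Set case
```

Either way the same index ends up in memory; the only thing that changes is when the indexing work is done.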