Look up and look up file set
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
If we use a Lookup stage, the reference data is looked up in memory. Just a query here: if we use a Lookup File Set as the reference for a Lookup stage, would it still add memory overhead?
Thanks in advance.
-
- Participant
- Posts: 247
- Joined: Thu Apr 27, 2006 6:38 am
- Location: Hyderabad
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
-
- Participant
- Posts: 247
- Joined: Thu Apr 27, 2006 6:38 am
- Location: Hyderabad
From one of Ray's posts:
--------------------------------------------------------------------------------
Warning - Technical Content
The reference input to a Lookup stage for a normal (not sparse) lookup causes a composite operator to be generated to perform two tasks, for which the operator names are LUT_CreateOp and LUT_ProcessOp.
LUT_ProcessOp loads the virtual data set associated with the reference link into memory and builds an index (a hash table) through which that data set can be accessed by key.
If, however, the reference link is fed by a Lookup File Set stage, the index has already been created when the Lookup File Set was populated, so it can be moved into memory rather than built at run time. This ought to be faster.
Parallelism of a Lookup File Set is handled in the same way as for all other stage types: by the partitioning (when written) and execution mode properties, and possibly by the preserve partitioning setting of the upstream stage. However, if it is too small, it will be created on only one node. "Too small" may mean less than 32KB or less than 128KB (or some other size, depending upon certain environment variables). Orchestrate does not move data in units smaller than 32KB.
LUT = lookup table
So the Lookup File Set is moved into memory!
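To make the two steps in Ray's description concrete, here is a minimal Python sketch of a normal (non-sparse) in-memory lookup: first every reference row is loaded into memory and indexed by key, then each stream row is matched with a single key probe. It is only an analogy, not how LUT_CreateOp/LUT_ProcessOp are actually implemented; the functions `create_lookup_table` and `process_lookup`, the column names, the keep-first-duplicate rule and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def create_lookup_table(reference: Iterable[Row], key: str) -> Dict[str, Row]:
    """Load every reference row into memory, indexed by key.

    A Python dict is a hash table, so this plays the role of the index
    that has to be built at run time for an ordinary reference link.
    """
    table: Dict[str, Row] = {}
    for row in reference:
        # This sketch (hypothetically) keeps the first row seen for each key.
        table.setdefault(row[key], row)
    return table


def process_lookup(
    stream: Iterable[Row], table: Dict[str, Row], key: str
) -> Tuple[List[Row], List[Row]]:
    """Probe the in-memory table once per stream row."""
    matched: List[Row] = []
    rejected: List[Row] = []
    for row in stream:
        ref = table.get(row[key])
        if ref is None:
            rejected.append(row)  # no reference row for this key
        else:
            matched.append({**row, **ref})  # stream columns plus reference columns
    return matched, rejected


# Hypothetical sample data, for illustration only.
reference_rows = [
    {"cust_id": "1", "name": "Ann"},
    {"cust_id": "2", "name": "Bob"},
]
stream_rows = [
    {"cust_id": "2", "amount": "10"},
    {"cust_id": "9", "amount": "5"},
]

lookup_table = create_lookup_table(reference_rows, "cust_id")
matched, rejected = process_lookup(stream_rows, lookup_table, "cust_id")
print(matched)   # [{'cust_id': '2', 'amount': '10', 'name': 'Bob'}]
print(rejected)  # [{'cust_id': '9', 'amount': '5'}]
```

The whole reference side sits in memory before the first stream row is processed, which is why the reference link of a normal lookup always costs memory regardless of which stage feeds it.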
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
devidotcom wrote: From one of Ray's posts...
Thank you very much.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
It is loaded into memory, but I take issue with the word "overhead".
As I originally posted, every non-sparse lookup reference link involves a virtual Data Set (and is therefore loaded into memory). So using a Lookup File Set as the source does not impose any additional overhead compared with other stage types. Indeed, since its index (hash table) has already been created, it is likely to be more efficient than most other stage types when servicing a reference input link to a Lookup stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Marginal. With a Data Set the index (hash table) has to be built; with a Lookup File Set the index already exists and only needs to be moved into memory. For large reference sets the difference is negligible; for smaller reference sets it will be noticeable.
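The difference Ray describes (building the index versus moving an existing one into memory) can be illustrated with a toy Python analogy. This is not the real Lookup File Set format: the open-addressing table with linear probing, the CRC-32 hash, JSON standing in for the file, and the names `build_index`, `probe` and `slot_for` are all assumptions made for the sketch. Building the index hashes every row; loading a saved index just reads the slot layout back and probes it without re-hashing any rows.

```python
import json
import zlib
from typing import List, Optional, Tuple

Slot = Optional[List[str]]  # [key, value], or None for an empty slot


def slot_for(key: str, capacity: int) -> int:
    # Deterministic hash (CRC-32), so a saved table layout stays valid when
    # read back in another process. Python's built-in hash() is randomized
    # per process for strings, so it could not be used for a persisted index.
    return zlib.crc32(key.encode("utf-8")) % capacity


def build_index(rows: List[Tuple[str, str]], capacity: int) -> List[Slot]:
    """Hash every row into an open-addressing table (linear probing)."""
    if capacity <= len(rows):
        raise ValueError("capacity must exceed the number of rows")
    slots: List[Slot] = [None] * capacity
    for key, value in rows:
        i = slot_for(key, capacity)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % capacity  # slot taken by another key: try the next one
        slots[i] = [key, value]
    return slots


def probe(slots: List[Slot], key: str) -> Optional[str]:
    """Look up a key in an already-built table; nothing is rebuilt."""
    capacity = len(slots)
    i = slot_for(key, capacity)
    while slots[i] is not None:  # at least one empty slot, so this terminates
        if slots[i][0] == key:
            return slots[i][1]
        i = (i + 1) % capacity
    return None


# Hypothetical reference data.
reference = [("A01", "Alpha"), ("B02", "Bravo"), ("C03", "Charlie")]

# "Data Set" style: the index is built from the rows at run time.
runtime_index = build_index(reference, capacity=8)

# "Lookup File Set" style: the index was built and saved when the file set
# was populated; at run time the slot layout is simply read back and probed.
saved = json.dumps(build_index(reference, capacity=8))
loaded_index = json.loads(saved)

print(probe(runtime_index, "B02"))  # Bravo
print(probe(loaded_index, "B02"))   # Bravo
print(probe(loaded_index, "Z99"))   # None
```

In both cases the table ends up in memory; the only work the saved layout avoids is the per-row hashing, which matches Ray's description of the saving as marginal.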
The ability to view data is purely cosmetic.