Lookup file sets
-
- Premium Member
- Posts: 210
- Joined: Wed Feb 16, 2005 7:17 am
Hi,
I could not find answers to my questions elsewhere, so I am asking here.
I want to know more about lookup file sets. How do they actually work?
Do they use hashing? Why is a lookup file set always created on the first node defined in the config file? And if that is the case, what kind of parallel operation is involved, given that it always gets created on one node? Any other relevant information would be welcome.
If these questions are answered somewhere, please point me to that doc or link.
Cheers
Aakash
Stop trying to be perfect… let's evolve.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Warning - Technical Content
The reference input to a Lookup stage for a normal (not sparse) lookup causes a composite operator to be generated to perform two tasks, for which the operator names are LUT_CreateOp and LUT_ProcessOp.
LUT_CreateOp loads the virtual data set associated with the reference link into memory and builds an index (a hash table) through which that data set can be accessed by key.
If, however, the reference link is fed by a Lookup File Set stage, the index has already been created when the Lookup File Set was populated, so it can be moved into memory rather than built at run time. This ought to be faster.
Parallelism of a Lookup File Set is handled in the same way as for all other stage types, by the partitioning (when written) and execution mode properties, and possibly by the preserve partitioning setting of the upstream stage. However, if the Lookup File Set is too small, it will be created on only one node. "Too small" may mean less than 32KB or less than 128KB (or some other threshold, depending upon certain environment variables). Orchestrate does not move data in units smaller than 32KB.
LUT = lookup table
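The difference between building the index at run time and loading one that was built earlier can be sketched as follows. This is only an analogy in Python, not how Orchestrate implements it: the rows, the function names and the use of `pickle` as the persisted form are all assumptions made for illustration.

```python
import pickle

# Hypothetical reference rows: (key, value) pairs standing in for the reference link.
reference_rows = [(1, "alpha"), (2, "beta"), (3, "gamma")]

def build_index(rows):
    """Build an in-memory hash table keyed on the lookup key."""
    index = {}
    for key, value in rows:
        index[key] = value
    return index

def lookup(index, key, default=None):
    """Access the reference data by key through the index."""
    return index.get(key, default)

# Case 1: normal lookup, the index is built at run time from the reference data.
runtime_index = build_index(reference_rows)

# Case 2: the index was built once and persisted, then is simply loaded
# (an analogy for a Lookup File Set populated by an earlier job).
persisted = pickle.dumps(build_index(reference_rows))
loaded_index = pickle.loads(persisted)

print(lookup(runtime_index, 2))
print(lookup(loaded_index, 4, "not found"))
```

Both indexes answer lookups identically; the second case only moves the cost of building the index to the time the data was written.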
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 210
- Joined: Wed Feb 16, 2005 7:17 am
"Too small may be either less than 32KB or less than 128KB (or other, depending upon certain environment variables)"
Can you please explain which environment variables those are?
P.S. The reason I have reopened this topic is that I just tried to write a lookup file set 52 MB in size, and it was still created only on the conductor node (the config file has 2 nodes).
Job design: Row Generator ----> Lookup File Set
Cheers
Aakash
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
How do you know on which node(s) the Lookup File Set was created? The control file (xyz.fs) is possibly created on the conductor node, but how have you determined the location(s) of the data file(s)? The control file - sometimes called the descriptor file - is not the Lookup File Set itself.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Warning - Technical Content (again)
The descriptor file for a File Set or a Lookup File Set has a name ending in ".fs". Nevertheless the descriptor file itself is a text file, and can be examined with a text editor to determine the location(s) of the data file(s) comprising the File Set.
Premium members can read more about this here which is a prototype for something that will ultimately grace the DSXchange Learning Center (where the link in the document will work properly).
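Reading the data file locations out of a descriptor file can also be scripted. The sketch below is a minimal example, assuming the layout seen in the descriptor posted later in this thread: section markers beginning with "--", and "node:path" lines under "--LFile". The function name and the sample paths are hypothetical, and a real descriptor may contain sections this sketch does not know about.

```python
def data_files_from_descriptor(text):
    """Return (node, path) pairs listed under --LFile sections."""
    files = []
    in_lfile = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("--"):
            # Section markers start with "--"; only --LFile holds data files.
            in_lfile = (line == "--LFile")
            continue
        if in_lfile and line:
            # Split on the first colon only, so the path is kept whole.
            node, _, path = line.partition(":")
            files.append((node, path))
    return files

# Hypothetical descriptor text, for illustration only.
sample = """--Orchestrate File Set v2
--LFile
node1:/tmp/example.part0
--LFile
node2:/tmp/example.part1
--Schema
record ( KeyCol: int32; )
"""

for node, path in data_files_from_descriptor(sample):
    print(node, path)
```

If only one node name appears in the output, all data files of the File Set were written on that node.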
-
- Premium Member
- Posts: 210
- Joined: Wed Feb 16, 2005 7:17 am
By inspecting the lookup file set descriptor file, I can see the nodes and the segment file location. Here is my descriptor file:
Code:
--Orchestrate File Set v2
--LFile
node1:/vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc
--Schema
record {LUTVersion="1"}
( KeyCol: int32 {dropped};
texta: string;
)
As you can see, it is created on one node only, while
1. my config file has 2 nodes defined.
2. the data is about 52 MB in size.
Cheers
Aakash
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Please report the result of the following command:
Code:
ls -l /vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc
-
- Premium Member
- Posts: 210
- Joined: Wed Feb 16, 2005 7:17 am
Here it is:
Code:
-rwxrwx--- 1 myuser mygroup 54553192 Mar 06 05:47 /vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc
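As a quick check of the numbers, the byte count reported by ls can be compared with the 32KB and 128KB figures mentioned earlier in the thread. This is plain arithmetic only; the variable names are made up for the example, and it says nothing about which threshold actually applies on a given system.

```python
size_bytes = 54553192  # size reported by ls -l for the data file

KB = 1024
MB = 1024 * KB

size_mb = size_bytes / MB
# The two "too small" figures quoted earlier in the thread.
thresholds = {"32KB": 32 * KB, "128KB": 128 * KB}

print(f"{size_mb:.1f} MB")
for name, limit in thresholds.items():
    print(name, "exceeded:", size_bytes > limit)
```

The single data file holds roughly the full 52 MB, well above either figure, so the size thresholds alone would not account for everything landing on one node.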
-
- Participant
- Posts: 36
- Joined: Thu Sep 01, 2005 5:44 am
- Location: Canada
Hi,
What is the best partition type to use when a lookup file set is created with a key?
By default, the partition type when this stage is dragged onto a job is "Entire". Will Entire partitioning cause the data to be written multiple times, depending on the number of nodes, and hence occupy a huge amount of space on the Unix box?
Thanks & Regards,
Rony
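For background on the question above, the general difference between Entire and Hash partitioning can be sketched in a few lines. This only illustrates what each method does with rows in principle: Entire gives every partition a full copy, while Hash sends each row to exactly one partition by key. It does not model how the Lookup File Set stage stores data on disk, and the row data and function names are hypothetical.

```python
def partition_entire(rows, num_partitions):
    """Entire: every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(num_partitions)]

def partition_hash(rows, num_partitions, key=lambda row: row[0]):
    """Hash: each row goes to exactly one partition, chosen by its key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(key(row)) % num_partitions].append(row)
    return parts

rows = [(k, f"text{k}") for k in range(10)]  # hypothetical keyed rows
nodes = 2

entire = partition_entire(rows, nodes)
hashed = partition_hash(rows, nodes)

print("Entire total rows:", sum(len(p) for p in entire))
print("Hash total rows:", sum(len(p) for p in hashed))
```

With Entire, the total row count across partitions grows with the number of nodes; with Hash it stays equal to the input row count.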