Lookup file sets?

aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Hi,

I could not find answers to my questions elsewhere, so I am asking here :-)

I want to know more about lookup file sets. How do they actually work? Do they use hashing? Why is a lookup file set always created on the first node defined in the config file? :roll: And if that is the case, what kind of parallel operation is performed, given that it is always created on one node? Any other relevant information would be welcome.

If these questions are answered somewhere then please point me to that doc / link...

Cheers
Aakash
Stop trying to be perfect… let's evolve.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Warning - Technical Content
The reference input to a Lookup stage for a normal (not sparse) lookup causes a composite operator to be generated to perform two tasks, for which the operator names are LUT_CreateOp and LUT_ProcessOp.

LUT_CreateOp loads the virtual data set associated with the reference link into memory and builds an index (a hash table) through which that data set can be accessed by key. LUT_ProcessOp then performs the lookups against that index.

If, however, the reference link is fed by a Lookup File Set stage, the index has already been created when the Lookup File Set was populated, so it can be moved into memory rather than built at run time. This ought to be faster.
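As a rough illustration of the build-then-probe pattern described above, here is a minimal Python sketch. It is not DataStage code: `build_index` and `lookup` are hypothetical names, and the column names are borrowed from the schema posted later in this thread. It only shows why an index built once (or saved ahead of time, as with a Lookup File Set) makes each later lookup a cheap key access.

```python
from collections import defaultdict


def build_index(reference_rows, key):
    """Build an in-memory hash index: key value -> list of matching rows.

    Hypothetical helper; stands in for the "create" half of the lookup.
    """
    index = defaultdict(list)
    for row in reference_rows:
        index[row[key]].append(row)
    return index


def lookup(index, stream_rows, key):
    """Probe the index once per stream row.

    Yields (stream_row, first_match_or_None); stands in for the
    "process" half of the lookup.
    """
    for row in stream_rows:
        matches = index.get(row[key])  # .get() does not create new entries
        yield row, (matches[0] if matches else None)


# Illustrative data only.
reference = [{"KeyCol": 1, "texta": "one"}, {"KeyCol": 2, "texta": "two"}]
stream = [{"KeyCol": 2}, {"KeyCol": 3}]

index = build_index(reference, "KeyCol")
results = list(lookup(index, stream, "KeyCol"))
```

The index is built once over the reference data and then reused for every stream row, which is the cost a pre-populated Lookup File Set is meant to avoid paying at run time.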

Parallelism of Lookup File Set is handled in the same way as all other stage types, by the partitioning (when written) and execution mode properties, and possibly by the preserve partitioning setting of the upstream stage. However, if it is too small, it will be created on only one node. Too small may be either less than 32KB or less than 128KB (or other, depending upon certain environment variables). Orchestrate does not move data in smaller units than 32KB.

LUT = lookup table
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

"Too small may be either less than 32KB or less than 128KB (or other, depending upon certain environment variables)"
Can you please explain which environment variables those are?

P.S. The reason I have reopened this topic is that I just tried to write a lookup file set 52 MB in size, and it was still created only on the conductor node (the config file has 2 nodes).

Job design: Row Generator ----> Lookup File Set

Cheers
Aakash
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How do you know on which node(s) the Lookup File Set was created? The control file (xyz.fs) is possibly created on the conductor node, but how have you determined the location(s) of the data file(s)? The control file - sometimes called the descriptor file - is not the Lookup File Set itself.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Warning - Technical Content (again)

The descriptor file for a File Set or a Lookup File Set has a name ending in ".fs". Nevertheless the descriptor file itself is a text file, and can be examined with a text editor to determine the location(s) of the data file(s) comprising the File Set.

Premium members can read more about this here which is a prototype for something that will ultimately grace the DSXchange Learning Center (where the link in the document will work properly).
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Looking at the lookup file set descriptor file, I can see the node and the segment file location. Here is my descriptor file:

Code:

--Orchestrate File Set v2
--LFile
node1:/vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc
--Schema
record {LUTVersion="1"}
( KeyCol: int32 {dropped};
  texta: string;
)

As you can see, it is created on one node only, even though
1. my config file has 2 nodes defined.
2. the data is about 52 MB in size.

Cheers
Aakash
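Reading the data file locations out of a descriptor like the one above can also be scripted. The following Python sketch is an assumption-laden illustration: the function name `data_files` is hypothetical, and the section layout (`--LFile` followed by `node:path` lines, ended by the next `--` header) is inferred only from the single descriptor posted in this thread, so other file sets may look different.

```python
def data_files(descriptor_text):
    """Return (node, path) pairs listed under the --LFile section.

    Assumes the layout seen in this thread: section headers start with
    "--", and each data file line has the form "node:path".
    """
    files = []
    in_lfile = False
    for line in descriptor_text.splitlines():
        line = line.strip()
        if line.startswith("--"):
            # A new section header; only --LFile holds data file entries.
            in_lfile = (line == "--LFile")
            continue
        if in_lfile and line:
            node, _, path = line.partition(":")
            files.append((node, path))
    return files


# The descriptor as posted in this thread.
SAMPLE = """--Orchestrate File Set v2
--LFile
node1:/vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc
--Schema
record {LUTVersion="1"}
( KeyCol: int32 {dropped};
  texta: string;
)
"""

entries = data_files(SAMPLE)
nodes = sorted({node for node, _ in entries})
```

For the descriptor posted here, `entries` holds a single data file and `nodes` contains only `node1`, which matches the observation that everything landed on one node.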
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

??
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please report the result of the following command:

Code:

ls -l /vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc 
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Here it is:

Code:

-rwxrwx---   1 myuser mygroup   54553192 Mar 06 05:47 /vol/DataStage/tmp/Datasets/lookuptable.20080306.aj0zfqc
rony_daniel
Participant
Posts: 36
Joined: Thu Sep 01, 2005 5:44 am
Location: Canada

Post by rony_daniel »

Hi,

What is the best partition type that should be given when a lookup file set is created with a key?

By default, the partition type when we drag and drop this stage onto a job is "Entire". Will Entire partitioning cause the data to be written multiple times, once per node, and hence occupy a large amount of space on the Unix box?
Thanks & Regards,
Rony
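To make the question above concrete, here is a small Python sketch of what the two partitioning styles mean in the abstract. It is not DataStage code, the function names are hypothetical, and it does not show how the Lookup File Set stage actually lays out files on disk, which this thread does not confirm. It only illustrates that Entire gives every partition a full copy of the rows, while key-based hash partitioning sends each row to exactly one partition.

```python
def entire_partition(rows, num_nodes):
    """Entire: every partition receives a full copy of the rows."""
    return [list(rows) for _ in range(num_nodes)]


def hash_partition(rows, key, num_nodes):
    """Hash: each row goes to exactly one partition chosen from its key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


# Illustrative data only: 10 rows, 2 nodes as in the config discussed above.
rows = [{"KeyCol": i, "texta": f"row{i}"} for i in range(10)]

entire = entire_partition(rows, 2)
hashed = hash_partition(rows, "KeyCol", 2)

entire_total = sum(len(p) for p in entire)  # rows held across all partitions
hashed_total = sum(len(p) for p in hashed)
```

In this sketch the Entire case holds `num_nodes` times as many rows in total as the hash case, which is the space concern raised in the question; whether that translates into duplicated files on disk would need to be checked against the descriptor file of an actual run.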