Lookup direct from Oracle database

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Lookup direct from Oracle database

Post by nick.bond »

When I have a lookup stage with reference link direct to Oracle stage what exactly happens in the background?

Does this simply create a temporay lookup fileset which is then loaded into memory in "entire" mode on all nodes? or Does it create partitioned lookup filesets? Or does it not really create lookup filesets at all?

If a lookup fileset is only required for one job (MainJob) and will be recreated just before the each run of MainJob, will I get the same performance by having a separate job that loads the lookup fileset and then MainJob uses that lookup fileset compared to the easyier implementation of having the lookup stage connected directly to the Oracle stage?
Regards,

Nick.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you reference oracle stage directly, it doesnt create any fileset, (hope you are not talking about virtual dataset). Search for Sparse lookup data will get looked up directly from the database.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Without a sparse lookup the entire table is loaded into a virtual Data Set against which lookups are performed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

i guess i was talking about the virtual dataset.

Forgetting the overhead in creating this virtual dataset or the overhead creating the lookup fileset, once you had them is there a large performance differnece between performing lookups on the virtual dataset rather than the lookup fileset?

1) With small amounts of lookup data < 100,000 rows
2) With large volumes in the lookup data > 10,000,000

Is the physical implementation of the oracle stage/virtual dataset and the lookup stage very different from the lookup fileset and the lookup stage?
Regards,

Nick.
nick.bond
Charter Member
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

Is the physical implementation of the oracle stage/virtual dataset and the lookup stage very different from the lookup fileset and the lookup stage?
For the above I mean in terms of what happens when the job runs not how one implements this in the job.
Regards,

Nick.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's more to do with the number of distinct values that are likely to be looked up. If this is "small" (they have never quantified what this means), then a sparse lookup is likely to be beneficial.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Also depends on the database conectivity, it is costly to fetch value for each record, it is considrable to have no sparse lookup. Database available in tele network will be an example.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply