Database Lookup issue

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

vemisr
Participant
Posts: 72
Joined: Thu Sep 11, 2008 1:31 pm

Database Lookup issue

Post by vemisr »

Hi Experts,

Env: DS 8.1, UNIX, Oracle, PX

My lookup table in the database has 300 million records.

The incoming file has 100K to 500K records, and each record needs a database lookup; if a match is found, I get the ACCT NUM from the database.

Req:
The incoming file has name, street name, unit, city, state and zip fields; if a match is found on those fields, I get the acct num from the database.

The question is how to get the best performance. Performance is very critical; I need to process multiple files, multiple times a day.

Can you suggest solutions?

thanks
Vemisr
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use a Join stage with sorted inputs, partitioned on name and perhaps also on zip, state and city.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If both datasets are in the same database, do the join within the source query.

Otherwise, do a sparse lookup.
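
For the sparse lookup route, a minimal sketch of the user-defined SQL in the Oracle stage, assuming a table called accounts and the column names from the requirement (all hypothetical):

Code: Select all

-- Sparse lookup: run once per incoming row, with the
-- ORCHESTRATE.* placeholders bound to the input link columns.
SELECT acct_num
FROM   accounts
WHERE  name        = ORCHESTRATE.name
AND    street_name = ORCHESTRATE.street_name
AND    unit        = ORCHESTRATE.unit
AND    city        = ORCHESTRATE.city
AND    state       = ORCHESTRATE.state
AND    zip         = ORCHESTRATE.zip

With 300 million rows on the database side, this only performs if a composite index covers the match columns.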
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Is there potential to filter out some of those 300 million records, e.g. expired ones?

What are you doing the join on? All of the fields mentioned? If so, maintaining a separate checksum table could help...
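
To make the checksum idea concrete, a rough Oracle sketch using the built-in ORA_HASH, with all table and column names hypothetical:

Code: Select all

-- Periodically refreshed match table keyed on a hash of the
-- concatenated address fields instead of six wide columns.
CREATE TABLE acct_match AS
SELECT acct_num,
       ORA_HASH(name || '|' || street_name || '|' || unit || '|' ||
                city || '|' || state || '|' || zip) AS addr_hash
FROM   accounts;

CREATE INDEX acct_match_ix ON acct_match (addr_hash);

The job then computes the same hash on each incoming row and joins on the single narrow addr_hash key; matches should still be verified against the full columns, since hashes can collide.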
nagarjuna
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

Sainath.Srinivasan wrote:If both datasets are in the same database, do the join within the source query.

Otherwise, do a sparse lookup. ...
Hi Sainath,

I think 500K input records may be too high for a sparse lookup...

Is there an optimal number of records for a sparse lookup?
Nag
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

As always, "it depends".

For example, for very narrow rows, 500K rows is not such a large volume of data.
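
As a rough back-of-envelope: at, say, 100 bytes per row (a hypothetical narrow record), 500K rows is only about 50 MB of data.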
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abhijain
Participant
Posts: 88
Joined: Wed Jun 13, 2007 1:10 pm
Location: India

Post by abhijain »

As suggested:
1. Go for a SQL query join if both are from the same database.
2. Go for a Join stage if you are not bothered about unmatched records.
3. If you want to perform a lookup, use the minimum number of columns in the Lookup stage and bypass the rest.

Also, please elaborate more on your problem.
Rgrds,
Abhi
makreddy
Participant
Posts: 21
Joined: Wed Sep 14, 2005 10:40 pm
Location: hyderabad

Re: Database Lookup issue

Post by makreddy »

One simple solution: insert the source records into a database table if they come from another source, and do the join in the database.
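
A minimal sketch of that approach, with stg_incoming and accounts as hypothetical table names:

Code: Select all

-- Bulk-load the incoming file into a staging table first
-- (e.g. via the Oracle stage), then join inside the database:
SELECT s.name, s.street_name, s.unit, s.city, s.state, s.zip,
       a.acct_num
FROM   stg_incoming s
JOIN   accounts     a
ON     a.name        = s.name
AND    a.street_name = s.street_name
AND    a.unit        = s.unit
AND    a.city        = s.city
AND    a.state       = s.state
AND    a.zip         = s.zip

This pushes the heavy work to Oracle, where the 300-million-row side can use its indexes.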

vemisr wrote:Hi Experts,

Env: DS 8.1, UNIX, Oracle, PX

My lookup table in the database has 300 million records... ...
Aravind