Sparce lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Sparce lookup

Post by deployDS »

Hi,

I need to do a look up for a dataset that could have a maximum of 50 rows with a DB2 table that could a minimum of 30 million rows.
For each row in dataset, there could be more than one matches from database. I am presently able to accomplish this using Join stage. But it is taking a very long time to pull data from database and do the lookup. I'm using a DB2 API stage as I do not have an enterprise stage.

A collegue suggested that I use a sparce look up through ODBC enterprise stage. But I was wondering if there's a way to get more than one matches from database when using lookup stage. Can anyone please suggest me?

Thanks!
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Post by koolnitz »

1. You cannot use a Sparse lookup with DB2 API stage. See this post. I am not sure about ODBC.

2. A lookup can fetch multiple records from lookup table, if you check "Multiple rows returned from link" option in Constraints properties of Lookup stage.
Nitin Jain | India

If everything seems to be going well, you have obviously overlooked something.
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Post by deployDS »

Thanks Nithin! I am not trying to use API for sparce lookup. I know I cannot. I was using Join stage with API to fetch records.

Since we don't have enterprise db2, I'm trying to do sparce lookup through ODBC Enterprise stage. And when I use ODBC enterprise and set it for sparce lookup, the "Multiple rows returned from link" becomes disabled.

Any other suggestions?
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

changing join with lookup will not reduce time considerably.

I suggest that you break up the job into two. in one job just extract the table and perform join and other transformations in the second job.

Alternatively, extract from the table and write to a sequential file stage in the same job (introduce a sequential file between your API stage and the join stage.
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Post by deployDS »

Thank you Rohit, but isn't 30million-50million too much of data to be extracted to a file, while all I need to do is find matches for about 60 rows?

"changing join with lookup will not reduce time considerably."
I've never worked with sparce lookup before, but won't sparce lookup reduce the time considerably in my case?

Thanks!
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Load the 60 records into a temp table and utilize the power of UDB by doing a join at the database level. That would be faster anyday, keeping in mind the network traffic that will be induced if 30M records are to be loaded to a file and then doing a lookup.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Post by deployDS »

Thanks DSguru2B, that's a useful suggestion, but I cannot implement it because my target tables reside on a different database from staging tables. I only have access to create tables at staging level, while my lookup table is a target table.

Is there nothing I can do within sparce lookup in ODBC enterprise stage?
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Why dont you build a lookup set from the 50 records. Have the DB2 table as your input and do a lookup. This way you will get all the records from the table that have a hit on the dataset. I am sure you must have thought of that but whats stopping you from using this method.
I am not very familiar with the ODBC enterprise stage so cannot help you in that.
I know in sql server there is a function which can be used to connect to a different database from one database. Dont know if any such thing exists for Db2. Do some research, i am sure it exists.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

deployDS wrote:Thanks DSguru2B, that's a useful suggestion, but I cannot implement it because my target tables reside on a different database from staging tables. I only have access to create tables at staging level, while my lookup table is a target table.
That really shouldn't be an issue. Can you not pull a list of the natural keys from your current staging tables and load them into a work table in the same db as the target tables? Then you can join them properly.
-craig

"You can never have too many knives" -- Logan Nine Fingers
deployDS
Premium Member
Premium Member
Posts: 45
Joined: Thu Mar 09, 2006 9:36 am

Post by deployDS »

My problem is solved. In fact, when using sparce lookup through ODBC enterprise stage, it outputs all the matches my default. I didn't know this and couldn't test till my driver was installed just now. And it is amazingly fast :)

Chulett, the problem is that I am not allowed to create any work tables in the db where target tables exist.

DSguru2B, yes, I did think of that, but was looking for ways to avoid calling all data from the table.

Thank you all for your responses n suggestions!
Post Reply