Retrieve First Row

Hi,
I am new to DataStage... can someone help with my issue?
I am trying to find the address for all the clients and then find the geographical location based on zip code and zip code extension from another table (TAB2). Then I insert the address and location into a new table.

The problem is that when I look up the client zip code against TAB2, multiple rows are returned. How can I take just the first row returned by the link?
I tried putting a constraint in the Transformer, saying
(DSStageInRowNum = 1) and (link1.zipcde = link2.zipcode)
but it doesn't work.
Thanks for your timely help.
Hi,
If there is no need for a specific zip code, the easiest way is to take TAB2 and unload it into a hash file with client_id as the key. The hash file will filter out the duplicate addresses. Then do the lookup using the hash file as the reference link.

Anyway, it's always good practice to use hash files for lookups; it's usually much faster than doing a lookup against a relational database.
Amos
As said, hash files would be the Best Practice. You'd need to pull all of TAB2 into a hash keyed on (in this case) zip code, with whatever data you need to retrieve during the lookup as data fields. Hash files do destructive overwrites, so the last one in of each key value wins and is what you would end up looking up. If it matters which one goes in last, do an 'order by' on your population query so that you get the rows in the order that you need.
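As a rough sketch of that population query (Oracle-flavoured SQL here, and zipcode / zip_ext / geo_location are hypothetical TAB2 column names, not anything from the original post):

```sql
-- Hypothetical query to populate the hash file.
-- Hash file writes are destructive overwrites, so the row you want
-- to keep must arrive LAST for its key value; sort accordingly.
SELECT zipcode, zip_ext, geo_location
FROM   TAB2
ORDER  BY zipcode, zip_ext;  -- within each zipcode, the highest zip_ext loads last and wins
```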
To answer your original question, you don't mention which database you're using or which stage you're looking up against it with. I'm guessing ODBC because (from what I recall) it allows a lookup to return multiple rows; other stages don't. I believe there is an option on the stage to return multiple rows or not... I'll have to check when I get to work. Otherwise, you could use 'Custom SQL' in the lookup stage. That would give two benefits: first, you could 'order by' your results so that (if needed) the right row was returned first, and second, you could constrain your query to only return one row with something like "ROWNUM < 2".
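To make that concrete, here's a sketch of what the Custom SQL might look like (assuming Oracle, with the same made-up column names; the ? marker stands for the key value the reference link supplies). One wrinkle: Oracle assigns ROWNUM before ORDER BY is applied, so the ordering has to happen in an inline view:

```sql
-- Hypothetical Custom SQL for the lookup: order the candidates,
-- then keep only the first row.
SELECT zipcode, zip_ext, geo_location
FROM  (SELECT zipcode, zip_ext, geo_location
       FROM   TAB2
       WHERE  zipcode = ?      -- key supplied by the reference link
       ORDER  BY zip_ext)      -- put the row you want first
WHERE  ROWNUM < 2;
```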
Hope that helps,
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
By way of preparing everyone for the Brave New World of 7.0, Hash Files are NOT always the faster method. It does not make sense to use a hash file in a real-time application where volume is not a factor. If volume is not a factor here, I would explore using a group function in the SQL for TAB2. This will return a single row. Put max(zipcode) in the derivation, or build user-defined SQL, assuming the TAB2 stage will allow it. If volume is a factor, then it makes complete sense to load the hash file upfront and then pound through the input rows.
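A sketch of the group-function idea (again with illustrative column names):

```sql
-- Collapse the duplicates so the lookup returns exactly one row per zipcode.
SELECT zipcode,
       MAX(zip_ext)      AS zip_ext,
       MAX(geo_location) AS geo_location
FROM   TAB2
GROUP  BY zipcode;
```

Note that MAX() is applied per column, so zip_ext and geo_location could come from different source rows; that's fine when, as noted above, no specific row is needed.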
Thanks, Todd - forgot to mention the use of a 'group by' as another option to solve this problem... and for pointing out that Real Time will require new Best Practices.
Speaking of Real Time... well, perhaps I should just start a new topic instead of hijacking this one. :)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
A couple of points to note on this one.
Because updates to hashed files are destructive overwrites, the suggested protocol will yield LAST rather than FIRST.
An Aggregator stage with a set function of First can give the first of each set. Or the extraction SQL can include an appropriate clause (depends on the actual vendor's SQL grammar) such as WHERE ROWNUM = 1 in the inner part of a correlated subquery.
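In Oracle grammar that correlated subquery might look something like this (illustrative column names again):

```sql
-- One arbitrary row per zipcode: the inner query picks a single ROWID
-- per key value, and the outer query returns only those rows.
SELECT t.zipcode, t.zip_ext, t.geo_location
FROM   TAB2 t
WHERE  t.ROWID = (SELECT i.ROWID
                  FROM   TAB2 i
                  WHERE  i.zipcode = t.zipcode
                  AND    ROWNUM = 1);
```

With no ORDER BY involved this yields an arbitrary row per key rather than a guaranteed "first", which is all a ROWNUM predicate alone can promise.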
Yet another alternative, and maybe a better one, would be to use Quality Stage to generate a "best of breed" consolidated client record. This approach fits neatly into the real-time requirement as well.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.