Retrieve First Row

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sairam_2003
Participant
Posts: 2
Joined: Thu Nov 13, 2003 3:38 am
Location: NY

Retrieve First Row

Post by sairam_2003 »

Hi

I am new to DataStage... can someone help with my issue?

I am trying to find the address for each client and then
find the geographical location, based on zip code and zip code extension,
from another table (TAB2).
Then insert the address and location into a new table.

The problem is that when I look up the client zip code against TAB2, multiple rows are returned. How can I take just the first row returned by the link?

I tried putting a constraint in the Transformer, saying

(DSStageInRowNum = 1) and (link1.zipcode = link2.zipcode)

but it doesn't work.

Thanks for your timely help.
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

Hi,

If there is no need for a specific zip code, the easiest way is to unload TAB2 into a hash file with client_id as the key.
The hash file will filter out the duplicate addresses. Then do the lookup using the hash file as the reference link.

Anyway, it's always good practice to use hash files for lookups; it's usually much faster than doing a lookup against a relational DB.


Amos
Peytot
Participant
Posts: 145
Joined: Wed Jun 04, 2003 7:56 am
Location: France

Post by Peytot »

You can also use stage variables in your Transformer to compare the new value with the last one: one variable for the comparison and another to keep the previous value as the 'old' value.

But as Amos said, a hash file may be easier.

Pey
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As said, hash files would be the Best Practice. You'd need to pull all of TAB2 into a hash file keyed on (in this case) zip code, with whatever data you need to retrieve during the lookup as data fields. Hash files do destructive overwrites, so the last one in for each key value wins and is what you end up looking up. If it matters which one goes in last, do an 'order by' on your population query so that the rows arrive in the order you need.
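
For illustration, a minimal sketch of such a population query, with assumed column names; last_updated is a hypothetical column used only to show the ordering idea:

    -- rows arrive sorted so the preferred row for each key is written
    -- last and therefore survives the destructive overwrite
    SELECT zipcode, zip_ext, geo_location
    FROM   TAB2
    ORDER  BY zipcode, zip_ext, last_updated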

To answer your original question: you don't mention which database or stage you're using to look up against it. I'm guessing ODBC because (from what I recall) it allows a lookup to return multiple rows; other stages don't. I believe there is an option on the stage to return multiple rows or not... I'll have to check when I get to work. Otherwise, you could use 'Custom SQL' in the lookup stage. That would give two benefits: first, you could 'order by' your results so that (if needed) the right row is returned first, and second, you could constrain your query to return only one row with something like "ROW_NUM < 2".
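
As a hedged sketch of what that custom lookup SQL could look like, assuming an Oracle back end (where the pseudocolumn is spelled ROWNUM) and made-up column names; the '?' key-marker syntax depends on the stage:

    -- the inline view orders the candidates first; ROWNUM is applied
    -- to the outer query, so only the first ordered row survives
    SELECT zipcode, zip_ext, geo_location
    FROM  (SELECT zipcode, zip_ext, geo_location
           FROM   TAB2
           WHERE  zipcode = ?     -- lookup key supplied by the stage
           ORDER  BY zip_ext)
    WHERE  ROWNUM < 2

The ORDER BY has to sit inside the inline view: Oracle assigns ROWNUM before an ORDER BY at the same level is applied, so sorting and filtering together would return an arbitrary row.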

Hope that helps,
-craig

"You can never have too many knives" -- Logan Nine Fingers
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis

Post by trobinson »

By way of preparing everyone for the Brave New World of 7.0: hash files are NOT always the faster method. It does not make sense to use a hash file in a real-time application where volume is not a factor. If volume is not a factor here, I would explore using a group function in the SQL for TAB2, which will return a single row. Put max(zipcode) in the derivation or build user-defined SQL, assuming the TAB2 stage allows it. If volume is a factor, then it makes complete sense to load the hash file up front and then pound through the input rows.
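
For illustration, a minimal sketch of that group-function idea as user-defined SQL, with assumed column names:

    -- collapse duplicates to one row per zip code / extension;
    -- MAX() arbitrarily picks a single location where several exist
    SELECT zipcode, zip_ext, MAX(geo_location) AS geo_location
    FROM   TAB2
    GROUP  BY zipcode, zip_ext

Any group function (MIN or MAX) does the job here; the point is simply that the lookup link then returns exactly one row per key.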
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks, Todd - forgot to mention the use of a 'group by' as another option to solve this problem... and for pointing out that Real Time will require new Best Practices.

Speaking of Real Time... well, perhaps I should just start a new topic instead of hijacking this one. :)
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A couple of points to note on this one.

Because updates to hashed files are destructive overwrites, the suggested protocol will yield LAST rather than FIRST.

An Aggregator stage with a set function of First can give the first of each set. Or the extraction SQL can include an appropriate clause (depending on the vendor's SQL grammar), such as WHERE ROWNUM = 1 in the inner part of a correlated subquery.
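
To make that concrete, a hedged sketch of such a correlated subquery, assuming Oracle grammar and made-up column names:

    -- keep one row per zip code / extension; ROWID and ROWNUM are
    -- Oracle pseudocolumns, so this is vendor-specific
    SELECT t.zipcode, t.zip_ext, t.geo_location
    FROM   TAB2 t
    WHERE  t.ROWID = (SELECT i.ROWID
                      FROM   TAB2 i
                      WHERE  i.zipcode = t.zipcode
                      AND    i.zip_ext = t.zip_ext
                      AND    ROWNUM = 1)

Which row you get is arbitrary unless the extraction imposes an order, so strictly this returns one row per key rather than a guaranteed "first".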

Yet another alternative, and maybe a better one, would be to use Quality Stage to generate a "best of breed" consolidated client record. This approach fits neatly into the real-time requirement as well.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sairam_2003
Participant
Posts: 2
Joined: Thu Nov 13, 2003 3:38 am
Location: NY

Post by sairam_2003 »

Hey,

Thanks a lot, guys.

I solved it using a SQL override. I'll check out the other ways too.