Retrieve First Row

Hi,
I am new to DataStage... can someone help with my issue?
I am trying to find the address for all the clients and then find the geographical location based on zip code and zip code extension from another table (TAB2). Then I insert the address and location into a new table.

The problem is that when I look up the client zip code against TAB2, multiple rows are returned. How can I take just the first row returned by the link?
I tried putting a constraint in the Transformer, saying
(DSStageInRowNum = 1) and (link1.zipcde = link2.zipcode)
but it doesn't work.
Thanks for your timely help.
Hi,
If there is no need for a specific zip code, the easiest way is to take TAB2 and unload it into a hash file with client_id as the key. The hash file will filter out the duplicate addresses. Then do the lookup using the hash file as the reference link.

Anyway, it's always good practice to use hash files for lookups; it's usually much faster than doing a lookup against a relational database.
Amos
As said, hash files would be the Best Practice. You'd need to pull all of TAB2 into a hash keyed on (in this case) zip code, with whatever data you need to retrieve during the lookup as data fields. Hash files do destructive overwrites, so the last one in of each key value wins and is what you would end up looking up. If it matters which one goes in last, do an 'order by' on your population query so that you get the rows in the order that you need.
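As a rough sketch of that population query (Oracle-flavoured SQL here, and zipcode / zip_ext / geo_location are hypothetical TAB2 column names, not anything from the original post):

```sql
-- Hypothetical query to populate the hash file.
-- Hash file writes are destructive overwrites, so the row you want
-- to keep must arrive LAST for its key value; sort accordingly.
SELECT zipcode, zip_ext, geo_location
FROM   TAB2
ORDER  BY zipcode, zip_ext;  -- within each zipcode, the highest zip_ext loads last and wins
```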
To answer your original question, you don't mention which database you're using or which stage you're looking up against it with. I'm guessing ODBC because (from what I recall) it allows a lookup to return multiple rows; other stages don't. I believe there is an option on the stage to return multiple rows or not... I'll have to check when I get to work. Otherwise, you could use 'Custom SQL' in the lookup stage. That would give two benefits: first, you could 'order by' your results so that (if needed) the right row was returned first, and second, you could constrain your query to only return one row with something like "ROWNUM < 2".
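To make that concrete, here's a sketch of what the Custom SQL might look like (assuming Oracle, with the same made-up column names; the ? marker stands for the key value the reference link supplies). One wrinkle: Oracle assigns ROWNUM before ORDER BY is applied, so the ordering has to happen in an inline view:

```sql
-- Hypothetical Custom SQL for the lookup: order the candidates,
-- then keep only the first row.
SELECT zipcode, zip_ext, geo_location
FROM  (SELECT zipcode, zip_ext, geo_location
       FROM   TAB2
       WHERE  zipcode = ?      -- key supplied by the reference link
       ORDER  BY zip_ext)      -- put the row you want first
WHERE  ROWNUM < 2;
```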
Hope that helps,
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
By way of preparing everyone for the Brave New World of 7.0, Hash Files are NOT always the faster method. It does not make sense to use a hash file in a real-time application where volume is not a factor. If volume is not a factor here, I would explore using a group function in the SQL for TAB2. This will return a single row. Put max(zipcode) in the derivation, or build user-defined SQL, assuming the TAB2 stage will allow it. If volume is a factor, then it makes complete sense to load the hash file upfront and then pound through the input rows.
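A sketch of the group-function idea (again with illustrative column names):

```sql
-- Collapse the duplicates so the lookup returns exactly one row per zipcode.
SELECT zipcode,
       MAX(zip_ext)      AS zip_ext,
       MAX(geo_location) AS geo_location
FROM   TAB2
GROUP  BY zipcode;
```

Note that MAX() is applied per column, so zip_ext and geo_location could come from different source rows; that's fine when, as noted above, no specific row is needed.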
Thanks, Todd - forgot to mention the use of a 'group by' as another option to solve this problem... and for pointing out that Real Time will require new Best Practices.
Speaking of Real Time... well, perhaps I should just start a new topic instead of hijacking this one. :)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
A couple of points to note on this one.
Because updates to hashed files are destructive overwrites, the suggested protocol will yield LAST rather than FIRST.
An Aggregator stage with a set function of First can give the first of each set. Or the extraction SQL can include an appropriate clause (depends on the actual vendor's SQL grammar) such as WHERE ROWNUM = 1 in the inner part of a correlated subquery.
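In Oracle grammar that correlated subquery might look something like this (illustrative column names again):

```sql
-- One arbitrary row per zipcode: the inner query picks a single ROWID
-- per key value, and the outer query returns only those rows.
SELECT t.zipcode, t.zip_ext, t.geo_location
FROM   TAB2 t
WHERE  t.ROWID = (SELECT i.ROWID
                  FROM   TAB2 i
                  WHERE  i.zipcode = t.zipcode
                  AND    ROWNUM = 1);
```

With no ORDER BY involved this yields an arbitrary row per key rather than a guaranteed "first", which is all a ROWNUM predicate alone can promise.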
Yet another alternative, and maybe a better one, would be to use Quality Stage to generate a "best of breed" consolidated client record. This approach fits neatly into the real-time requirement as well.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.