Reference Table - Logical Key

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
aravindk
Participant
Posts: 41
Joined: Tue Dec 09, 2003 8:59 am

Reference Table - Logical Key

Post by aravindk »

Hello all,

I have a table which contains unique addresses. For the sake of simplicity, let me assume that there are only three fields in this table. LocationAdd1, LocationCity, IDX. This table is keyed by Identifier (sequential no) field IDX. I am loading this table with 'Update existing rows or insert new rows' option set. If the row doesnt exist, I am generating the seq. no and writing the new row.

My Questions are:

a) If that row (LocationAdd1 + LocationCity) exists, I would like to retrieve its seq no. Can I retrieve it without setting up a reference stage?

b) Also, If I have to set a reference stage, then can I mark the LocationAdd1 and LocationCity as Keys in my Stage Columns, even though the key of the underlying physical table is actually IDX. Will this return the IDX - if the row exists?

The database is in Oracle and I am using Oracle 8i Stage.

TIA,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

a) Nope. This would be a good place for a hash, I would think. It would minimize the pain of the reference lookup. You could also then ditch the whole 'update then insert' action.

b) Sure! All marking a field in the OCI stage does is force it into the 'where' clause. Has no bearing on actual indexes or whatnot on the underlying table except to control how performant it would be. Make sure your choice of fields returns only one row, however.
-craig

"You can never have too many knives" -- Logan Nine Fingers
aravindk
Participant
Posts: 41
Joined: Tue Dec 09, 2003 8:59 am

Post by aravindk »

a) Few reasons why I am not using Hash. i) this is our first project and i dont want to start on hash files right away. ii) The tables I am working currently are not that that big - about 1000 records at the most. iii) This is a migration project and in most cases the record dont exist in the target database. So performance is not my top priority right now.

b) This did work. Many thanks for the quick reply.


Regards,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

aravindk wrote:a) Few reasons why I am not using Hash. i) this is our first project and i dont want to start on hash files right away. ii) The tables I am working currently are not that that big - about 1000 records at the most. iii) This is a migration project and in most cases the record dont exist in the target database. So performance is not my top priority right now.
Hash files are a fundamental part of Server jobs. Don't wait too long before you start using them. :wink:

Also, if in most cases the records do not exist in your target then you should switch your update action to 'Insert then Update'.

Some day, when performance does matter, it's things like that (and hash lookups) that will help you out.
-craig

"You can never have too many knives" -- Logan Nine Fingers
raju_chvr
Premium Member
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Post by raju_chvr »

AravindK,

As Craig has already mentioned, Hash files are fundamental to Server jobs. Now you may get along with out using Hash files but very soon when your data gets bigger and bigger you will have to re-design the jobs again. Avoid doing that by planning and designing the jobs right now itself. It will pay off in the end.
aravindk
Participant
Posts: 41
Joined: Tue Dec 09, 2003 8:59 am

Post by aravindk »

I do understand the benefits that hash files offer and I am hoping to start using it as early as next week.

About the 'insert after update' you mentioned, What I am doing currently is I have a reference table to lookup for the existence of the row. I have two Output OCIs setup for the same table one for Update and another for Insert. Constraints are set on both the links. Update link will be processed only if the key returned from the reference link is non-null. Insert link will be processed when the ref link returns a null value. I settled for this logic after reading one of the postings in this forum. Following your advise, I am going to move insert link before the update link.

Thanks,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, but it sounds like I misunderstood. There is a specific "Update action" of "Update existing rows or insert new rows" and I thought that was what you were referring to. That one and its companion "Insert new rows or update existing rows" should be used properly - meaning try whichever one will succeed the most often first.

If you are doing a reference lookup with separate links for each action, it might help a little bit to order them that way. I was more concerned about the above situation where it can make a great deal of difference if you get it the "wrong way round".
-craig

"You can never have too many knives" -- Logan Nine Fingers
aravindk
Participant
Posts: 41
Joined: Tue Dec 09, 2003 8:59 am

Post by aravindk »

When I first posted the message, I had one Output link with Update action set as 'Update existing rows and Insert new rows', but could not get it to retrieve the IDX for rows that already exist. As soon as I post, I saw this other posting where it was suggested that we can setup two Output links, one to update and one to insert. I changed my code to do exactly the same. I should have mentioned the change in my subsequent reply, but i didnt. The mistake was mine, not yours.

Regards,
Post Reply