Reliability of reference streams

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Tobbi
Participant
Posts: 14
Joined: Fri Oct 31, 2003 9:33 am
Location: Sweden, Stockholm

Reliability of reference streams

Post by Tobbi »

Hi,

In several jobs I've been experiencing "random" cases where I do a lookup on, say, PORTFOLIO_ID (I have other examples as well) and the reference returns the wrong row (one with a different PORTFOLIO_ID). In every case I've found, the error has happened in two consecutive loads of my DWH.

Anyone experienced this in DS 5.x?
Anyone get this solved in a higher version of DS?

Thx
Tobbi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can't say that I've ever seen anything like this. :? More details might help...

What are you doing a reference link against - Hash? ODBC? Some native DB stage?
Are you using Custom SQL in your lookups?
How are you checking for success of your lookups?
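
Just to compare notes, the way I'd normally expect a Server job to flag a failed lookup is via the reference link's NOTFOUND variable in the Transformer - roughly like this, where the link and column names are only placeholders, not anything from your job:

   * Output column derivation, assuming the reference link is called lnkRef:
   If lnkRef.NOTFOUND Then -1 Else lnkRef.PORTFOLIO_ID

   * Or as an output link constraint, so unmatched rows drop out or go down a reject link:
   Not(lnkRef.NOTFOUND)

If you're doing something different from that, it would be worth mentioning.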
-craig

"You can never have too many knives" -- Logan Nine Fingers
Tobbi
Participant
Posts: 14
Joined: Fri Oct 31, 2003 9:33 am
Location: Sweden, Stockholm

Post by Tobbi »

I get this error when querying either a Hash file or an ORAOCI8 stage - and in most cases I'm just doing the lookup on one single value, a number.

I first got this error about two months ago, and while analysing it I wrote the output from the Transformer stage to both the ORAOCI8 stage and a sequential file. Usually the error doesn't appear, but when it did, the sequential file looked okay while the ORAOCI8 stage had inserted wrong values (very strange).

But generally what I look for is out-of-the-ordinary changes in the data (this is the mobile telephony business), e.g. a user switching from MSISDN 1 to MSISDN 2 to MSISDN 3 in the same week. That is practically impossible - in reality it takes something like 3-4 weeks minimum.

My guess is that this is some sort of
a) caching/memory issue (I'm still verifying if that is possible - just got an idea last night)
b) The transformer stage is somehow addressing output of the reference stream in a strange way.

As far as I can see it is one of these two options. Four of us DS developers in-house have been looking at this for some time, and in the simplest job where we found this error we are very sure that it isn't a logical addressing error.

NB: If the error happens in, say, batch 1005 and I do a rollback and re-run the batch, the error does not reappear (I've never been able to recreate the error for the same batch). But the same symptoms - a different error - can occur in the next batch.

The main reason for me asking is that we are planning an upgrade to DS 7 this fall; the question is whether this is a bug in 5.x that an upgrade would solve, or whether the whole DS team (mainly myself) is incompetent.

Thanks
Tobbi
Tobbi
Participant
Posts: 14
Joined: Fri Oct 31, 2003 9:33 am
Location: Sweden, Stockholm

Post by Tobbi »

Okay, option a) seems to be the issue (but I still can't rule out that b) is also an issue).

A question then: some of my jobs do a truncate on a given table - is it possible that DS skips the truncate before inserting rows?

If so, then a) isn't really the explanation, and perhaps b) isn't either, because that would mean I have data for two batches in the same working table.

Anyone?

/Tobbi
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Tobbi wrote: A question then: some of my jobs do a truncate on a given table - is it possible that DS skips the truncate before inserting rows?
No. It (the truncate) should also be logged, from what I recall. The only time I've seen people get in trouble with this is when multiple links write to the same Oracle table, the "wrong one" has the truncate on it, and it fires too late. :? It's not supposed to work that way, but it can happen. In that situation it is better to do something (typically script-wise) before the job to truncate the target table.
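
If it helps, a before-job subroutine that shells out to sqlplus is one way to do that. Treat this strictly as a sketch - the routine name, login string and table name are placeholders, not anything from your environment:

   * Before/after-job routine body (arguments InputArg and ErrorCode, as defined
   * for that routine type). Truncates the target table before any job links open.
   TableName = InputArg
   Cmd = 'echo "truncate table ':TableName:';" | sqlplus -s user/password@db'

   Call DSExecute("UNIX", Cmd, Output, SystemReturnCode)

   If SystemReturnCode <> 0 Then
      Call DSLogWarn("Truncate of ":TableName:" failed: ":Output, "TruncateTarget")
      ErrorCode = 1   ;* non-zero ErrorCode stops the job
   End Else
      ErrorCode = 0
   End

Pass the table name in as the Input Value on the Job Properties page and the truncate happens before any of the links start, regardless of link ordering.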

Another silly question for you, as this has come up more than once in the recent past. I don't recall - does version 5.x support any sort of Row Buffering, and do you have it turned on? It can be on in the job itself, or at the Project level with the job set to 'Use Project Defaults'. There are times when it can introduce false errors and all manner of unreproducible problems. You can see a discussion of it in this thread, for instance.
-craig

"You can never have too many knives" -- Logan Nine Fingers
lebos
Participant
Posts: 74
Joined: Mon Jan 20, 2003 10:02 am
Location: USA

Post by lebos »

Tobbi,

Don't know if any of this will help, but I have some rather unrelated thoughts and questions.

1. Does your reference lookup (Hash) file get updated during the process? If so, is it possible that what you are seeing in the Oracle table is the result of the second (or first) lookup and in the sequential file the first (or second)?

2. What update action do you use for the Oracle table? Is it Insert only, Update only, Update first then insert (Upsert), or what?

3. When restoring and rerunning do you restore both the Oracle table and the reference file? If not, are they possibly getting out of sync?

4. What would happen if you eliminated the link to the Oracle table and used the sequential file as the source to load the table (using another job or transformer stage)? Although this shouldn't be necessary, it might just avoid the problem altogether. You might even separate updates from inserts (if relevant) into different files and load from both as is recommended by "best practices".

Maybe some of this will lead somewhere?

Larry
Tobbi
Participant
Posts: 14
Joined: Fri Oct 31, 2003 9:33 am
Location: Sweden, Stockholm

Post by Tobbi »

Larry, Craig,

Interesting points - thanks, I'm exploring your suggestions now. Row Buffering isn't available as a setting in 5.x, so my spontaneous thought was: could it be on by default in 5.x? ... well, perhaps not.

Hoping it will inspire you to give me even more hints, I'll answer your questions:

Truncate: I only have one (input) stream to the tables I'm truncating.

Hash file: I always use "clear file" and "delete file before create" - the logic in my jobs requires the hash files to be re-loaded every time. The latest simple error did not appear in a hash file context.

Insert/update order: I have two input links on the target table, one for update, one for insert. The job where tonight's error (54 rows) appeared does just this: one insert and one update. This is actually the most troublesome of all the jobs - it's almost exclusively here that the errors (the ones I've found) appear. The update is ALWAYS performed (for all incoming rows) - the insert is performed if an incoming row is a modified version of an existing one.
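
Simplified, the constraints on the two output links look roughly like this (the real link and column names are different, so treat it as a sketch):

   * Update link - no real constraint, i.e. it fires for every incoming row:
   @TRUE

   * Insert link - fires when the lookup found the user already in the Life Cycle
   * table (plus whatever "the incoming row actually differs" test applies):
   Not(lnkULC.NOTFOUND)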

Restore and re-run: this is done manually - I delete rows from the tables, but I'm assuming that DS deletes/empties the HASH file each time I re-run the job?? (the "clear file" option). I know that I always get rows into the hash file - I've heard that if a query returns zero rows then the existing hash file isn't flushed and old values are reused despite the "clear file" option (in 5.x at least).
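
One way to take that doubt out of the equation would be to clear the hash file myself from a before-job routine instead of relying on the stage option - something like the sketch below. The file name is a placeholder, it assumes the hash file lives in the project account (not created by pathname), and I'd have to verify the exact command against our 5.x setup:

   * Before-job routine body: force-clear the lookup hash file with a UniVerse command.
   HashFile = "HASH_USER_LOOKUP"
   Call DSExecute("UV", "CLEAR.FILE ":HashFile, Output, SystemReturnCode)

   If SystemReturnCode <> 0 Then
      Call DSLogWarn("CLEAR.FILE on ":HashFile:" failed: ":Output, "ClearHashBeforeJob")
      ErrorCode = 1
   End Else
      ErrorCode = 0
   End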

Option 4: Larry - I'm really looking into something along those lines, thanks for putting it into words :lol:


Details, if it helps: I'm working on a User Life Cycle table, and in the most troublesome job I pretty much do as follows:
- load incoming users (with the relevant changed info)
- allocate dimension keys (Transform stage one)
- for each incoming user, check whether he is already registered in the User Life Cycle table (Transform stage two)
- Insert & Update table ST2_USER_LIFE_CYCLE (ORAOCI8 stage):
* Update incoming user info with dimension keys (always)
* Insert the current record from DWH_USER_LIFE_CYCLE into ST2_USER_LIFE_CYCLE (when the user already exists)


The idea is to later compare the two rows and register each change to the user as a separate row with a specific transaction_type_id, to track all changes to the users - one job for each transaction_type (e.g. gross-add, churn, MSISDN change, tariff change, address change, etc.).

In this very job there are two types of "errors" I get:
1) I do a lookup on the existing user using the source-system unique user key: I get no row in return (although I should) -> the user is registered as NEW with a new DWH internal id -> gross-adds will increase by one more than they should.

2) I do a lookup on the existing user using the source-system unique user key: I get the wrong row in return (for a different existing user) => an MSISDN change, churn or address change is registered on the wrong user.
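
To at least catch case 2 when it happens, I'm thinking of carrying the source-system user key as an ordinary data column on the reference as well (so it comes back from the looked-up record itself) and comparing it in a stage variable, roughly like this - the link and column names are just placeholders:

   * Stage variable svKeyMismatch in the lookup Transformer:
   If lnkULC.NOTFOUND Then @FALSE Else lnkULC.SOURCE_USER_KEY <> lnkIn.SOURCE_USER_KEY

   * Constraint on a reject link that just logs the offending rows:
   svKeyMismatch

That won't fix anything, but it would tell us immediately which rows picked up the wrong reference row, instead of finding out days later from impossible MSISDN changes.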

Sounds familiar?

Additional thoughts appreciated - I'll continue with Larry's suggestion or something very similar. Feels like a DS 7.1 upgrade is approaching much faster than anticipated :?

Thanks guys
Tobbi