Page 1 of 1

Conditional lookup

Posted: Wed Aug 05, 2009 5:31 pm
by sbass1
Hi,

Say my source data has 1M records. 10 records have flag = 1, the remainder have flag = 0.

The two scenarios go down separate update links to the target table.

Flag = 0 is Update existing or insert new.

Flag = 1 is Update existing only.

Problem:

Flag = 1 implies that there is an existing record in the target table (Flag = 1 is a "backdated" record). If there isn't an existing record, I need to raise a warning. DS silently ignores the fact there isn't an existing record when the update action is Update existing only.

Potential solution:

"Manually" code a lookup against the target table. However, the lookup will be against a view and not perform well. I would like to do the lookup only when flag = 1.

My approach:

From an upstream transform, use constraints to split the stream into Backdated and Not_Backdated links. On the Backdated link, use another transform to lookup against the target table. Set a warning flag accordingly based on lookup results. Use the link collector to merge the two streams back together. Load the target table from the merged stream.

To do this, I have to set job properties to enable row buffer, interprocess. I've upped the buffer size from 256 to 1024KB.

Enabling this job property adversely affected performance, but this was the only solution I could think of.

1) Is this the best approach to the problem statement?

2) Are there options (such as increasing the row buffer even more) that I can use to optimize performance?

Thanks,
Scott

Posted: Wed Aug 05, 2009 10:40 pm
by chulett
Hmmm... the only way to implement a 'conditional' lookup is as you've noted - split the stream, do the lookup only on the appropriate rows and then merge the two back together. I, however, never really found it worth the trouble.

A properly set up hashed reference shouldn't be a bottle-neck in a job and thus you really shouldn't have an issue doing the lookup all the time and then validating the results of it only for the flagged records. I'm curious - did you already give that a shot and come to this approach because it wasn't working for you?
sbass1 wrote:DS silently ignores the fact there isn't an existing record when the update action is Update existing only.
For the record, this isn't DS at all but rather the underlying database's responsibility. Some will report an update of a non-existent record as a problem (DB2 does this, I believe) and DataStage will log that. Others (like Oracle) don't consider that a problem (the update completely successfully, it just didn't update any rows) and won't report it as an issue. Not much DataStage can do in that instance.

Posted: Tue Aug 11, 2009 10:46 pm
by sbass1
Hi Craig,

I just assumed looking up for every record, when so few required the lookup, to be non-performant.

So I created two versions: 1) split the stream, lookup only on the relevant stream, recombine via link collector (and requisite interproces buffering), and 2) "seed" two new hashed files from the target table (or view), lookup on all records, and "do the right thing" only if it was a backdated record and lookup failed.

Option #2 performed slightly worse (31 seconds vs. 27 seconds with my test data subset).

Marking this resolved, thanks for the help. Much appreciated.

Scott