highest surrogate key

rsunny · Post by **rsunny** » Tue Oct 05, 2010 6:16 pm

ray.wurlod wrote:It would help were you to obtain a premium membership so as to be able to see the totality of Craig's and my replies.

Hi ,

Well i dont have any idea of viewing the replies only if we are premium members only.i was wondering why i couldnt able to view the full message.anyway i will see that i can take the premium membership as soon as possible.

thanks in advance

rsunny · Post by **rsunny** » Thu Oct 07, 2010 12:15 pm

Hi everyone ,

I am having one more issue as when my source has same business key but with different values in other fields.then when i try to implement scd type2 , i couldnt able to update one with current indicator as 'Y' and one with 'N' instead both are inserting with 'Y'.say for example if i have 2 records in source with same business keys , then one has to insert and one has to update but both are inserting , one is not updating.can anyone tell me what might be the issue.i am doing lookup using hash file and not with direct database.

thanks in advance

chulett · Post by **chulett** » Thu Oct 07, 2010 1:58 pm

What 'update action' are you using on the link that should be updating? I'm guessing it's not set to update.

rsunny · Post by **rsunny** » Thu Oct 07, 2010 2:20 pm

chulett wrote:What 'update action' are you using on the link that should be updating? I'm guessing it's not set to update. ...

I didnt understood what you mean , but i am using update 'exisiting rows only' in the target database and in the transformer i am using the update condition which is business keys should match and CRC values doesnt match.for example i have 2 records in the source with same business keys , . if i run the job , one has to be updated and 2 records should be inserted .i.e the first record which is inserted into the target has to be updated to flag as 'N' and insert one more record with flag has 'Y'.but in my case when i run the job with source having 2 records with asme business keys , the two records are inserting ,when i view the data , i can see both the data inserted with flag as 'Y', but it shouldn't happen.can you tell me what might be the issue?

thanks in advance

ray.wurlod · Post by **ray.wurlod** » Thu Oct 07, 2010 3:19 pm

The only possible issue is that you're not setting the flag to 'N' in the expired row during your update process.

rsunny · Post by **rsunny** » Thu Oct 07, 2010 4:11 pm

ray.wurlod wrote:The only possible issue is that you're not setting the flag to 'N' in the expired row during your update process. ...

well , when my source have data where the business keys is unique i.e. different busineess keys for each record so when i run the job it is doing both the updates and inserts based on the look up but the only problem i am finding is when the source having records with same business key i.e. for example 2 records with same business keys , it is just inserting , but it the source has some records where the business keys are unique , then even though if there is any record in the target having same business key with source it is updating and inserting.

thanks in advance

ray.wurlod · Post by **ray.wurlod** » Thu Oct 07, 2010 4:15 pm

There's a problem in your logic somewhere.

Computers only do precisely what they're told to do.

rsunny · Post by **rsunny** » Thu Oct 07, 2010 4:43 pm

Hi ,

can you what should be the logic as when i have same business keys it is not able to do updates ..can anyone tell me what should be the logic in order to get rid of this issue?

thanks in advance

chulett · Post by **chulett** » Fri Oct 08, 2010 7:37 am

OK.

Let me be the first to admit that I am finding it harder and harder to understand what your issues are in this thread or how anyone could possibly help you more than we've already done. Let me make one last stab at it as I am in a giving mood this morning...

In a type 2 dimension, there should only be one record for each business key that is the current record - hence the term. It is typically indicated by a flag set to 'Y', something you apparently have in your target. There can be any number of records for that same business key that are no longer current, i.e. where the 'current indicator' = 'N' and we don't care how many there are because they don't come into play at all in your job. The primary key on each record is either a combination of the business key(s) and the "effective date" of that particular record or a surrogate id that represents that same combination. Are we in agreement so far?

Your first order of business is to prepare a lookup with the appropriate values from any of the current records you may need for the run. In a Server job this means a hashed file keyed by the business key with (at the very least) the effective date or surrogate id as the data element. And by 'current records' I again mean only those where the current indicator is set to 'Y'. If you are doing CDD - Change Data Detection - you'll either need all of the data fields in the hashed file as well or the checksum that represents them that you've either just computed on the fly or (ideally) have as a data element in your target.

Then as your job streams in the current source data, you use the business keys to do a lookup against your hashed file. The first simple check is an existence check. If the record does not exist in the target (and thus the hashed file) you will insert one record into the target for that business key with your default effective date ranges and with the current indicator set to 'Y'. Include all data elements and the new checksum as well, if appropriate.

If you do get a hashed file hit, then you need to either compare all data fields, old versus new, or the two checksums to see if 'something changed'. If there is no change you should simply discard the record, essentially by ensuring the constraints you use on your output links do not allow it to pass. For all of these checks, I would suggest you use well-named stage variables (which could be just one) that indicates 'New', 'Changed' or 'No Change'.

Now all that's left is you've discovered a changed record. You do not, as you've stated earlier, need to "insert two records" but rather you need to insert one record - the new record - and you need to update the current record in the target, the same one you matched against in your hashed file, to set the current ind to 'N' and update the "end effective date" as well. You should have two output links from your transformer to your target, one with full metadata for the insert and one with minimal metadata for the update. The insert you do in either case for the 'new' records is the same, the only difference should be the values you use for the effective date range, which can all be handled by that single "insert" link to your target. The update link will use either the business keys plus the looked up effective date as the actual Key fields to udpate the target & bind into the where clause, that or the looked up surrogate id. All your update should be affecting are any 'metadata' columns in your target that track that change: end effective date, current indicator, updator name / id, etc etc.

So, to recap... two output links, one insert and one update. For a new record only the insert 'fires'. For a change, both fire. Thus, the constraint on the insert link would be some form of 'svNew or svChanged' and the constraint on the update link would be some form of 'svChanged'. I use boolean stage variables for each so my constraints would look exactly as I've posted.

If your updates are not 'sticking' then you are not properly populating or defining the keys or data elements to use in the update. Oracle does not consider an update that updates zero rows to be any kind of an error so if that happens then you need to both notice it and figure out why. The interactive debugger should go a long way towards that goal.

FYI - All of this assume no duplicates in your source data - i.e. there will only be at max one change per business key per run of the job. If that's not true, if you can have multiple changes, that doesn't change all that much. It just means you need to keep the 'current' records in the hashed file updated and you need to process the incoming data in the correct order. You keep them current by, any time you send data down your "insert" link you also send something down a third link that writes to the hashed file. There's more to it, no caching or 'locked for update' in the lookup, but I'm not going to get into that now, this is already way too long and more than I intended to write when I sat down here.

chulett · Post by **chulett** » Fri Oct 08, 2010 8:14 am

Oh, and just for the record - what I posted does not nor is it meant to handle physical deletes from the source system. That requires either a second pass in the opposite direction or a full outer join between the source and the target rather than the technique mentioned above.