Transform not recognizing change to one particular field

rdy · Post by **rdy** » Wed Jul 14, 2004 6:50 am

We're having a really tricky problem. Our daily processes updates Table B to stay identical to Table A.

One of our server jobs compares the current contents of Table A to the existing contents of Table B.

In this job, Table A's contents come in to a transform as a sequential file, and Table B's contents are in a hash file fed from an ODBC stage - this is the reference link.

In the transform, the keys are matched on the input link and reference link. The one output is the file with all the records that changed.

The constraint that selects records is as follows:
(NullToEmpty(existingRecs.field_1) <> NullToEmpty(lkupDBRecs.field_1)) OR
(NullToEmpty(existingRecs.field_2) <> NullToEmpty(lkupDBRecs.field_2)) OR
(NullToEmpty(existingRecs.test_code_2) <> NullToEmpty(lkupDBRecs.test_code_2)) OR
(NullToEmpty(existingRecs.field_3) <> NullToEmpty(lkupDBRecs.field_3)) OR
(NullToEmpty(existingRecs.test_code_3) <> NullToEmpty(lkupDBRecs.test_code_3)) OR
(NullToEmpty(existingRecs.field_4) <> NullToEmpty(lkupDBRecs.field_4)) OR
(NullToEmpty(existingRecs.field_5) <> NullToEmpty(lkupDBRecs.field_5)) OR
(NullToEmpty(existingRecs.field_6) <> NullToEmpty(lkupDBRecs.field_6)) OR
(NullToEmpty(existingRecs.field_7) <> NullToEmpty(lkupDBRecs.field_7)) OR
(NullToEmpty(existingRecs.field_8) <> NullToEmpty(lkupDBRecs.field_8)) OR
(NullToEmpty(existingRecs.first_name) <> NullToEmpty(lkupDBRecs.first_name)) OR
(NullToEmpty(existingRecs.last_name) <> NullToEmpty(lkupDBRecs.last_name)) OR
(NullToEmpty(existingRecs.email_address) <> NullToEmpty(lkupDBRecs.email_address)) OR
(NullToEmpty(existingRecs.phone_free_form_number) <> NullToEmpty(lkupDBRecs.phone_free_form_number)) OR
(NullToEmpty(existingRecs.field_9) <> NullToEmpty(lkupDBRecs.field_9)) OR
(NullToEmpty(Trim(existingRecs.test_code)) <> NullToEmpty(Trim(lkupDBRecs.test_code)))

You can see it compares 16 fields.

My problem is that for certain records a change in one of these fields (e.g. field_2) will fulfill the constraint, but a change to test_code does not.

To make matters worse, a change to test_code in some records fulfills the constraint, but in other records it doesn't!

As you can see, this is totally unpredictable and very frustrating. Is there some kind of overflow or limit or something that could cause this behavior?

chucksmith · Post by **chucksmith** » Wed Jul 14, 2004 7:34 am

This may be very predictable behavior. When comparing two columns, DataStage first attempts a numeric comparison. Therefore, the string "00" and "000" will appear equal since they both represent number zero. For your non-numeric columns, use the Compare function. This will force a string comparison.

Since you are on a Unix server, you may want to look at the Unix comm command. If you dump table A to a sequential file, and table B to another sequential file, then unix sort the two files, the comm command can quickly give you a file of lines (1) unique to file A, (2) unique to file B, or (3) common to both files. This may efficiently reduce the work your job has to perform, or even eliminate it.

rdy · Post by **rdy** » Wed Jul 14, 2004 8:06 am

chucksmith wrote:For your non-numeric columns, use the Compare function. This will force a string comparison.

OK, I tried this an it looks initially promising. Just testing on the one field, I changed my constraint to:

Compare(NullToEmpty(Trim(existingRecs.test_code)),NullToEmpty(Trim(lkupDBRecs.test_code)))

And it flags the changed recs.

Question:

Should I use this format instead?

Compare(NullToEmpty(Trim(existingRecs.test_code)),NullToEmpty(Trim(lkupDBRecs.test_code)))<>0

From the DataStage manual, it says that Compare will return -1 if string1 is less than string2, and 1 if string1 is greater than string2.

Does the DataStage constraint check for any non-zero return code? If so, the "<>0" at the end probably isn't necessary, right?

lebos · Post by **lebos** » Wed Jul 14, 2004 8:18 am

Wouldn't it be simpler to define your reference hash file so that all columns are keys? Then the constraint could simply be lkupDBRecs.NotFound. This would select all input records which do not exactly match a record in the hash file.

This is how we do it. Is there some drawback to this technique?

Larry

chucksmith · Post by **chucksmith** » Wed Jul 14, 2004 9:16 am

I would use the Compare( ... ) <> 0 format.

rdy · Post by **rdy** » Wed Jul 14, 2004 11:41 am

Thanks both of you for your help.

I gave a quick try using all fields as keys and it said ALL the records were not found. Probably has to do with some trimming or nulls, I'll have to take a look at that in the future.

I'll go with the Compare <> 0 for now.

DSXchange

Transform not recognizing change to one particular field

Transform not recognizing change to one particular field

Thanks for help...