Transform not recognizing change to one particular field

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
rdy
Participant
Posts: 38
Joined: Wed Nov 05, 2003 2:40 pm

Transform not recognizing change to one particular field

Post by rdy »

We're having a really tricky problem. Our daily processes updates Table B to stay identical to Table A.

One of our server jobs compares the current contents of Table A to the existing contents of Table B.

In this job, Table A's contents come in to a transform as a sequential file, and Table B's contents are in a hash file fed from an ODBC stage - this is the reference link.

In the transform, the keys are matched on the input link and reference link. The one output is the file with all the records that changed.

The constraint that selects records is as follows:
(NullToEmpty(existingRecs.field_1) <> NullToEmpty(lkupDBRecs.field_1)) OR
(NullToEmpty(existingRecs.field_2) <> NullToEmpty(lkupDBRecs.field_2)) OR
(NullToEmpty(existingRecs.test_code_2) <> NullToEmpty(lkupDBRecs.test_code_2)) OR
(NullToEmpty(existingRecs.field_3) <> NullToEmpty(lkupDBRecs.field_3)) OR
(NullToEmpty(existingRecs.test_code_3) <> NullToEmpty(lkupDBRecs.test_code_3)) OR
(NullToEmpty(existingRecs.field_4) <> NullToEmpty(lkupDBRecs.field_4)) OR
(NullToEmpty(existingRecs.field_5) <> NullToEmpty(lkupDBRecs.field_5)) OR
(NullToEmpty(existingRecs.field_6) <> NullToEmpty(lkupDBRecs.field_6)) OR
(NullToEmpty(existingRecs.field_7) <> NullToEmpty(lkupDBRecs.field_7)) OR
(NullToEmpty(existingRecs.field_8) <> NullToEmpty(lkupDBRecs.field_8)) OR
(NullToEmpty(existingRecs.first_name) <> NullToEmpty(lkupDBRecs.first_name)) OR
(NullToEmpty(existingRecs.last_name) <> NullToEmpty(lkupDBRecs.last_name)) OR
(NullToEmpty(existingRecs.email_address) <> NullToEmpty(lkupDBRecs.email_address)) OR
(NullToEmpty(existingRecs.phone_free_form_number) <> NullToEmpty(lkupDBRecs.phone_free_form_number)) OR
(NullToEmpty(existingRecs.field_9) <> NullToEmpty(lkupDBRecs.field_9)) OR
(NullToEmpty(Trim(existingRecs.test_code)) <> NullToEmpty(Trim(lkupDBRecs.test_code)))

You can see it compares 16 fields.

My problem is that for certain records a change in one of these fields (e.g. field_2) will fulfill the constraint, but a change to test_code does not.

To make matters worse, a change to test_code in some records fulfills the constraint, but in other records it doesn't!

As you can see, this is totally unpredictable and very frustrating. Is there some kind of overflow or limit or something that could cause this behavior?
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

This may be very predictable behavior. When comparing two columns, DataStage first attempts a numeric comparison. Therefore, the string "00" and "000" will appear equal since they both represent number zero. For your non-numeric columns, use the Compare function. This will force a string comparison.

Since you are on a Unix server, you may want to look at the Unix comm command. If you dump table A to a sequential file, and table B to another sequential file, then unix sort the two files, the comm command can quickly give you a file of lines (1) unique to file A, (2) unique to file B, or (3) common to both files. This may efficiently reduce the work your job has to perform, or even eliminate it.
rdy
Participant
Posts: 38
Joined: Wed Nov 05, 2003 2:40 pm

Post by rdy »

chucksmith wrote:For your non-numeric columns, use the Compare function. This will force a string comparison.
OK, I tried this an it looks initially promising. Just testing on the one field, I changed my constraint to:

Compare(NullToEmpty(Trim(existingRecs.test_code)),NullToEmpty(Trim(lkupDBRecs.test_code)))

And it flags the changed recs.

Question:

Should I use this format instead?

Compare(NullToEmpty(Trim(existingRecs.test_code)),NullToEmpty(Trim(lkupDBRecs.test_code)))<>0

From the DataStage manual, it says that Compare will return -1 if string1 is less than string2, and 1 if string1 is greater than string2.

Does the DataStage constraint check for any non-zero return code? If so, the "<>0" at the end probably isn't necessary, right?
lebos
Participant
Posts: 74
Joined: Mon Jan 20, 2003 10:02 am
Location: USA

Post by lebos »

Wouldn't it be simpler to define your reference hash file so that all columns are keys? Then the constraint could simply be lkupDBRecs.NotFound. This would select all input records which do not exactly match a record in the hash file.

This is how we do it. Is there some drawback to this technique?

Larry
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

I would use the Compare( ... ) <> 0 format.
rdy
Participant
Posts: 38
Joined: Wed Nov 05, 2003 2:40 pm

Thanks for help...

Post by rdy »

Thanks both of you for your help.

I gave a quick try using all fields as keys and it said ALL the records were not found. Probably has to do with some trimming or nulls, I'll have to take a look at that in the future.

I'll go with the Compare <> 0 for now.
Post Reply