
Data Quality via DSRoutine

Posted: Tue Jan 30, 2007 1:04 am
by Daddy Doma
(Related topics to this issue have been posted before, and I am familiar with the potential causes of my problem - seeking advice on a specific solution)

The Background:

I need to convert invalid data to {Null} throughout my project. Rather than coding the logic into every Transformer derivation that needs it, I intend to develop and maintain a single DSRoutine.

My data is full of '_' values, entered as a default by the source system when no valid record is available. I also have less frequent instances of '' (empty string; length=0) and {Null}. Whenever my input has these values I want to make the output a {Null}.

I intend to call the routine inside the Transformer derivation via:

Code:

FuncNullReplace(DSLink1.COLUMN)
A developer on the project has written the FuncNullReplace routine code as:

Code:

char* TrxNullReplace(char* Arg11)
{
    char* retValue;
    retValue = NULL;

    if (*Arg11 != *retValue)
        if (*Arg11 != '_')
            if (*Arg11 != ' ')
                if (*Arg11 != '?')
                    retValue = Arg11;

    return retValue;
}
The Problem:

1) When I pass an empty string or {Null} to this routine, my job aborts with the error:

Code:

Transformer_1,0: Operator terminated abnormally: received signal SIGSEGV
Searching this forum indicates that my routine is trying to access restricted memory. Alternatively, a developer on the project believes that the logic of the routine itself is wrong.

2) If I keep the empty strings out of the routine via an IF statement and only send records with a value to the routine, I cannot get the routine to return a {Null} value to DataStage - ironically, it returns an empty string!

Does anyone have any advice on how to achieve this? If possible, I want to control the processing in one place and reduce the impact of later changes, rather than coding data type transformations in each derivation...

Posted: Tue Jan 30, 2007 1:39 am
by Daddy Doma
Follow up question: what is the performance overhead of implementing this business rule in a BASIC Transformer?

Most derivations in this project are very simple: changing names, some IF statements, and checking whether dates are valid and/or within a parameter-driven range.

I think I can satisfy my desire for a one-stop routine using BASIC, but I don't want to kill my speed - and I will be loading A LOT of records into this EDW...

Posted: Tue Jan 30, 2007 5:54 am
by ray.wurlod
Can you try describing your "empty" strings in code as '\0' ?

You could also implement the logic in a parallel Transformer job, compile the job, then inspect the generated C++ code for ideas about coding your function. (You'll also find, if you're on 7.5.1A or later, that the Transformer stage is quite efficient. You may end up using that rather than incur the overhead of maintaining a routine.) As a compromise, you can stick the code in a Build stage and optimize it there; again, you won't have the overhead of maintaining an external routine, but you will have the overhead of maintaining your custom stage.

Posted: Tue Jan 30, 2007 7:40 am
by DSguru2B
Explicitly allocate memory for your variable inside the routine and then free it after making the return call.

Code:

char* retValue = (char *)malloc(sizeof(char *));
/* ... */
return retValue;
free(retValue);
As for the unexpected results, I think you need to use strncmp() to compare strings rather than !=.

Posted: Tue Jan 30, 2007 6:54 pm
by Daddy Doma
Thanks for the responses, guys. Some feedback:
DSguru2B wrote:
Explicitly allocate memory for your variable inside the routine and then free it after making the return call.
Tried this and the SIGSEGV error is gone!
ray.wurlod wrote:
Can you try describing your "empty" strings in code as '\0' ?
We tried this for both input and output from the routine but it didn't work. However, as stated above, we can now accept the empty strings into the routine by explicitly allocating memory.

The problem I have is that the routine will not return {Null}, only an empty string...
ray.wurlod wrote:
...implement the logic in a parallel Transformer job, compile the job, then inspect the generated C++ code for ideas...
How do I do this? The only time I have seen C++ code from inside DataStage is after a fatal error occurs, by clicking More...

Posted: Tue Jan 30, 2007 7:02 pm
by ray.wurlod
Out-of-band null is represented in DataStage as 10000000 (binary). You can represent this as 128 in decimal, 200 in octal or 80 in hex, for example \x80.