Data Quality via DSRoutine

Daddy Doma
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia

Post by Daddy Doma »

(Related topics to this issue have been posted before, and I am familiar with the potential causes of my problem - seeking advice on a specific solution)

The Background:

I have a need to convert invalid data to {Null} throughout my project. My intention is to develop and maintain a single DSRoutine instead of coding the logic into Transformer derivations as required.

My data is full of '_' values, entered as a default by the source system when no valid record is available. I also have less frequent instances of '' (empty string; length=0) and {Null}. Whenever my input has these values I want to make the output a {Null}.

I intend to call the routine inside the Transformer derivation via:

FuncNullReplace(DSLink1.COLUMN)
A developer on the project has written the FuncNullReplace routine code as:

char* TrxNullReplace(char* Arg11)
{
   
      char* retValue ;
      retValue = NULL;
      
      if (*Arg11 != *retValue) 
        if (*Arg11 != '_') 
           if (*Arg11 != ' ')
               if (*Arg11 != '?')
                  retValue =  Arg11;
        
      return retValue;
}
The Problem:

1) When I pass an empty string or {Null} to this routine, my job aborts with the error:

Transformer_1,0: Operator terminated abnormally: received signal SIGSEGV
Searching this forum indicates that my routine is trying to access restricted memory. Alternatively, a developer on the project believes that the logic of the routine itself is wrong.

2) If I keep the empty strings out of the routine via an IF statement and only send records with a value to the routine, I cannot get the routine to return a {Null} value to DataStage - ironically, it returns an empty string!

Does anyone have any advice on how to achieve my desired effect? If possible, I want to control the processing in one place and reduce the impact of later changes, rather than code data type transformations in each derivation...
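
One likely contributor to the SIGSEGV in (1): the routine compares `*Arg11` with `*retValue` while `retValue` is still NULL, and it also reads `*Arg11` without first checking whether `Arg11` itself is NULL. Dereferencing a null pointer is undefined behaviour in C and commonly ends in a segmentation fault. The following is a minimal sketch of a pointer-safe test, not the project's actual routine. The helper name `is_invalid_value` is hypothetical, and it assumes the source-system placeholder is exactly the one-character string "_".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Returns 1 when the input should be treated as "no value", else 0. */
static int is_invalid_value(const char *arg)
{
    if (arg == NULL)              /* test the pointer itself before reading *arg */
        return 1;
    if (arg[0] == '\0')           /* empty string, length 0 */
        return 1;
    if (strcmp(arg, "_") == 0)    /* assumed placeholder: exactly "_" */
        return 1;
    return 0;
}
```

The order of the checks matters: the NULL test has to come first so that later checks never read through a null pointer.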
When you know that you are destined for greatness by virtue of your mutant heritage it is difficult to apply yourself to normal life. Why waste the effort when you know that your potential is so tremendous?

Post by Daddy Doma »

Follow up question: what is the performance overhead of implementing this business rule in a BASIC Transformer?

Derivations throughout this project are very simple for the most part, just changing names, some IF statements, and assessing if dates are valid and/or within a parameter-driven range.

I think I can satisfy my desire for a one-stop routine using BASIC, but I don't want to kill my speed - and I will be loading A LOT of records into this EDW...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you try describing your "empty" strings in code as '\0' ?

You could also implement the logic in a parallel Transformer job, compile the job, then inspect the generated C++ code for ideas about coding your function. (You'll also find, if you're on 7.5.1A or later, that the Transformer stage is quite efficient. You may end up using that rather than incur the overhead of maintaining a routine.) As a compromise, you can put the code in a Build stage and optimize it there; again, you won't have the overhead of maintaining an external routine, but you will have the overhead of maintaining your custom stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Explicitly allocate memory for your variable inside the routine and then free it after making the return call.

char* retValue = (char *)malloc (sizeof(char *));
-----
----
-----
return retValue;
free(retValue)
As for the unexpected results, I think you need to use strncmp() to compare strings rather than (!=).
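
Two details of the allocation advice above are worth spelling out. A statement placed after `return` never executes, so a `free()` there has no effect, and `sizeof(char *)` is the size of a pointer rather than the length of the string. A minimal sketch under stated assumptions follows: the function name `null_replace_copy` is hypothetical, the placeholder is assumed to be exactly "_", and the caller is assumed to take ownership of the returned buffer. Whether DataStage frees a pointer returned by a parallel routine is not established in this thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the routine, for illustration only.
   Returns a newly allocated copy of arg, or a newly allocated empty string
   when arg is NULL, empty, or the assumed "_" placeholder.
   The caller owns the result and must free() it. */
char *null_replace_copy(const char *arg)
{
    const char *src = arg;
    char *out;

    /* strcmp() compares whole strings; != on two char* values compares
       addresses, and != on *p compares only the first character. */
    if (src == NULL || src[0] == '\0' || strcmp(src, "_") == 0)
        src = "";

    /* Size the buffer from the string length plus the terminator,
       not from sizeof(char *). */
    out = malloc(strlen(src) + 1);
    if (out == NULL)
        return NULL;              /* allocation failed */

    strcpy(out, src);
    return out;                   /* nothing after this line would run */
}
```

This sketch still returns an empty string rather than a DataStage null for invalid input, which matches the behaviour reported in the thread.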
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.

Post by Daddy Doma »

Thanks for the responses, guys. Some feedback:
"Explicitly allocate memory for your variable inside the routine and then free it after making the return call."
Tried this and the SIGSEGV error is gone!
"Can you try describing your "empty" strings in code as '\0' ?"
We tried this for both input and output from the routine but it didn't work. However, as stated above, we can now accept the empty strings into the routine by explicitly allocating memory.

The problem I have is that the routine will not return {Null}, only an empty string...
"...implement the logic in a parallel Transformer job, compile the job, then inspect the generated C++ code for ideas..."
How do I do this? The only time I have seen C++ code from inside DataStage is after a fatal error occurs, by clicking More...

Post by ray.wurlod »

Out of band null is represented in DataStage as 10000000 (binary). You can represent this as 128 in decimal, 200 in octal or 80 in hex. For example, \x80.
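
The three notations above name the same byte value. A minimal sketch of how a routine might hand back that byte as a one-character string is shown below. The names `OOB_NULL_BYTE`, `OOB_NULL_STRING` and `null_replace_sentinel` are hypothetical, the placeholder is assumed to be exactly "_", and whether the parallel engine actually treats a returned "\x80" string as null is an assumption that would need testing in a job.

```c
#include <string.h>

/* The same byte written in hex; equal to decimal 128 and octal 0200. */
enum { OOB_NULL_BYTE = 0x80 };

/* Hypothetical sentinel: a one-character string holding that byte. */
static const char OOB_NULL_STRING[] = "\x80";

/* Hypothetical routine, for illustration only: returns the sentinel for
   NULL, empty, or "_" input, otherwise returns the input unchanged. */
const char *null_replace_sentinel(const char *arg)
{
    if (arg == NULL || arg[0] == '\0' || strcmp(arg, "_") == 0)
        return OOB_NULL_STRING;
    return arg;
}
```

Returning a pointer to static storage avoids the question of who frees the result, at the cost of the caller having to treat the return value as read-only.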