Returning char* from external routine

timsmith_s
Participant
Posts: 54
Joined: Sun Nov 13, 2005 9:25 pm

Returning char* from external routine

Post by timsmith_s »

I found a few postings regarding this topic, but their C++ syntax was incorrect. The real question is: who owns the memory once I return the char* from the external function? All of the samples and the advanced course material always show constant values being returned, but how do I deal with variable-length strings?

For example, the following code returns a buffer built from the string. The buffer was allocated with new, so who deletes (frees) it? DSEE? I believe it will create a memory leak.

Should I be using the string class provided in the APT framework API?

#include <string.h>

char* GenerateStringFromParameter (char* StringParameter)
{
char* buffer = 0;

// allocate a string long enough to hold the argument & some text
buffer = new char[strlen(StringParameter) + 12];

// copy the contents of the argument
strcpy(buffer, StringParameter);

// tack on some random text to make string longer than parameter
strcat(buffer, "hello world");

// return the newly created string
return buffer;
}
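
For comparison, in a standalone C++ program the ownership question has a plain answer: whoever receives the pointer must release it with delete[]. The sketch below shows that pattern only. `UseAndRelease` is a hypothetical caller, the parameter is changed to `const char*` for illustration, and nothing here says whether DataStage itself performs the equivalent of the delete[] step.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Same routine as in the post: returns a buffer allocated with new[].
char* GenerateStringFromParameter(const char* StringParameter)
{
    assert(StringParameter != nullptr);

    // 11 characters of "hello world" plus the terminating '\0' = 12
    char* buffer = new char[std::strlen(StringParameter) + 12];
    std::strcpy(buffer, StringParameter);
    std::strcat(buffer, "hello world");
    return buffer;
}

// Hypothetical caller: copies the result, then frees the buffer it now owns.
std::string UseAndRelease(const char* input)
{
    char* raw = GenerateStringFromParameter(input);
    std::string copy(raw);
    delete[] raw;   // must be delete[] because the buffer came from new[]
    return copy;
}
```

If the caller skips the delete[] step, every call leaks one buffer, which is the concern raised in this post.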
Yuan_Edward
Participant
Posts: 73
Joined: Tue May 10, 2005 6:21 pm
Location: Sydney

Re: Returning char* from external routine

Post by Yuan_Edward »

I don't think this is the proper way to return a string from a function. The temporary buffer address should not be referenced outside the routine. Did you try to compile and run the routine in a C++ program?

In my opinion, the address of the buffer that holds the changed value should be passed into the routine.
Edward Yuan
timsmith_s
Participant
Posts: 54
Joined: Sun Nov 13, 2005 9:25 pm

Post by timsmith_s »

This is not temporary; the new operator allocates the memory, and I pass the address of that memory back via the char* return type. It's proper C++. The issue, and my question, is: who cleans up the memory created by the new operator?
Yuan_Edward
Participant
Posts: 73
Joined: Tue May 10, 2005 6:21 pm
Location: Sydney

Post by Yuan_Edward »

I guess you should clean up the memory yourself by calling delete in your routine.

I have rewritten your code my way :) (I know it's not the perfect way):

Code:

#include <string.h> 

char* GenerateStringFromParameter (char* StringParameter, char* buffer) 
{ 
// copy the contents of the argument 
strcpy(buffer, StringParameter); 

// tack on some random text to make string longer than parameter 
strcat(buffer, "hello world"); 

// return the newly created string 
return buffer; 
}
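
The version above trusts the caller to supply a buffer that is large enough. A minimal sketch of a bounded variant follows, assuming the caller can also pass the buffer's capacity. The name `GenerateStringBounded` and the `capacity` parameter are hypothetical and do not come from any DataStage API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical variant: the caller owns `buffer` and states its capacity.
// Returns `buffer` on success, or nullptr (buffer untouched) if it is too small.
char* GenerateStringBounded(const char* StringParameter,
                            char* buffer,
                            std::size_t capacity)
{
    assert(StringParameter != nullptr && buffer != nullptr);

    static const char suffix[] = "hello world";
    const std::size_t inLen = std::strlen(StringParameter);
    const std::size_t needed = inLen + sizeof suffix;   // sizeof counts the '\0'

    if (needed > capacity)
        return nullptr;

    std::memcpy(buffer, StringParameter, inLen);
    std::memcpy(buffer + inLen, suffix, sizeof suffix); // copies the '\0' too
    return buffer;
}
```

With this shape the routine never allocates, so there is nothing for it to leak; the cost is the extra argument at every call site.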
Edward Yuan
timsmith_s
Participant
Posts: 54
Joined: Sun Nov 13, 2005 9:25 pm

Post by timsmith_s »

Where is the buffer coming from, DataStage? How would I call this from a Transformer?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

In Yuan_Edward's code, StringParameter and buffer are both input arguments to the function. You need to define the function as a parallel routine: specify the path of the object file and call it just like any other routine. DataStage will handle the memory for the input arguments. Any variables you allocate inside the function are your responsibility, or there will be memory leaks.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
timsmith_s
Participant
Posts: 54
Joined: Sun Nov 13, 2005 9:25 pm

Post by timsmith_s »

Interesting - so if I wanted to write my own trim function the signature would read

Trim(instring, outstring) rather than Trim(instring)

Why do the ~standard~ routines not require this sort of hassle? I was told they were handled in much the same way.
Yuan_Edward
Participant
Posts: 73
Joined: Tue May 10, 2005 6:21 pm
Location: Sydney

Post by Yuan_Edward »

Not sure what's inside DataStage. Maybe DataStage uses a global memory buffer... maybe DataStage cleans up the memory area used by the standard routines.

I don't know how or where to find the C code DataStage generates for a job. Otherwise we could get the answer from there. :?
Edward Yuan
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you wrote your own Trim() function, you would still use a single argument. The result variable does not need to be passed in; the return statement takes care of it. The variable needs to be defined inside the code. On encountering the return statement, the DS Engine releases the memory.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
timsmith_s
Participant
Posts: 54
Joined: Sun Nov 13, 2005 9:25 pm

Post by timsmith_s »

DSguru2B - I don't follow. To paraphrase: if I write my own Trim() and "define my variable" inside my function, how do I do that? This is the heart of the issue.

The following code is invalid: it returns the address of "LocalBuffer", which will be out of scope once the function call completes.

char* CopyString(char* SourceString)
{
char LocalBuffer[1024];

strcpy (LocalBuffer, SourceString);

return LocalBuffer;
}
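
One common way around the dangling pointer, without handing a heap buffer to the caller, is to keep the storage alive between calls. The sketch below uses a per-thread buffer that is reused on every call. `CopyStringStatic` is a hypothetical name, and the approach only works if the caller copies the result before the next call on the same thread, which is an assumption this thread does not confirm for DataStage.

```cpp
#include <cassert>
#include <string>

// Hypothetical alternative to CopyString: the returned pointer refers to
// storage that outlives the call, so it does not dangle on return.
const char* CopyStringStatic(const char* SourceString)
{
    assert(SourceString != nullptr);

    // One buffer per thread, created on first use and reused afterwards.
    // It grows as needed, so there is no fixed 1024-byte limit.
    thread_local std::string buffer;

    buffer.assign(SourceString);
    return buffer.c_str();   // valid until the next call on this thread
}
```

Nothing is leaked and nothing needs to be freed by the caller, but a pointer obtained from one call must not be used after the following call.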
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I would rather define LocalBuffer as a pointer, but anyhow. You are right that the scope of this variable lasts only until the call to the function finishes, but then again that's all we want. A PX routine is just a C function. The function is invoked for every row, and once it finishes, the memory is released. So if you have 100 rows in your source file and you call the function in the transformer, the function will be called 100 times.
The memory for the variable that holds the end result, i.e. the one used in the return statement, will be released when the function completes. Any other variable you use inside, make sure you free it within the code. I like to do that.
Check out this amateur code that I wrote.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Yuan_Edward
Participant
Posts: 73
Joined: Tue May 10, 2005 6:21 pm
Location: Sydney

Post by Yuan_Edward »

DSguru2B, I pasted your code here. Will the memory buffer (finOut) be released by DataStage (I am quite sure it will not be released by the routine itself)? That's the OP's question. DataStage needs to call "free" or "delete" explicitly to release the memory. I can't find any documentation on that.

Maybe I can try allocating a huge amount of memory in my routine and running it again and again; if the server doesn't crash, I will have my answer.

Code:

#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 

char* SybaseToOracleTp(char* InTp) 
{ 
  //Initialize variables 
  const int SIZE = 30; 
  char* month = (char *)malloc(SIZE); 
  char* day = (char *)malloc(SIZE); 
  char* year = (char *)malloc(SIZE); 
  char* hour = (char *)malloc(SIZE); 
  char* newHr = (char *)malloc(SIZE); 
  char* min = (char *)malloc(SIZE); 
  char* sec = (char *)malloc(SIZE); 
  char* msec = (char *)malloc(SIZE); 
  char* time = (char *)malloc(SIZE); 
  char* intMon = (char *)malloc(SIZE); 
  char* finOut = (char *)malloc(SIZE); 
  const char* calender[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}; 

  int hr = 0; 

  //AM or PM 
  char* p = strstr(InTp, "PM"); 

  //Disect the Date 
  strcpy(month, strtok(InTp, " ")); 
  strcpy(day, strtok(NULL, " ")); 
  strcpy(year, strtok(NULL, " ")); 
  strcpy(time, strtok(NULL," ")); 

 //Disect Time 
  strcpy(hour, strtok(time, ":")); 
  strcpy(min, strtok(NULL, ":")); 
  strcpy(sec, strtok(NULL, ":")); 
  strcpy(msec, strtok(NULL, ":")); 

  //get numeric representation of Month 
  for(int i = 0; i < 12; i++) 
  { 
   if (strcmp(month, calender[i]) == 0) 
     sprintf(intMon, "%02d", i + 1); 
  } 
  if ((p) && strcmp(hour, "12") != 0) 
    { 
     hr = atoi(hour); 
     hr+=12; 
     sprintf(hour, "%02d", hr); 
    } 
  
  if ((!p) && strcmp(hour, "12") == 0) 
  { 
     strcpy(hour, "00"); 
  } 
 //format string to YYYY-MM-DD HH:MM:SS.sss 
 sprintf(finOut, "%s-%2s-%s %s:%s:%s.%s", year, intMon, day, hour, min, sec, msec); 

 //free memory 
 free(month); 
 free(day); 
 free(year); 
 free(hour); 
 free(min); 
 free(sec); 
 free(msec); 
 free(time); 
 free(intMon); 
 free(newHr);  // newHr is allocated above but was never freed

 return finOut; 

} 
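
The experiment proposed above (run the routine many times and see whether memory keeps growing) can be rehearsed outside DataStage with a counter on allocations. This is only a sketch of the bookkeeping: `TrackedAlloc`, `TrackedFree`, `MakeResult` and `RunRows` are hypothetical names, and the result shows what happens when a caller does or does not free, not what DataStage actually does.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <vector>

static long g_outstanding = 0;   // buffers allocated but not yet freed

void* TrackedAlloc(std::size_t n)
{
    void* p = std::malloc(n);
    if (p) ++g_outstanding;
    return p;
}

void TrackedFree(void* p)
{
    if (p) { --g_outstanding; std::free(p); }
}

// Stand-in for a routine that returns a heap buffer, like finOut above.
char* MakeResult(const char* in)
{
    assert(in != nullptr);
    char* out = static_cast<char*>(TrackedAlloc(std::strlen(in) + 1));
    if (out) std::strcpy(out, in);
    return out;
}

// Simulates a caller processing `rows` rows and reports how many result
// buffers are still live once all rows have been processed.
long RunRows(int rows, bool callerFrees)
{
    std::vector<char*> kept;
    for (int i = 0; i < rows; ++i) {
        char* r = MakeResult("2006-01-02 03:04:05.000");
        if (callerFrees) TrackedFree(r);
        else kept.push_back(r);
    }
    const long live = g_outstanding;
    for (char* p : kept) TrackedFree(p);   // tidy up so the demo itself is clean
    return live;
}
```

In a real job the equivalent check would be to watch the memory of the running process over a large number of rows, since the engine's own calls cannot be instrumented this way.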
Edward Yuan
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Yuan_Edward wrote: DSguru2B, I pasted your code here. Will the memory buffer (finOut) be released by DataStage (I am quite sure it will not be released by the routine itself)? That's the OP's question. DataStage needs to call "free" or "delete" explicitly to release the memory. I can't find any documentation on that.
Yes, DataStage will.
You don't need it to be in writing to believe it; it makes sense.
You cannot free it before returning it, and you cannot put a free statement after the return statement because it would never be executed. So the DSEngine takes care of it.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
timsmith_s
Participant
Posts: 54
Joined: Sun Nov 13, 2005 9:25 pm

Post by timsmith_s »

Thank you all for the posts - great insight.

DSguru2B, your code worries me. It has memory leak written all over it. That said, I am willing to try your approach: it's simple enough, and I am running out of memory as it is.

I agree with the previous post that the DSEE documentation 1. sucks 2. doesn't describe this behavior. Worth a shot.

I have to say that if this works, I will be very disappointed with Ascential/IBM and their total lack of good documentation.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Where do you think there will be memory leaks? If you are talking about SIZE, that's just something I kept large enough, and I free it before quitting the program anyway, so please explain.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.