Returning strings (char pointers) from C routines

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

banactp
Participant
Posts: 52
Joined: Tue Feb 22, 2005 2:55 pm

Returning strings (char pointers) from C routines

Post by banactp »

I would like to create a PX routine that returns a (modified) substring of an input string, much like this one posted by another user of this forum:

Code:

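#include <stdlib.h>   /* malloc, abs */
#include <string.h>   /* strlen */

/* Return a newly allocated copy of up to len characters of str, starting at
   1-based position st (a negative st counts from the end of the string). */
char *substr(char *str, int st, int len)
{
   int i, j = 0, l;
   char *s;
   l = (int)strlen(str);
   if (len <= 0 || l < abs(st))
      return("");              /* string literal: the caller must not free it */
   if (st == 0)
      st = 1;
   if (st < 0)
      st = st + l + 1;
   if (len > 99)
      len = 99;                /* the buffer holds 99 characters plus '\0' */
   if (len > l - st + 1)
      len = l - st + 1;        /* do not read past the end of str */
   s = (char *)malloc(100);    /* nothing ever frees this: the leak in question */
   if (s == NULL)
      return("");
   for (i = st - 1; i < st + len - 1; i++)
   {
      s[j] = str[i];
      j++;
   }
   s[j] = '\0';
   return(s);
}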
My concern is how to free the space malloc'ed in the routine. I do not want to modify or overwrite the input string.

One idea that I've had, as yet untested, is to pass a VarChar stage variable as an additional (char *) argument, and try to put the result there instead of allocating new storage.
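
A minimal sketch of that idea, assuming the routine is allowed to take two extra arguments: a caller-owned output buffer and its size. The name substr_into and both extra parameters are hypothetical, and whether DataStage will pass a stage variable as writable storage is exactly the untested part.

```c
#include <stdlib.h>   /* abs */
#include <string.h>   /* strlen, memcpy */

/* Hypothetical variant of substr(): writes the result into out (capacity
   outsz bytes, terminator included) instead of calling malloc().
   Returns out, or NULL if out cannot hold even an empty string. */
char *substr_into(const char *str, int st, int len, char *out, size_t outsz)
{
   int l = (int)strlen(str);

   if (out == NULL || outsz == 0)
      return NULL;
   out[0] = '\0';
   if (len <= 0 || l < abs(st))
      return out;                       /* empty result, nothing to free */
   if (st == 0)
      st = 1;
   if (st < 0)
      st = st + l + 1;                  /* negative start counts from the end */
   if (len > l - st + 1)
      len = l - st + 1;                 /* stay inside the input */
   if ((size_t)len > outsz - 1)
      len = (int)(outsz - 1);           /* stay inside the output */
   memcpy(out, str + st - 1, (size_t)len);
   out[len] = '\0';
   return out;
}
```

The input string is never modified, and because the caller owns the storage there is nothing for the routine to free.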

Any other suggestions?

TIA
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Well, I'd like to know the answer to this one myself. This kind of thing is what keeps me using BASIC for some routines, so I can use the common storage and not worry about memory leaks.

Any ideas anyone?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hmm, it has been a long time since I played with C/C++.
But if memory serves, this code simply allocates 100 bytes on the heap and returns a pointer to that location, in other words a pointer to a string of at most 99 characters plus the terminator.
Now, if my assumption is correct and the allocated size is fixed, you could use a static variable instead of the dynamic allocation, and use stage variables to hold the last value returned from the routine so you can pass it in with the next row as a starting point.
If that is the case, you have no memory leak issues.
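
Roughly what the static version could look like. This is a sketch only: the name substr_static and the SUBSTR_MAX constant are made up for illustration, and the 100-byte size is simply carried over from the posted code.

```c
#include <stdlib.h>   /* abs */
#include <string.h>   /* strlen, memcpy */

#define SUBSTR_MAX 100   /* assumed fixed size, terminator included */

/* Hypothetical static-buffer variant of substr(): no malloc(), so no leak,
   but every call overwrites the result of the previous one. */
char *substr_static(const char *str, int st, int len)
{
   static char s[SUBSTR_MAX];   /* one buffer, reused by every call */
   int l = (int)strlen(str);

   s[0] = '\0';
   if (len <= 0 || l < abs(st))
      return s;
   if (st == 0)
      st = 1;
   if (st < 0)
      st = st + l + 1;
   if (len > l - st + 1)
      len = l - st + 1;          /* stay inside the input */
   if (len > SUBSTR_MAX - 1)
      len = SUBSTR_MAX - 1;      /* stay inside the static buffer */
   memcpy(s, str + st - 1, (size_t)len);
   s[len] = '\0';
   return s;
}
```

The catch is that the returned pointer always refers to the same buffer, so the caller has to copy the result before the routine is called again.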

IHTH,
banactp
Participant
Posts: 52
Joined: Tue Feb 22, 2005 2:55 pm

Post by banactp »

Thanks, Roy.

Static allocation won't work because this is intended to be a common, widely used routine. I can't risk having several processes trying to use the same space at once. And I don't want to lock it down and have threads waiting on such a simple routine!

I'll be able to try using the space allocated for a stage variable this afternoon, once our admin reconfigures our installation. I'll post the results here.

I was under the impression that I couldn't use BASIC (server) routines within parallel jobs. In fact, I think I tried that two days ago, and sure enough the routine did not appear in the palette of available ones.

tpb
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

True in a parallel Transformer, but you can use a BASIC Transformer from the stage types in your repository;
then you have BASIC in your parallel jobs.

IHTH,
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Re: Returning strings (char pointers) from C routines

Post by T42 »

banactp wrote: I would like to create a PX routine that returns a (modified) substring of an input string, much like this one posted by another user of this forum
Modified substring? That can easily be done in Transformer stage. Give us your rules, and I can whip something up without you having to build a BuildOP or Custom stage.
banactp
Participant
Posts: 52
Joined: Tue Feb 22, 2005 2:55 pm

Re: Returning strings (char pointers) from C routines

Post by banactp »

T42 wrote: Modified substring? That can easily be done in Transformer stage. Give us your rules, and I can whip something up without you having to build a BuildOP or Custom stage.
OK, I need a stage that processes records in which some of the fields are strings. Some, but not all, of those strings need to be flipped around from front to back, and all the "<" and ">" should be replaced with "&lt;" and "&gt;". Any ":" should be removed, then the number of characters in the string should be appended to it along with a date/timestamp. The fields in each record that need to be operated on will vary from job to job, although the processing will be the same in each case.

Well, that's not exactly what I'm trying to do, but it gives you an idea of why I would like to have a reusable routine that returns a char * that I can use in Transformers. BTW, I know this type of processing can be done with a Transformer, but boy is it ugly to implement it.
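
To make the example rules concrete, here is a rough sketch in C. Everything beyond the rules themselves is an assumption: the name munge, the order of the steps (flip, drop ":", escape, then append), the "count, space, timestamp" suffix format, counting characters before escaping, and taking the timestamp as an argument rather than reading the clock.

```c
#include <stdio.h>    /* snprintf */
#include <string.h>   /* strlen, memcpy */

/* Hypothetical helper: transforms in into out (capacity outsz bytes,
   terminator included). Returns out, or NULL if the result does not fit. */
char *munge(const char *in, int flip, const char *stamp, char *out, size_t outsz)
{
   size_t n = strlen(in), used = 0, count = 0, i;
   int w;

   for (i = 0; i < n; i++) {
      char c = flip ? in[n - 1 - i] : in[i];   /* read back to front if flipping */
      const char *rep = NULL;
      size_t replen = 1;

      if (c == ':')
         continue;                             /* colons are removed */
      if (c == '<') { rep = "&lt;"; replen = 4; }
      if (c == '>') { rep = "&gt;"; replen = 4; }
      if (used + replen >= outsz)
         return NULL;                          /* would overflow out */
      if (rep != NULL)
         memcpy(out + used, rep, replen);
      else
         out[used] = c;
      used += replen;
      count++;                                 /* one per character kept */
   }
   /* Append "<count> <stamp>"; snprintf reports the length it needed. */
   w = snprintf(out + used, outsz - used, "%zu %s", count, stamp);
   if (w < 0 || (size_t)w >= outsz - used)
      return NULL;
   return out;
}
```

Like the stage-variable idea, this writes into storage the caller owns, so it sidesteps the malloc question entirely.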

Now, if I could only get my custom routine to link into DataStage correctly (viewtopic.php?p=126171#126171), I think the problem would be solved!
mbruenen
Participant
Posts: 1
Joined: Mon Jan 31, 2005 2:51 am

Static allocations for returned results in C routines

Post by mbruenen »

Hi,

this is precisely the kind of problem that hit us recently.

Former developers had installed C++ routines that used never-freed mallocs in order to return string results.

Unfortunately, the problem only became apparent before we installed DS version 7.5. Quite suddenly, we experienced heap overflows for large input data sets.

To fix the problem, we used static allocations for the result strings and returned pointers to them. At least this made our jobs run correctly again.

I have never checked the jobs' parallel behaviour since then, but I don't think I can agree with the view that static allocation will prevent a job from running in parallel.

If DS chooses to run a job in, let's say, two threads, the entire (!) address space has to be provided twice in memory. That means the code plus the address spaces for routines have to be doubled as well.
I consider this true regardless of whether the routines are bound statically or as dynamic libs.
Otherwise, the argument against static allocation would hold just as well for routines returning any kind of data type (returned integers also have to be declared in the routines, which in turn means that 4 bytes are allocated for them).

So, as a bottom line: static allocation should work perfectly for the kind of task you have in mind (taking for granted that you know in advance how many bytes you will need in the worst case).

Nevertheless, I have no proof available for my claim. If anyone has a DS job which proves that user-written C++ routines effectively force parallelism to be turned off, I would be very grateful to hear about it.
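
If the routine did turn out to be called from several threads of one process, which nothing in this thread confirms, a per-thread static buffer would keep callers from sharing storage without any locking. A sketch under that assumption: it needs a C11 compiler for _Thread_local, and the name substr_tls and the RESULT_MAX limit are hypothetical.

```c
#include <stdlib.h>   /* abs */
#include <string.h>   /* strlen, memcpy */

#define RESULT_MAX 100   /* assumed worst-case size, terminator included */

/* Hypothetical variant: each thread gets its own result buffer, so there is
   no malloc(), no free(), and no lock. The result stays valid until the same
   thread calls substr_tls() again. */
char *substr_tls(const char *str, int st, int len)
{
   static _Thread_local char s[RESULT_MAX];
   int l = (int)strlen(str);

   s[0] = '\0';
   if (len <= 0 || l < abs(st))
      return s;
   if (st == 0)
      st = 1;
   if (st < 0)
      st = st + l + 1;
   if (len > l - st + 1)
      len = l - st + 1;          /* stay inside the input */
   if (len > RESULT_MAX - 1)
      len = RESULT_MAX - 1;      /* stay inside the buffer */
   memcpy(s, str + st - 1, (size_t)len);
   s[len] = '\0';
   return s;
}
```

With separate processes, each already has its own copy of a plain static buffer, so the thread-local qualifier only matters in the multi-threaded case.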

Thank you very much.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

Do not use the standard C libraries.

Use the Orchestrate API libraries: APT_String, et cetera. This will help a lot with memory cleanup.

Bug Ascential for a copy of the Orchestrate manuals.
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

You don't have to bug Ascential for the Orchestrate documents. We had a discussion on this a few days back, and one of the posts provided a link that goes directly to the Ascential website for the Orchestrate docs.

Here it is.

http://ascential.com/eservice/pages/pro ... trate.html

HTH
Rich
Ramani
Participant
Posts: 58
Joined: Mon Oct 08, 2007 1:51 am

Post by Ramani »

The Orchestrate API link provided is no longer active. If anyone has a new link, please provide it.
Thanks