DataStage Parallel Routines and Malloc/New
Moderators: chulett, rschirm, roy
Hi all,
I'm just about to start experimenting with DataStage Parallel routines. Before working with DataStage, C and C++ were my primary languages of choice and I'm therefore quite comfortable in writing a px routine. However, I've done a bit of reading of the DS manuals and some of the threads in this forum and have found several example px routines which allocate small amounts of temporary memory.
Since these functions are potentially getting called millions of times during execution of a job, how does this allocation affect the overall performance? I would have expected lots of small allocations/deallocations to both reduce performance and seriously fragment memory. Is it better to be using a fast pooled allocator for small temporary allocations? Or is the performance overhead negligible compared to the Orchestrate engine itself?
Can anyone share any thoughts or experiences in this matter?
Cheers,
barry
On the topic of memory allocation within a DataStage px routine, I found this example px routine (thanks DSguru2B), reproduced below.
However, I noticed a couple of things which concerned me and left me wondering how memory is usually allocated within a px routine such as this.
First, the malloc call only allocates a buffer large enough to hold a single pointer rather than the entire new string. A normal C program would exhibit strange behavior due to buffer overruns (maybe even a seg fault) in this situation. Is this an error in the example, or is it normal practice for px routines?
Second, how does DS free allocated memory? This example tries to free(result) after the return statement (this would probably get compiled out) so there appears to be a memory leak. Will DataStage free this variable later? Or is there some form of DS allocator to use which handles this transparently?
Thanks,
Barry
Code:
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
char *result = (char *)malloc (sizeof(char *));
int newlen = strlen(rep);
int oldlen = strlen(subStr);
int i, x, count = 0;
//If beginning is less than or equal to 1 then default it to 1
if (beg <= 1)
{beg = 1;}
//replace all instances if value of num less than or equal to 0
if (num <= 0)
{num = strlen(str);}
//Get the character position in i for substring instance to start from
for (i = 0; str[i] != '\0' ; i++)
{
if (strstr(&str[i], subStr) == &str[i])
{
count++;
i += oldlen - 1;
if (count == beg)
{break;}
}
}
//Get everything before position i before replacement begins
x = 0;
while (i != x)
{ result[x++] = *str++; }
//Start replacement
while (*str) //for the complete input string
{
if (num != 0 ) // until no more occurrences need to be changed
{
if (strstr(str, subStr) == str )
{
strcpy(&result[x], rep);
x += newlen;
str += oldlen;
num--;
}
else // if no match is found
{
result[x++] = *str++;
}
}
else
{
result[x++] = *str++;
}
}
result[x] = '\0'; //Terminate the string
return result; //Return the replaced string
free(result); //free memory
}
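For comparison, here is a minimal sketch of how the output buffer could be sized to hold the whole result instead of a single pointer. It is not the routine from the thread and not the bug-fixed version mentioned later; the name ereplace_sketch and its exact semantics (replace up to num occurrences, all of them if num <= 0, starting from the beg-th occurrence counted from 1) are assumptions made for illustration. Who frees the returned buffer is left to the caller here, since the thread does not settle that point beyond the replies that follow.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration only; semantics are assumed:
 * replace up to `num` occurrences of `sub` with `rep` (all if num <= 0),
 * starting from the `beg`-th occurrence (1-based).
 * Returns a newly allocated string, or NULL if allocation fails. */
char *ereplace_sketch(const char *str, const char *sub, const char *rep,
                      int num, int beg)
{
    size_t len = strlen(str);
    size_t oldlen = strlen(sub);
    size_t newlen = strlen(rep);
    size_t matches = 0;
    size_t size;
    const char *p;
    char *result, *out;
    int seen = 0;

    if (beg < 1)
        beg = 1;

    /* An empty search string cannot match: return a plain copy */
    if (oldlen == 0) {
        result = malloc(len + 1);
        if (result)
            memcpy(result, str, len + 1);
        return result;
    }

    /* Pass 1: count non-overlapping occurrences so the buffer can be sized */
    for (p = str; (p = strstr(p, sub)) != NULL; p += oldlen)
        matches++;

    /* Worst case: every occurrence is replaced; +1 for the terminator */
    size = len + 1;
    if (newlen > oldlen)
        size += matches * (newlen - oldlen);

    result = malloc(size);
    if (!result)
        return NULL;

    /* Pass 2: copy the input, substituting the selected occurrences */
    out = result;
    p = str;
    while (*p) {
        if (strncmp(p, sub, oldlen) == 0) {
            seen++;
            if (seen >= beg && (num <= 0 || seen - beg < num)) {
                memcpy(out, rep, newlen);
                out += newlen;
            } else {
                memcpy(out, p, oldlen);
                out += oldlen;
            }
            p += oldlen;
        } else {
            *out++ = *p++;
        }
    }
    *out = '\0';
    return result;
}
```

Counting the matches first costs a second scan of the input, but it lets a single malloc cover the worst case, so the copy loop never writes past the end of the buffer.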
malloc() is a pretty fast and efficient call; plus the space is released after the call is complete, so no fragmentation occurs due to these calls.
I'm not sure whether implementations of fast pooled memory allocation are portable across UNIX versions, as the calls seem to be different.
It would certainly be worth a try to test the performance over millions of rows, but from past experience I would guess that the incremental time in cpu-ticks of a malloc() will disappear or become quite small when compared with the overhead of the PCL mechanism used to invoke the C++ routine per row.
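A rough way to run that kind of test outside DataStage is sketched below: it times a simulated per-row function that calls malloc()/free() on every row against one that reuses a single buffer. All names here (row_with_malloc, row_with_buffer, compare_allocation_cost) are hypothetical, the "work" is just a memset, and the numbers say nothing about the engine's own per-row overhead, so treat it as a starting point rather than a measurement of a real job.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Simulated per-row work with a fresh allocation on every call.
 * Note: an optimizing compiler may remove the malloc/free pair,
 * so compare results at a low optimization level as well. */
static unsigned long row_with_malloc(size_t bytes, unsigned long row)
{
    unsigned char *tmp = malloc(bytes);
    unsigned long v;

    if (!tmp)
        return 0;
    memset(tmp, (int)(row & 0xFF), bytes);
    v = tmp[bytes - 1];
    free(tmp);
    return v;
}

/* Same work, but writing into a buffer allocated once by the caller */
static unsigned long row_with_buffer(unsigned char *buf, size_t bytes,
                                     unsigned long row)
{
    memset(buf, (int)(row & 0xFF), bytes);
    return buf[bytes - 1];
}

/* Runs both variants over `rows` simulated rows and stores the CPU time
 * of each in seconds. Returns 1 if both produced the same checksum. */
int compare_allocation_cost(unsigned long rows, size_t bytes,
                            double *malloc_secs, double *buffer_secs)
{
    unsigned long sum_malloc = 0, sum_buffer = 0, row;
    unsigned char *buf;
    clock_t start;

    if (bytes == 0)
        return 0;

    start = clock();
    for (row = 0; row < rows; row++)
        sum_malloc += row_with_malloc(bytes, row);
    *malloc_secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    buf = malloc(bytes);
    if (!buf)
        return 0;
    start = clock();
    for (row = 0; row < rows; row++)
        sum_buffer += row_with_buffer(buf, bytes, row);
    *buffer_secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    free(buf);

    return sum_malloc == sum_buffer;
}
```

Calling compare_allocation_cost(10000000UL, 64, &m, &b) and printing m and b gives the two timings for ten million 64-byte rows; the difference between them is the part that would then have to be weighed against the cost of invoking the routine per row.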
Allocating memory with malloc the way I did for result does only allocate the size of a char pointer. Having written many more px routines since this one, I found that it's better to size the buffer explicitly from the input string rather than from the pointer (I had memory overflow issues). So if anyone encounters problems with my routine, just change the size allocated for result explicitly.
The free() after the return never gets executed, but being a C programmer, I always have free() statements in my main function. So I would say it's habitual.
The px engine frees the memory by itself and hence you will never encounter any memory leaks.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Thanks guys for helpful answers!
So the DS engine will free the returned char array; well, that makes life a little easier then. Is there any way I can get DS to output the generated C code for a given transformer? I'm generally a curious fellow when it comes to things like this and would love to see how the transformer is actually implemented.
Thanks again,
Barry
Find out the job number.
SELECT NAME, JOBNO FROM DS_JOBS WHERE NAME = '<<Job Name>>';
Look in subdirectory RT_SCnnn (where nnn is the job number) in your project directory on the server for the generated code, generated osh, and scripts to run them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
If you check back at the original thread, I have posted my bug-fixed version of pxEreplace.
Phil Hibbs | Capgemini
Technical Consultant