Issue with routine.
Posted: Wed Aug 05, 2009 11:32 am
by krishna81
Hi, I tried the above routine and I have an issue. I am able to process a small volume of records (e.g. 1000), but when I ran it with 15 million records it hangs without any warnings. Is there a buffer issue in this C program?
Posted: Wed Aug 05, 2009 11:35 am
by krishna81
Here is the program I tried; it does not work for a large number of records. I would appreciate any suggestions.
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
    char *result = (char *)malloc (sizeof(char *));
    int newlen = strlen(rep);
    int oldlen = strlen(subStr);
    int i, x, count = 0;
    //If beginning is less than or equal to 1 then default it to 1
    if (beg <= 1)
    {beg = 1;}
    //replace all instances if value of num less than or equal to 0
    if (num <= 0)
    {num = strlen(str);}
    //Get the character position in i for substring instance to start from
    for (i = 0; str[i] != '\0' ; i++)
    {
        if (strstr(&str[i], subStr) == &str[i])
        {
            count++;
            i += oldlen - 1;
            if (count == beg)
            {break;}
        }
    }
    //Get everything before position i before replacement begins
    x = 0;
    while (i != x)
    { result[x++] = *str++; }
    //Start replacement
    while (*str) //for the complete input string
    {
        if (num != 0 ) // until no more occurrences need to be changed
        {
            if (strstr(str, subStr) == str )
            {
                strcpy(&result[x], rep);
                x += newlen;
                str += oldlen;
                num--;
            }
            else // if no match is found
            {
                result[x++] = *str++;
            }
        }
        else
        {
            result[x++] = *str++;
        }
    }
    result[x] = '\0'; //Terminate the string
    return result; //Return the replaced string
}
Posted: Wed Aug 05, 2009 2:01 pm
by chulett
Suggest you create a new Topic in the PX forum for this.
Posted: Thu Aug 06, 2009 1:44 am
by Sainath.Srinivasan
At first glance this looks like a memory allocation problem caused by the malloc call.
Try sizing the buffer as result[1000], or as your maximum string length.
Alternatively, you can use another variable at the end to hold the return value and free the result.
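To make the sizing suggestion concrete: rather than guessing a fixed maximum, one option is to compute the worst case from the three string lengths. This is only a sketch. The names `worst_case_len` and `alloc_result` are hypothetical and do not appear in the routine, and an empty search string is assumed to match nothing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: upper bound on the result length (excluding the
 * terminating '\0') when every occurrence of a substring of length oldlen
 * in a string of length len is replaced by a string of length newlen. */
size_t worst_case_len(size_t len, size_t oldlen, size_t newlen)
{
    /* Assumption: an empty search string matches nothing. A replacement
     * that is not longer than the match can never grow the string. */
    if (oldlen == 0 || newlen <= oldlen)
        return len;
    /* At most len / oldlen non-overlapping matches, each one growing
     * the string by (newlen - oldlen) characters. */
    return len + (len / oldlen) * (newlen - oldlen);
}

/* Hypothetical helper: allocate a buffer big enough for any result,
 * including the terminator. The caller must free() it. */
char *alloc_result(const char *str, const char *subStr, const char *rep)
{
    size_t need = worst_case_len(strlen(str), strlen(subStr), strlen(rep));
    return (char *)malloc(need + 1);
}
```

The bound is usually larger than needed, but it is cheap to compute and removes the need to resize the buffer while copying.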
Posted: Thu Oct 21, 2010 10:34 am
by DSguru2B
Krishna was kind enough to fix the memory allocation issue with large numbers of records by incorporating Sainath's suggestion. Click here for the updated routine.
Ever since I wrote this and moved on to other clients, I haven't yet had a chance to work at an enterprise shop, so I could not update the routine based on feedback. I am glad others are taking the initiative and making it better.
Posted: Wed May 04, 2011 10:28 am
by Rob4732
Noticed if your replacement string is nothing(""), job ends with a SIGSEGV fatal. You can probably use Trim to remove a string instead though.
thx
Posted: Thu May 05, 2011 9:11 am
by DSguru2B
Rob4732 wrote:Noticed if your replacement string is nothing(""), job ends with a SIGSEGV fatal. You can probably use Trim to remove a string instead though.
thx
That's true. You would use pxEreplace() for any string replacements. If you want to get rid of a string, replace it with a space and then apply the Convert() function to get rid of the spaces.
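For reference, removing a substring does not have to involve an empty replacement string at all. The sketch below is plain C, not a DataStage function, and `remove_all` is a made-up name. It deletes matches in place, which is safe because the result can never be longer than the input.

```c
#include <string.h>

/* Hypothetical helper: delete every occurrence of sub from s in place.
 * Returns the number of occurrences removed. */
int remove_all(char *s, const char *sub)
{
    size_t sublen;
    char *src, *dst;
    int removed = 0;

    if (!s || !sub) return 0;            /* tolerate null pointers */
    sublen = strlen(sub);
    if (sublen == 0) return 0;           /* nothing to remove */

    src = dst = s;
    while (*src) {
        if (strncmp(src, sub, sublen) == 0) {
            src += sublen;               /* skip over the match */
            removed++;
        } else {
            *dst++ = *src++;             /* keep this character */
        }
    }
    *dst = '\0';                         /* terminate the shortened string */
    return removed;
}
```

It needs a writable buffer, so call it on a char array rather than on a string literal.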
Posted: Mon Nov 21, 2011 10:36 am
by PhilHibbs
DSguru2B wrote:Ok, i finally got the chance to complete it.
Pardon me if my C/C++ coding skills are rusty, but I think there is a serious issue with this.
Code: Select all
char *result = (char *)malloc (sizeof(char *));
CMIIW, but that allocates a buffer the size of a pointer (4 bytes on a 32-bit build, 8 on a 64-bit one), so if the result string plus its terminator is longer than that, this will overrun the allocated buffer and corrupt the heap.
To be honest, I can't see a way around this. Even if you allocate enough memory in the routine, you are returning a pointer to a buffer that will never be deallocated, and thus creating a memory leak. If you declare a static buffer, then it is not parallel-safe as every instance of the routine will have the same buffer. Does DataStage provide a hook for allocating memory that it will deallocate correctly afterwards? *Edit* Unless DataStage will always call free() on any char* that is returned in this way?
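Outside DataStage, a common C answer to the ownership question is to let the caller own the buffer, so the function never allocates anything. Whether the parallel routine interface can be used this way is not established anywhere in this thread, so treat the following as a general illustration only. `replace_into` and its parameters are hypothetical, and it replaces every occurrence rather than honouring num and beg.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: replace every occurrence of sub in src with rep,
 * writing into a buffer owned by the caller. Returns the length of the
 * result, or -1 if dst (of size dstsize) is too small or an argument is
 * a null pointer. */
long replace_into(char *dst, size_t dstsize,
                  const char *src, const char *sub, const char *rep)
{
    size_t sublen, replen, x = 0;

    if (!dst || dstsize == 0 || !src || !sub || !rep) return -1;
    sublen = strlen(sub);
    replen = strlen(rep);

    while (*src) {
        if (sublen > 0 && strncmp(src, sub, sublen) == 0) {
            if (x + replen >= dstsize) return -1;  /* no room for rep + '\0' */
            memcpy(dst + x, rep, replen);
            x += replen;
            src += sublen;
        } else {
            if (x + 1 >= dstsize) return -1;       /* no room for char + '\0' */
            dst[x++] = *src++;
        }
    }
    dst[x] = '\0';
    return (long)x;
}
```

Because nothing is allocated, nothing can leak, and each caller using its own buffer avoids the shared-static-buffer problem.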
Posted: Mon Nov 21, 2011 10:38 am
by PhilHibbs
Additionally, I cannot get this test case to work:
Code: Select all
pxEreplace( "TEST AA>BB", "AA", "BB", 0, 1 )
The output of that is "TEST AA>BB", rather than "TEST BB>BB". I can't get any multi-character replacement to work.
This works fine:
Code: Select all
pxEreplace( "TEST P>Q PP>QQ", "P", "Q", 0, 1 )
...and returns this:
*UPDATE* Fixed it! The error is here:
Code: Select all
if (strstr(&str[i], subStr) == &str[i])
{
    count++;
    i += oldlen - 1;
    if (count == beg)
    {break;}
}
I changed it to this:
Code: Select all
if (strstr(&str[i], subStr) == &str[i])
{
    count++;
    if (count == beg)
    {break;}
    i += oldlen - 1;
}
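To spell out why the order matters: with the skip applied before the break, i ends up on the last character of the match (6 instead of 5 for the "TEST AA>BB" case), so the copy loop swallows the first "A" and the replacement loop never sees "AA". Single-character searches are unaffected because oldlen - 1 is 0. Here is a standalone sketch of the corrected search, using the hypothetical name `nth_occurrence`:

```c
#include <string.h>

/* Hypothetical helper: return the index at which the beg-th
 * non-overlapping occurrence of subStr starts in str, or strlen(str)
 * if there are fewer than beg occurrences. */
size_t nth_occurrence(const char *str, const char *subStr, int beg)
{
    size_t oldlen = strlen(subStr);
    size_t i;
    int count = 0;

    if (oldlen == 0) return 0;        /* assumption: empty search matches at 0 */
    if (beg < 1) beg = 1;             /* same default as the routine */

    for (i = 0; str[i] != '\0'; i++) {
        if (strstr(&str[i], subStr) == &str[i]) {
            count++;
            if (count == beg)
                break;                /* i still points at the match start */
            i += oldlen - 1;          /* skip the rest of an earlier match */
        }
    }
    return i;
}
```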
Also, for performance reasons I replaced every call to strstr with strncmp:
Code: Select all
if (strncmp(&str[i], subStr, oldlen) == 0)
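A small side-by-side of the two tests, as a sketch with hypothetical helper names. Both answer the same question, "does sub occur exactly at position i?", but strstr may scan the whole remainder of the string before the result is compared with &str[i], while strncmp looks at no more than strlen(sub) characters.

```c
#include <string.h>

/* Hypothetical helper: position test via strstr. On a miss at i, strstr
 * keeps searching the rest of the string, and that work is thrown away. */
int match_at_strstr(const char *str, const char *sub, size_t i)
{
    return strstr(&str[i], sub) == &str[i];
}

/* Hypothetical helper: position test via strncmp, which compares at
 * most strlen(sub) characters starting at &str[i]. */
int match_at_strncmp(const char *str, const char *sub, size_t i)
{
    return strncmp(&str[i], sub, strlen(sub)) == 0;
}
```

Both helpers assume i is no greater than strlen(str). In the routine itself oldlen is computed once up front, so the strlen call shown here is not repeated per character.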
Posted: Mon Nov 21, 2011 11:32 am
by PhilHibbs
In fact, here is my complete version including malloc fix:
(code removed, see later post for source that fixes another issue)
Posted: Wed Nov 23, 2011 3:59 am
by PhilHibbs
This routine is now causing my job to abort with this:
APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV
Any ideas? I have disabled the main body of the routine in order to rule out a programming error:
Code: Select all
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
    int buflen = strlen(str)+1;
    char *result = (char *)malloc( buflen );
    int newlen = strlen(rep);
    int oldlen = strlen(subStr);
    int i, x, count = 0;
    if (result==0) {return 0;}
    strcpy(result,str);
    return result;
...
*Update* It works fine in a test job that has a row generator that feeds 1000000 rows into a Transformer, that does 4 different replaces including a nested call that replaces two strings:
Code: Select all
pxEreplace( pxEreplace( "AA>BB", ">", "=", 0, 1 ), "AA", "BB", 0, 0 )
Posted: Wed Nov 23, 2011 4:42 am
by PhilHibbs
PhilHibbs wrote:This routine is now causing my job to abort with this:
APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV
I think I have found it - it's if the str parameter is an empty string. Not sure why this is, as the function should just sail through and do nothing...
I have worked around it by adding this to the start:
Code: Select all
if (buflen==1) {result[0]='\0'; return result;}
Posted: Fri Nov 25, 2011 12:32 pm
by PhilHibbs
DataStage passes a null pointer rather than an empty string. This should fix the problems that that causes:
Code: Select all
/******************************************************************************
* pxEreplace - DataStage parallel routine
*
* Published on DSXchange.com by user DSguru2B
* http://www.dsxchange.com/viewtopic.php?t=106358
*
* Bugs (malloc, realloc, count) fixed by Philip Hibbs, Capgemini
*
* INSTRUCTIONS
*
* 1. Copy the source file pxEreplace.cpp into a directory on the server
* 2. Run the following command:
*
* g++ -O -fPIC -Wno-deprecated -c pxEreplace.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
* cp pxEreplace.o `cat /.dshome`/../PXEngine/lib/pxEreplace.o
*
* 4. Create the Server Routine with the following properties:
*
* Routine Name : pxEreplace
* External subroutine name : pxEreplace
* Type : External function
* Object type : Object
* Return type : char*
* Library path : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxEreplace.o
* Arguments:
* str I char*
* subStr I char*
* rep I char*
* num I int
* beg I int
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
    char empty[1]="";
    if (!str) {str = empty;}
    if (!subStr) {subStr = empty;}
    if (!rep) {rep = empty;}
    int buflen = strlen(str)+1;
    char *result = (char *)malloc( buflen );
    if (!result) {return 0;}
    if (buflen==1) {result[0]='\0'; return result;}
    int oldlen = strlen(subStr);
    int newlen = strlen(rep);
    int i, x, count = 0;
    if (oldlen==0)
    { // special case - insert rep once at the start of the string and return
        if (newlen>0)
        {
            buflen = buflen + newlen;
            result = (char *)realloc( result, buflen );
        }
        strcpy(result, rep);
        strcpy(result+newlen, str);
        return result;
    }
    //If beginning is less than or equal to 1 then default it to 1
    if (beg <= 1)
    {beg = 1;}
    //replace all instances if value of num less than or equal to 0
    if (num <= 0)
    {num = buflen;}
    //Get the character position in i for substring instance to start from
    for (i = 0; str[i] != '\0' ; i++)
    {
        if (strncmp(&str[i], subStr, oldlen) == 0)
        {
            count++;
            if (count == beg) { break; }
            i += oldlen - 1;
        }
    }
    //Get everything before position i before replacement begins
    x = 0;
    while (i != x)
    { result[x++] = *str++; }
    //Start replacement
    while (*str) //for the complete input string
    {
        if (num != 0 ) // until no more occurrences need to be changed
        {
            if (strncmp(str, subStr, oldlen) == 0)
            {
                if (newlen > oldlen)
                {
                    buflen = buflen + (newlen - oldlen);
                    result = (char *)realloc( result, buflen );
                }
                strcpy(&result[x], rep);
                x += newlen;
                str += oldlen;
                num--;
            }
            else // if no match is found
            {
                result[x++] = *str++;
            }
        }
        else
        {
            result[x++] = *str++;
        }
    }
    result[x] = '\0'; //Terminate the string
    return result; //Return the replaced string
}
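An alternative to growing the buffer with realloc inside the loop (the routine above does not check realloc's return value) is to count the matches first and allocate once. The following is a sketch of that idea only: `replace_all_two_pass` is a hypothetical name, it ignores the num and beg arguments, and it has not been tried as a DataStage routine.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass variant: replace every occurrence of sub in str
 * with rep. Pass 1 counts matches, pass 2 fills a buffer of exact size.
 * Null arguments are treated as empty strings. Returns a malloc'd string
 * that the caller must free(), or NULL if the allocation fails. */
char *replace_all_two_pass(const char *str, const char *sub, const char *rep)
{
    size_t sublen, replen, len, matches = 0, x = 0;
    const char *p;
    char *result;

    if (!str) str = "";
    if (!sub) sub = "";
    if (!rep) rep = "";
    sublen = strlen(sub);
    replen = strlen(rep);
    len = strlen(str);

    /* Pass 1: count non-overlapping matches (none for an empty sub). */
    if (sublen > 0) {
        for (p = str; *p; ) {
            if (strncmp(p, sub, sublen) == 0) { matches++; p += sublen; }
            else { p++; }
        }
    }

    /* Each match removes sublen characters and adds replen characters. */
    result = (char *)malloc(len - matches * sublen + matches * replen + 1);
    if (!result) return NULL;

    /* Pass 2: copy the input, substituting rep at each match. */
    for (p = str; *p; ) {
        if (sublen > 0 && strncmp(p, sub, sublen) == 0) {
            memcpy(result + x, rep, replen);
            x += replen;
            p += sublen;
        } else {
            result[x++] = *p++;
        }
    }
    result[x] = '\0';
    return result;
}
```

The input is scanned twice, but there is a single allocation and a single failure path to check.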
Posted: Wed Sep 28, 2016 3:21 am
by RiyaNY
Where should I put this code?
I want to use this function to replace a string in a Transformer, but I have no clue how to go about using this code.
Posted: Wed Sep 28, 2016 6:54 am
by chulett
Don't take this the wrong way but if you're not skilled in all of the ways of C++ then this isn't a path for you. IMHO, you'd be better served by starting a new post and letting us know what kind of a 'string problem' you are having. Then we can suggest alternatives.