px version of Ereplace()
Issue with routine.
Hi, I tried the above routine and I have an issue. I am able to process a small volume of records (e.g. 1,000), but when I ran it with 15 million records it hangs with no warnings. Is there a buffer issue in this C program?
Datastage User
Here is the program I tried; it does not work for large numbers of records. I would appreciate any suggestions.
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
char *result = (char *)malloc (sizeof(char *));
int newlen = strlen(rep);
int oldlen = strlen(subStr);
int i, x, count = 0;
//If begining is less than or equal to 1 then default it to 1
if (beg <= 1)
{beg = 1;}
//replace all instances if value of num less than or equal to 0
if (num <= 0)
{num = strlen(str);}
//Get the character position in i for substring instance to start from
for (i = 0; str[i] != '\0' ; i++)
{
if (strstr(&str[i], subStr) == &str[i])
{
count++;
i += oldlen - 1;
if (count == beg)
{break;}
}
}
//Get everything before position i before replacement begins
x = 0;
while (i != x)
{ result[x++] = *str++; }
//Start replacement
while (*str) //for the complete input string
{
if (num != 0 ) // untill no more occurances need to be changed
{
if (strstr(str, subStr) == str )
{
strcpy(&result[x], rep);
x += newlen;
str += oldlen;
num--;
}
else // if no match is found
{
result[x++] = *str++;
}
}
else
{
result[x++] = *str++;
}
}
result[x] = '\0'; //Terminate the string
return result; //Return the replaced string
}
Datastage User
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Krishna was kind enough to fix the memory allocation issue with large number of records by incorporating Sainath's suggestion. Click here for the updated routine.
Ever since I wrote this and moved on to other clients, I haven't gotten a chance to work at an enterprise shop yet, so I could not update the routine based on feedback. I am glad others are taking the initiative and making it better.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Rob4732 wrote: Noticed that if your replacement string is nothing (""), the job ends with a fatal SIGSEGV. You can probably use Trim to remove a string instead, though.
That's true. You would use pxEreplace() for any string replacements. If you want to get rid of a string, replace it with a space and then apply the Convert() function to get rid of the spaces.
thx
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
- Contact:
DSguru2B wrote: Ok, i finally got the chance to complete it.
Pardon me if my C/C++ coding skills are rusty, but I think there is a serious issue with this.
Code: Select all
char *result = (char *)malloc (sizeof(char *));
To be honest, I can't see a way around this. Even if you allocate enough memory in the routine, you are returning a pointer to a buffer that will never be deallocated, and thus creating a memory leak. If you declare a static buffer, then it is not parallel-safe as every instance of the routine will have the same buffer. Does DataStage provide a hook for allocating memory that it will deallocate correctly afterwards? *Edit* Unless DataStage will always call free() on any char* that is returned in this way?
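To make the problem with the quoted line concrete: `sizeof(char *)` is the size of a pointer (typically 4 or 8 bytes), not the length of any string, so the routine writes past the end of the buffer as soon as the result is longer than that. Below is a minimal sketch of sizing the buffer from the inputs instead. The helper names `worst_case_len` and `alloc_result` are hypothetical and not part of the posted routine, and the treatment of an empty search string (insert `rep` at most once) is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the posted routine): upper bound, in
   bytes, on the size of the result, including the terminating '\0'. */
static size_t worst_case_len(const char *str, const char *subStr, const char *rep)
{
    size_t len = strlen(str);
    size_t oldlen = strlen(subStr);
    size_t newlen = strlen(rep);

    if (oldlen == 0)        /* assumption: rep is inserted at most once */
        return len + newlen + 1;
    if (newlen <= oldlen)   /* replacements never grow the string */
        return len + 1;
    /* At most len / oldlen non-overlapping matches, each one adding
       (newlen - oldlen) bytes to the result. */
    return len + (len / oldlen) * (newlen - oldlen) + 1;
}

/* Allocate the result buffer from the inputs rather than sizeof(char *).
   The caller still has to decide who frees it. */
static char *alloc_result(const char *str, const char *subStr, const char *rep)
{
    return (char *)malloc(worst_case_len(str, subStr, rep));
}
```

This only addresses the size of the buffer. It does not answer the question of who frees the returned pointer.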
Last edited by PhilHibbs on Mon Nov 21, 2011 11:08 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
Additionally, I cannot get this test case to work:
Code: Select all
pxEreplace( "TEST AA>BB", "AA", "BB", 0, 1 )
The output of that is "TEST AA>BB", rather than "TEST BB>BB". I can't get any multi-character replacement to work.
This works fine:
Code: Select all
pxEreplace( "TEST P>Q PP>QQ", "P", "Q", 0, 1 )
...and returns this:
Code: Select all
TEST Q>Q QQ>QQ
*UPDATE* Fixed it! The error is here:
Code: Select all
if (strstr(&str[i], subStr) == &str[i])
{
    count++;
    i += oldlen - 1;
    if (count == beg)
    {break;}
}
I changed it to this:
Code: Select all
if (strstr(&str[i], subStr) == &str[i])
{
    count++;
    if (count == beg)
    {break;}
    i += oldlen - 1;
}
Also, for performance reasons I replaced all references to strstr with strncmp instead:
Code: Select all
if (strncmp(&str[i], subStr, oldlen) == 0)
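For anyone wondering why the order of those two lines matters: with the original order, `i` is advanced by `oldlen - 1` before the loop breaks, so the copy of "everything before the match" also copies the first `oldlen - 1` characters of the match itself, and the replacement loop then starts in the middle of it. For a single-character search string `oldlen - 1` is 0, which is why that case worked. Here is a minimal sketch of just the start-offset search. `find_start` and its `buggy` flag are hypothetical names used only for illustration, and returning 0 for an empty search string is an assumption.

```c
#include <string.h>

/* Hypothetical helper: offset in str of the beg-th occurrence of subStr,
   or strlen(str) if there are fewer than beg occurrences.
   buggy != 0 reproduces the original statement order. */
static int find_start(const char *str, const char *subStr, int beg, int buggy)
{
    int oldlen = (int)strlen(subStr);
    int i, count = 0;

    if (oldlen == 0)
        return 0;                 /* assumption: empty search string starts at 0 */
    if (beg <= 1)
        beg = 1;

    for (i = 0; str[i] != '\0'; i++) {
        /* strncmp looks at no more than oldlen characters at this position,
           whereas strstr scans the rest of the string on a mismatch. */
        if (strncmp(&str[i], subStr, (size_t)oldlen) == 0) {
            count++;
            if (buggy)
                i += oldlen - 1;  /* original order: skip before testing count */
            if (count == beg)
                break;
            if (!buggy)
                i += oldlen - 1;  /* fixed order: skip only when continuing */
        }
    }
    return i;
}
```

With "TEST AA>BB" and "AA", the original order returns offset 6 (one character into the match), while the fixed order returns 5.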
Last edited by PhilHibbs on Mon Nov 21, 2011 11:08 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
In fact, here is my complete version including malloc fix:
(code removed, see later post for source that fixes another issue)
Last edited by PhilHibbs on Wed Nov 23, 2011 5:57 am, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
This routine is now causing my job to abort with this:
APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV
Any ideas? I have disabled the main body of the routine in order to rule out a programming error:
Code: Select all
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
    int buflen = strlen(str)+1;
    char *result = (char *)malloc( buflen );
    int newlen = strlen(rep);
    int oldlen = strlen(subStr);
    int i, x, count = 0;
    if (result==0) {return 0;}
    strcpy(result,str);
    return result;
...
*Update* It works fine in a test job that has a row generator that feeds 1000000 rows into a Transformer, that does 4 different replaces including a nested call that replaces two strings:
Code: Select all
pxEreplace( pxEreplace( "AA>BB", ">", "=", 0, 1 ), "AA", "BB", 0, 0 )
Phil Hibbs | Capgemini
Technical Consultant
PhilHibbs wrote: This routine is now causing my job to abort with this:
APT_CombinedOperatorController(5),1: Operator terminated abnormally: received signal SIGSEGV
APT_CombinedOperatorController(5),0: Operator terminated abnormally: received signal SIGSEGV
I think I have found it - it's if the str parameter is an empty string. Not sure why this is, as the function should just sail through and do nothing...
I have worked around by adding this to the start:
Code: Select all
if (buflen==1) {result[0]='\0'; return result;}
Last edited by PhilHibbs on Fri Nov 25, 2011 12:32 pm, edited 1 time in total.
Phil Hibbs | Capgemini
Technical Consultant
DataStage passes a null pointer rather than an empty string. This should fix the problems that that causes:
Code: Select all
/******************************************************************************
* pxEreplace - DataStage parallel routine
*
* Published on DSXchange.com by user DSguru2B
* http://www.dsxchange.com/viewtopic.php?t=106358
*
* Bugs (malloc, realloc, count) fixed by Philip Hibbs, Capgemini
*
* INSTRUCTIONS
*
* 1. Copy the source file pxEreplace.cpp into a directory on the server
* 2. Run the following command:
*
* g++ -O -fPIC -Wno-deprecated -c pxEreplace.cpp
*
* (check Administrator->Properties->Environment->Parallel->Compiler settings)
*
* 3. Copy the output into the DataStage library directory:
*
* cp pxEreplace.o `cat /.dshome`/../PXEngine/lib/pxEreplace.o
*
* 4. Create the Server Routine with the following properties:
*
* Routine Name : pxEreplace
* External subroutine name : pxEreplace
* Type : External function
* Object type : Object
* Return type : char*
* Library path : /software/opt/IBM/InformationServer/Server/PXEngine/lib/pxEreplace.o
* Arguments:
* str I char*
* subStr I char*
* rep I char*
* num I int
* beg I int
*
* Save & Close
*
* Any time that anything changes, you must recompile all jobs that use the routine.
*
******************************************************************************/
#include "string.h"
#include "stdlib.h"
char* pxEreplace(char *str, char *subStr, char *rep, int num, int beg)
{
    char empty[1]="";
    if (!str) {str = empty;}
    if (!subStr) {subStr = empty;}
    if (!rep) {rep = empty;}
    int buflen = strlen(str)+1;
    char *result = (char *)malloc( buflen );
    if (!result) {return 0;}
    if (buflen==1) {result[0]='\0'; return result;}
    int oldlen = strlen(subStr);
    int newlen = strlen(rep);
    int i, x, count = 0;
    if (oldlen==0)
    { // special case - insert rep once at the start of the string and return
        if (newlen>0)
        {
            buflen = buflen + newlen;
            result = (char *)realloc( result, buflen );
        }
        strcpy(result, rep);
        strcpy(result+newlen, str);
        return result;
    }
    //If beginning is less than or equal to 1 then default it to 1
    if (beg <= 1)
    {beg = 1;}
    //replace all instances if value of num less than or equal to 0
    if (num <= 0)
    {num = buflen;}
    //Get the character position in i for substring instance to start from
    for (i = 0; str[i] != '\0' ; i++)
    {
        if (strncmp(&str[i], subStr, oldlen) == 0)
        {
            count++;
            if (count == beg) { break; }
            i += oldlen - 1;
        }
    }
    //Get everything before position i before replacement begins
    x = 0;
    while (i != x)
    { result[x++] = *str++; }
    //Start replacement
    while (*str) //for the complete input string
    {
        if (num != 0 ) // until no more occurrences need to be changed
        {
            if (strncmp(str, subStr, oldlen) == 0)
            {
                if (newlen > oldlen)
                {
                    buflen = buflen + (newlen - oldlen);
                    result = (char *)realloc( result, buflen );
                }
                strcpy(&result[x], rep);
                x += newlen;
                str += oldlen;
                num--;
            }
            else // if no match is found
            {
                result[x++] = *str++;
            }
        }
        else
        {
            result[x++] = *str++;
        }
    }
    result[x] = '\0'; //Terminate the string
    return result; //Return the replaced string
}
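One thing the listing above does not check is the return value of realloc. If realloc fails it returns a null pointer and leaves the old block allocated, so assigning the result straight back to `result` both loses the old buffer and makes the following strcpy write through a null pointer. A minimal sketch of a safer pattern is below. `grow_buffer` is a hypothetical helper name, not something in the posted routine, and how the routine should report the failure to DataStage is left open.

```c
#include <stdlib.h>

/* Hypothetical helper: make sure *buf holds at least `needed` bytes.
   Returns 1 on success. On failure returns 0 and leaves *buf and *cap
   untouched, so the caller can still free the old buffer. */
static int grow_buffer(char **buf, size_t *cap, size_t needed)
{
    char *tmp;

    if (needed <= *cap)
        return 1;                 /* already large enough */

    tmp = (char *)realloc(*buf, needed);
    if (tmp == NULL)
        return 0;                 /* old block is still valid */

    *buf = tmp;
    *cap = needed;
    return 1;
}
```

In the routine, each `result = (char *)realloc( result, buflen );` could then become a call such as `grow_buffer(&result, &cap, newsize)`, with the failure case freeing `result` and returning 0 in the same way the initial malloc check does.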
Phil Hibbs | Capgemini
Technical Consultant
Don't take this the wrong way but if you're not skilled in all of the ways of C++ then this isn't a path for you. IMHO, you'd be better served by starting a new post and letting us know what kind of a 'string problem' you are having. Then we can suggest alternatives.
-craig
"You can never have too many knives" -- Logan Nine Fingers