c routine for substrings

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

Hi,

As input I have strings like:
"Modem"=X:"49255080","Pkg"=l:0,"Svc"=l:4,"Brch"=L:0,"Reason"=l:0,"CfgDuration"=L:1800,"Duration"=L:1800,"EndTime"=T:1132056185,"UpV"=L:8,"DnV"=L:8,"Ses"=l:7,"Sec"=l:32,"StartTime"=T:1132056185;

They are structured like:
key=type:value,key=type:value,... ;
Not all keys are always present, and their order can differ. I need to split this up into separate fields. I first tried column import, but because the order can change and keys can be missing, I don't see how I can get this to work.

So, next option: I started writing a parallel routine, but I'm not very good at C.

I have an Oracle function that does this (but using it would mean first loading to a table and then doing an extra step to another table, which I would like to avoid).
The Oracle function is:

Code:

CREATE OR REPLACE FUNCTION GET_AVL (p_avl VARCHAR2, p_param VARCHAR2) RETURN VARCHAR2 IS
   ret   VARCHAR2 (4000);
BEGIN
   IF INSTR (UPPER (p_avl), '"' || UPPER (p_param) || '"') = 0 THEN
      RETURN '';
   ELSE
      ret := SUBSTR (p_avl,INSTR (UPPER (p_avl), '"' || UPPER (p_param) || '"') + LENGTH (p_param) + 5);
      IF INSTR (ret, ',') != 0 THEN
         ret := SUBSTR (ret, 1, INSTR (ret, ',')-1);
      END IF;
      RETURN RTRIM (LTRIM (ret, '"'), '";');
   END IF;
END;
(Output for the above string would be: GET_AVL(AVL,'MODEM')=49255080)

The main problem I have is finding a C version of INSTR without writing a recursive function.

So actually two questions:
-Can anyone help with a C function that finds the position of string B in string A?
OR
-Do you see any other (high-performance) alternative for this function?

(I know this is a DataStage forum, not a C forum, but I think the question fits here.)
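
For the first question, a minimal sketch of an INSTR-like lookup using the standard strstr function. The name find_pos is hypothetical, and the comparison is case-sensitive, unlike the UPPER(...) comparison in the Oracle function above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1-based position of needle in haystack,
   or 0 when needle does not occur (the same "not found" result
   the Oracle function above tests for). */
size_t find_pos(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);   /* NULL when absent */
    return hit ? (size_t)(hit - haystack) + 1 : 0;
}
```

For example, find_pos("abcdef", "cd") gives 3 and find_pos("abcdef", "xy") gives 0. No recursion is needed, because strstr already does the scanning.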
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I think that you really don't need to go to the effort of writing something in C; the DataStage equivalent of the Oracle INSTR function is called INDEX 8)
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

I'm not really convinced about your solution.

You are correct that I can write this in DataStage, but it is a complex calculation that would be used 25 times in one transformer. I want this in a routine; otherwise it won't be readable.

My transformer has to look simple, so the same function would be called in every field, each time with another key
(so like getvalue(instring,'MODEM') for the modem field, getvalue(instring,'STARTTIME') for the starttime field).

I'm not sure how to do this without a C function (and without going to a BASIC transformer, where I know the syntax).

Meanwhile I've progressed on my function. I now have a C function that picks out the one key-value pair for this key. Now I still need a substring function to cut the value out of this.

Code:

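#include <string.h>

/* Returns the key-value pair that starts at searchstring, or "" if absent. */
char* GetKeyValue(char* fullstring,char* searchstring)
{
/* static, so the buffer is still valid after return (not re-entrant) */
static char found[100];
char* tmp;
char* keyvaldelimiter;
size_t len;

keyvaldelimiter=",";
found[0]='\0';
/*look for the place in full where search resides */
tmp=strstr(fullstring,searchstring);
/*key not present: return the empty string */
if (tmp == NULL)
   return found;
/*find the first ',' after the searchstring */
len=strcspn(tmp,keyvaldelimiter);
/*never copy more than the buffer can hold */
if (len >= sizeof(found))
   len=sizeof(found)-1;
/*copy the key-value pair to found */
strncpy(found,tmp,len);

found[len]='\0';

return found;
}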
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Jasper,

Once you have solved your problem, compare its performance with the simple INDEX() function; perhaps you will be convinced that that solution is more efficient. Please remember to post your results here; I for one am very curious.
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

ArndW,
Am I overlooking anything in the documentation of the INDEX function?

Using INDEX, I think it would look like this:
index(Fullstring,searchstring,1) would point me to the beginning of the key.
Then I need to find the end of the key, so + len(key). From the end of the key, the value starts 5 places further: +5.

Then I know where the value starts, but how long is the value? So find the first " sign and take everything before that.

OK, this can be done, but if this calculation is there 25 times in one transformer, I don't want to be the one changing it when the source file changes.
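
The steps described above (find the key, skip past it and the type part, then cut at the end of the value) could be sketched in C roughly as follows. This is only an illustration under assumptions: the name get_value, the caller-supplied output buffer and the 64-byte scratch buffer are all hypothetical, the match is case-sensitive (the Oracle function is not), and a value is assumed never to contain ',' or ';'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the value for `key` into `out` (capacity outsz).
   Returns 1 when the key is found, 0 otherwise (out is then empty).
   Assumes the layout "key"=type:value with a one-character type. */
int get_value(const char *full, const char *key, char *out, size_t outsz)
{
    char quoted[64];
    const char *p;
    size_t n;
    int r;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    /* Step 1: locate "key" including both quotes. */
    r = snprintf(quoted, sizeof quoted, "\"%s\"", key);
    if (r < 0 || (size_t)r >= sizeof quoted)
        return 0;                      /* key too long for the scratch buffer */
    p = strstr(full, quoted);
    if (p == NULL)
        return 0;

    /* Step 2: skip the quoted key, then the three characters "=T:". */
    p += strlen(quoted);
    if (p[0] != '=' || p[1] == '\0' || p[2] != ':')
        return 0;                      /* not the expected =type: layout */
    p += 3;

    /* Step 3: the value runs up to the next ',' or ';' (or end of string). */
    n = strcspn(p, ",;");

    /* Step 4: drop surrounding double quotes, if any. */
    if (n >= 2 && p[0] == '"' && p[n - 1] == '"') {
        p += 1;
        n -= 2;
    }
    if (n >= outsz)
        n = outsz - 1;                 /* truncate rather than overflow */
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

Searching for the key together with its quotes means a lookup for Duration does not stop at "CfgDuration", which comes earlier in the sample string. With this in a routine, each transformer field would hold a single call such as get_value(instring, "Modem", buf, sizeof buf).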
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Jasper,

The function INDEX is exactly identical to INSTR: the first parameter is the string to check, the second is the substring to search for. The optional 3rd parameter is the occurrence number. It returns the position in the string where the substring is found. It functions the same way in Server and in PX.

You asked for a way to do INSTR() and I answered.

I would solve your problem a bit differently. I would start by declaring the source as having 25 columns separated by ":"; then for each column I would use FIELD(In.Column01,'=',1) for the name and FIELD(In.Column01,'=',2) for the value.

Name = LEFT(
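
For comparison, the split-into-pairs idea (first cut the record into pairs, then cut each pair into a name and a value) might look like this in C. It is a sketch under assumptions taken from the sample string at the top of the thread: pairs are separated by ',', the record ends with ';', and the value follows the ':' after the type. The name next_pair and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the pair starting at `record` into name/value
   (quotes are kept as found) and returns a pointer to the start of the
   next pair, or NULL when no further pair could be read. */
const char *next_pair(const char *record, char *name, size_t namesz,
                      char *value, size_t valuesz)
{
    size_t pairlen = strcspn(record, ",;");          /* one whole pair */
    const char *eq = memchr(record, '=', pairlen);   /* end of the name */
    const char *colon;

    if (pairlen == 0 || eq == NULL)
        return NULL;                                 /* end of record or malformed */
    colon = memchr(eq, ':', pairlen - (size_t)(eq - record));
    if (colon == NULL)
        return NULL;

    /* %.*s copies a bounded slice; snprintf truncates to the buffer size. */
    snprintf(name, namesz, "%.*s", (int)(eq - record), record);
    snprintf(value, valuesz, "%.*s",
             (int)(pairlen - (size_t)(colon + 1 - record)), colon + 1);

    record += pairlen;
    return (*record == ',') ? record + 1 : record;   /* skip the separator */
}
```

Calling next_pair in a loop until it returns NULL visits every pair once, in whatever order the keys appear in the record.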
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

I succeeded with this C function.
I still have to check special cases, but this is the code for a first working version:

Code:

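#include <string.h>

/* Returns the value found for searchstring, or "" if the key is absent. */
char* GetKeyValue(char* fullstring,char* searchstring)
{
char found[100];
/* static, so the buffer is still valid after return (not re-entrant) */
static char returnval[100];
char* tmp;
char* keyvaldelimiter;
char* puntcomma;
size_t len;
size_t keylen;

keyvaldelimiter=",";
keylen=strlen(searchstring);
returnval[0]='\0';
/*look for the place in full where search resides */
tmp=strstr(fullstring,searchstring);
/*key not present: return the empty string */
if (tmp == NULL)
   return returnval;
/*find the first ',' after the searchstring */
len=strcspn(tmp,keyvaldelimiter);
/*never copy more than the buffer can hold */
if (len >= sizeof(found))
   len=sizeof(found)-1;
/*copy the key-value pair to found */
strncpy(found,tmp,len);

found[len]='\0';
/*pair too short to hold 3 more characters after the key */
if (len < keylen+3)
   return returnval;
/*find the value: skip the key and the next 3 characters
  (=T: when searchstring includes the closing quote) */
strcpy(returnval,found+keylen+3);
if ((puntcomma = strchr(returnval, ';')) != NULL)
   *puntcomma = '\0';

return returnval;
}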