Build Op for "LIKE" operation between data in two files
Posted: Fri Dec 14, 2012 1:08 am
Hi Guys
I have a requirement to do a "LIKE" operation between data in two files.
Since the data is large, I wrote a Build Op to do the operation. It works like a Join stage.
INPUT DATA File Format
InputCode| Input_Record
REF DATA File Format
RefCode|Ref_Record
Matching Condition
If InputCode = RefCode and RefRecord LIKE '%InputRecord%', output InputRecord.
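The matching condition can be written as one predicate. This is a minimal sketch with a hypothetical helper name, `likeMatch`. It assumes InputRecord contains no characters that SQL LIKE would treat as wildcards (`%` or `_`), because `std::string::find` matches the text literally.

```cpp
#include <string>

// Hypothetical helper (not part of DataStage): true when a reference row
// matches an input row, i.e. InputCode = RefCode and
// RefRecord LIKE '%InputRecord%'.
// Assumption: inputRecord holds no LIKE wildcards (% or _); find() treats
// every character literally.
bool likeMatch(const std::string& inputCode, const std::string& inputRecord,
               const std::string& refCode, const std::string& refRecord) {
    if (inputCode != refCode) {
        return false;  // codes must be equal before the text is examined
    }
    // '%x%' means "x occurs anywhere", which is a plain substring search.
    return refRecord.find(inputRecord) != std::string::npos;
}
```

For example, `likeMatch("A", "cat", "A", "concatenate")` is true, while the same call with RefCode `"B"` is false. An empty InputRecord matches every RefRecord with the same code, just as LIKE '%%' does.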
The input file can have at most 2 million records, while the reference file can have any number of records. So I store the input records in arrays of size 2 million and stream the reference data through the Build Op for matching.
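In the posted code, each reference row is compared with every buffered input row, so the number of code comparisons C grows with the product of the two row counts (N_in and N_ref are labels introduced here for illustration). With the maximum of 2 million input rows and 2 million reference rows:

```latex
\begin{aligned}
C &= N_{\text{in}} \times N_{\text{ref}} \\
  &= (2 \times 10^{6}) \times (2 \times 10^{6}) = 4 \times 10^{12}
\end{aligned}
```

Every further million reference rows adds another 2 × 10^12 comparisons, plus a substring search for each pair whose codes match, which is consistent with the job becoming CPU-bound as the reference count grows.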
I am attaching the code I used:
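// Definitions:
string Inpcode[2000000];     // InputCode of each buffered input row
string InpRecord[2000000];   // InputRecord of each buffered input row
string RefRecord;            // RefRecord of the current reference row
int i = 0;                   // number of input rows buffered so far
int j;

// Pre-loop: buffer every row of input 0 in memory
readRecord(0);
while (inputDone(0) != 1)
{
    doTransfersFrom(0);
    if (i >= 2000000)
        break;               // arrays are full: further input rows are not buffered
    Inpcode[i] = InRec.Inputcode;
    InpRecord[i] = InRec.InputRecord;
    i++;
    readRecord(0);
}

// Per-Record: stream input 1 (reference) and match against the buffer
readRecord(1);
while (inputDone(1) != 1)
{
    doTransfersFrom(1);
    RefRecord = (string) InRecRf.RefRecord;
    string refCode = (string) InRecRf.Inputcode;   // convert once per reference row, not once per comparison
    for (j = 0; j < i; j++)  // j < i: the original j <= i read one slot past the last stored row
    {
        // InputCode = RefCode and RefRecord LIKE '%InputRecord%'
        if (refCode == Inpcode[j] && RefRecord.find(InpRecord[j]) != string::npos)
        {
            OutRec.InpMatchRecord = InpRecord[j].c_str();
            transferAndWriteRecord(0);
        }
    }
    readRecord(1);
}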
With about 1 or 2 million reference records I get good performance, but beyond that the performance drops drastically and the job takes most of the CPU.
Can you guys please suggest a better way of doing this?
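One commonly suggested alternative is to index the buffered input rows by code, so each reference row is compared only with the input rows that share its code instead of all of them. The sketch below swaps the two fixed-size arrays for a `std::unordered_map`. The class and method names (`LikeJoinIndex`, `addInput`, `probe`) are hypothetical, and how it is wired into the Build Op's Definitions, Pre-loop and Per-Record sections is an assumption, not something tested in a job.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical replacement for the Inpcode/InpRecord arrays: input records
// are grouped by their code while they are buffered.
class LikeJoinIndex {
public:
    // Pre-loop phase: store one input row under its code.
    void addInput(const std::string& code, const std::string& record) {
        byCode_[code].push_back(record);
    }

    // Per-record phase: return every buffered input record with the same
    // code whose text occurs inside refRecord (LIKE '%record%').
    std::vector<std::string> probe(const std::string& refCode,
                                   const std::string& refRecord) const {
        std::vector<std::string> matches;
        auto it = byCode_.find(refCode);
        if (it == byCode_.end()) {
            return matches;  // no input row has this code: nothing to scan
        }
        // Only the rows that share refCode are examined.
        for (const std::string& rec : it->second) {
            if (refRecord.find(rec) != std::string::npos) {
                matches.push_back(rec);
            }
        }
        return matches;
    }

private:
    std::unordered_map<std::string, std::vector<std::string>> byCode_;
};
```

In the Build Op, the Pre-loop would call `addInput` with each port-0 row's code and record (converted to `std::string` as in the original code), and the Per-Record loop would call `probe` for each reference row and write one output record per match. This helps most when the codes are spread across many distinct values; if most input rows share one code, that code's list is still scanned in full. If the job runs on more than one node, hash-partitioning both links on the code column should keep rows with equal codes on the same node, so each copy of the operator holds only its share of the input.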