Page 1 of 3

String Parsing

Posted: Mon May 02, 2005 2:28 pm
by varshanswamy
hi,

I have an input string of the format
Field1,Field2
--------------------------
1,A=B&C=D&E=F&G=H
2,L=M&G=H

I need to write a job in datastage
to split this string which would be dynamic
and have it of the form.

Field1, Field2
---------------------
1,A=B
1,C=D
1,E=F
1,G=H
2,L=M
2,G=H


regards,
varsha

Posted: Tue May 03, 2005 1:06 am
by ArndW
Varsha,

Use a transform stage with variables:

KeyNumber FIELD(In.String,",",1)
RestOfString EREPLACE(In.String[3,9999],"&",CHAR(13):CHAR(15):KeyNumber:",")

The RestOfString variable now is formatted as you wanted, if you write to a sequential file you will have the format with <cr><lf> separation.

Posted: Tue May 03, 2005 1:19 am
by ArndW
Varsha,

I just saw that this was posted in the Parallel Extender forum, the solution I posted is for a Server job - doing the same logic in Px is somewhat more complicated, mainly because the EREPLACE function doesn't exist.

With a Px job I would use a pivot stage - but I am not on a system to check that right now.

Posted: Tue May 03, 2005 11:12 pm
by sudharsanan
Hi Arndw,

I tried the option that you gave .. i tried it in Datastage 6.0 Server Job...

Used a transform stage with variables:

KeyNumber FIELD(In.String,",",1)
RestOfString EREPLACE(In.String[3,9999],"&",CHAR(13):CHAR(15):KeyNumber:",")

But the result that i got was only 2 rows...

Field1 Field2
1 B
2 M

Not the excepted one.. can you tell me how to solve.. My job design is like this..

I/P(SequentialFile) ---> Xfm ---> O/P(Sequential File)

should i need to use a Pivot stage to convert the row to column...

Please advise me on how to approach..

Thanks
Sudharsanan

String Parsing

Posted: Wed May 04, 2005 12:04 am
by Sunshine2323
Hi Sudharshan,

I have tried Arnd's method in a sample job and it has worked for me.

Have u created the stage variables?

You need to use the value of the KeyNumber stage variable in the RestOfString stage variable

Also after you run the job open the sequential file on the server to see the Output as in my case the the Sequential file stage was not able to read the complete output data as 1 column from the input was converted into 2 columns :)

Posted: Wed May 04, 2005 12:17 am
by sudharsanan
Hi,

I used two stage variables in my job as told by Arnd...

You are correct in the Sequential File Stage i could see only two values(Don't know why).. But when i saw the actual file the result was like this:

Field1,"Field2"

1,"B
1,C=D
1,E=F
1,G=H"

2,"M
2,G=H"

There spaces in between the records... the starting values for 1 and 2 are missing... That is reason for two records only in the Squential file stage.. should we have change the formatting of the file in the output...

What was the result you got?...was any mistake with my job design.. please advice..

Thanks
Sudharsanan

String Parsing

Posted: Wed May 04, 2005 12:32 am
by Sunshine2323
Hi Sudharshan, :)

Remove only char(15) from the derivation for the Stage Variable RestofString also replace the Quote Character to 000 in the Output Format Tab and try and let us know.

You will not be able to see the Output File as the Output file is defined for 1 column and the derivation converts it 2 columns.

If you try reading the output data using a sequential file that has 2 output columns defined you will have no problems.

Do you have any other suggestions?

Posted: Wed May 04, 2005 12:42 am
by ArndW
sudharsanan wrote:Hi Arndw,

I tried the option that you gave .. i tried it in Datastage 6.0 Server Job...

Used a transform stage with variables:

KeyNumber FIELD(In.String,",",1)
RestOfString EREPLACE(In.String[3,9999],"&",CHAR(13):CHAR(15):KeyNumber:",")
...

Thanks
Sudharsanan
I just saw that I used CHAR(13):CHAR(15); that is not correct :roll: All you need is a <lf> so just use a CHAR(10). Mea culpa.

Posted: Wed May 04, 2005 12:46 am
by sudharsanan
Hi,

Thanks Sunshine for your input ....



I tried the option you told... the result is
Result Got

Field1,"Field2"

Field1,Field2

1,B
1,C=D
1,E=F
1,G=H

2,M
2,G=H




Actual Result

Field1, Field2
---------------------
1,A=B
1,C=D
1,E=F
1,G=H
2,L=M
2,G=H

The First record and Fifth records the first value is missing, what i have to change to get these values.. sorry for disturbing... please advice..


Thanks
Sudharsanan

Posted: Wed May 04, 2005 1:01 am
by ArndW
Sudhar,

please look at the logic of the program to find the error. The transform does nothing but replace the ampersand in the string with the number and line terminators. If you were to replace the CHAR(nn) with a simple "x" your line structure would not be changed and you could see what is happening in your code.
It looks to me like you are having an issue with the quoting and column separators - please remove any quoting and separators from the output and also look at the file from outside DS as well as from the "view-data".

String Parsing

Posted: Wed May 04, 2005 1:01 am
by Sunshine2323
Hi Sudharshan,
Can you paste a copy your derivation for the stage variables.
As I have achieved the desired result with the derivation specified.

Posted: Wed May 04, 2005 1:16 am
by sudharsanan
Hi All,

Thanks for the inputs...but the problem still resides in my code..
Let me put the code..

Stage Variables:

StageVar=FIELD(DSLink3.Field1,",",1)

StageVar1=EREPLACE(DSLink3.Field2[3,9999],"&",CHAR(10):StageVar:",")

In the output I am just using the StageVariables

Field1 = StageVar
Field2 = StageVar1

The Result i got is

Field1,"Field2"

1,"B
1,C=D
1,E=F
1,G=H"

2,"M
2,G=H"


I tried to look at the Data from the VIEW Data option where i am getting only
Field1 Field2
1 B
2 M

I couldn't find the Reason for that...Sorry for disturbing.. please help me..

Thanks
Sudharsanan

Posted: Wed May 04, 2005 1:19 am
by ArndW
Please replace the CHAR(10) with a "*", remove the delimiter and quotes from the output file and post your results.

String Parsing

Posted: Wed May 04, 2005 1:26 am
by Sunshine2323
Sudharshan,
Stage Variables:

StageVar=FIELD(DSLink3.Field1,",",1)

StageVar1=EREPLACE(DSLink3.Field2[3,9999],"&",CHAR(10):StageVar:",")
You have 2 fields in the input

Try reading both the fields in a single column by specifying a delimiter like '|' that is not present in your input data then cahnge the derivation as follows

StageVar=FIELD(DSLink3.Field1,",",1)

StageVar1=EREPLACE(DSLink3.Field1,"&",CHAR(10):StageVar:",")

Then let us know your result

Posted: Wed May 04, 2005 1:27 am
by sudharsanan
Hi Arndw,

Which value i should give for the delimiter and quotes ?..As i cannot leave that field blank...

Thanks
Sudharsanan