Page 1 of 1

CONVERT Function not performing as expected...

Posted: Wed Nov 26, 2008 6:56 am
by sjordery
Hi All,

I have a simple test job set up with:

SF ===> TFM ===> SF

The Sequential Files have a single column (BigCol) of VarChar(25).

The input file is:

Line One
Line Two
Line Three

And the Hex representation is:

Image

In the TransForMer I have:

Convert(CHAR(10),"",DSLink3.BigCol)

When I run the job, I was expecting that the 0A line feed characters would be replaced with blanks. Alas, they are still there in the output.

Any assistance much appreciated.

Thanks.
S

Posted: Wed Nov 26, 2008 8:43 am
by ArndW
The 0x0a characters are the line separators used by your files, so they wouldn't show up in the data.

Posted: Wed Nov 26, 2008 9:02 am
by sjordery
Thanks.

Is there an easy way of removing these?

The actual problem I have is that I am trying to parse an XML file. One of the elements contains lots of text that is formatted on the file as:

Code: Select all

<textElement>
Lots of text starts here and goes on and on.

Then a new bit starts here, causing problems.  The lines have
0A control characters for line feed.
</textElement>
When I run this through the XMLInput stage, it parses ok, but the output (sequential) file shows the rows:

Code: Select all

Row1:Lots of text starts here and goes on and on.
Row2:
Row3:Then a new bit starts here, causing problems.  The lines have
Row4:0A control characters for line feed.
What I want is:

Code: Select all

Row1: Lots of text starts here and goes on and on. Then a new bit starts here, causing problems.  The lines have 0A control characters for line feed.
So... my idea was to use convert to remove the control characters from the input XML...

Thanks again.

Edit to add:

When I run the convert on the small test file, but use CHAR(13) rather than CHAR(10), it does remove the 0D CR characters..

Posted: Wed Nov 26, 2008 9:21 am
by ArndW
Ok, that can be done. Keep your input file definitions as they are. If I recall correctly, 0x0a is "whitespace" in XML, so you could add a transform stage:

Code: Select all

SvarNewLine = IF INDEX(In.BigColumn,'>',1) > 0 THEN 1 ELSE 0
SvarOutString = IF SvarNewLine=1 THEN SvarDataString:In.BigColumn ELSE ''
SvarDataString = IF SvarNewLine=1 THEN '' ELSE SvarDataString:In.BigColumn

Constraint: SvarNewLine=1
this would only output a new data line when a ">" character is found, which usually would mean the end of an end tag. This simple logic would remove most carriage returns.

Basically you have a problem with VarChar() lines. If you were to declare a fixed line length of 100 and no terminators, the job would read 100 characters including the 0x0a and you could use your convert to remove them. The last line in the file would cause a warning, since it will probably not be 100 characters exactly. The question arises as to whether you need any 0x0a characters at all, if you declare your data string big enough, you could read the whole XML file as one string and then proceed to replace the linefeeds.

Posted: Wed Nov 26, 2008 9:42 am
by sjordery
Thanks very much for that - it's certainly given me a different direction to think in.

Much appreciated.

S

Posted: Wed Nov 26, 2008 10:10 am
by sjordery
ArndW wrote: this would only output a new data line when a ">" character is found, which usually would mean the end of an end tag...
or the end of an open tag :wink:

I'll fiddle with the logic and get it though :D

Posted: Wed Nov 26, 2008 1:44 pm
by ray.wurlod
Change the format to "no line terminators" then your Convert() function will work. Of course you will then only have one line of data.

Posted: Thu Nov 27, 2008 5:00 am
by sjordery
ray.wurlod wrote:Change the format to "no line terminators" then your Convert() function will work. Of course you will then only have one line of data.
Thanks Ray.

I have changed the Final Delimiter on the input file to 'none', but this made no difference. Is this what you meant by "no line terminators", or have I misunderstood?

Cheers,
S

Posted: Thu Nov 27, 2008 5:13 am
by ArndW
If you do as Ray suggests you will need to increase BigCol to hold more than 25 charactes.
The "Final Delimiter" is not the same as the "Line delimiter"

Posted: Thu Nov 27, 2008 7:43 am
by sjordery
ArndW wrote:If you do as Ray suggests you will need to increase BigCol to hold more than 25 charactes.
The "Final Delimiter" is not the same as the "Line delimiter"
I tried this by setting Record Delimiter to 'null' - this produced one row of output in DataStage (appears as one row, seperated with square box (representation of the control chars)), but when I view the file in a HEX editor, it still has the 0D0A characters. :(

I am confused though as to why using CHAR(13) it clears the CR, but using CHAR(10) missed the LF. Any ideas?

Thanks again.

Posted: Thu Nov 27, 2008 8:34 am
by ArndW
Just delete the "Record Delimiter" completely. You are seeing the difference between a UNIX and DOS line termination, <CR>LF> versus <LF>.

Posted: Thu Nov 27, 2008 9:09 am
by sjordery
I could only get it to work with Record Delimiter=null - the reason this failed before was that I didn't specify that setting on the output file, only on the input :oops:

Still, working as expected now (albeit with the 0D's still there), so will mark as resolved.

Thanks everyone for your help.

Regards,
S