CONVERT Function not performing as expected...

sjordery · Post by **sjordery** » Wed Nov 26, 2008 6:56 am

Hi All,

I have a simple test job set up with:

SF ===> TFM ===> SF

The Sequential Files have a single column (BigCol) of VarChar(25).

The input file is:

Line One
Line Two
Line Three

And the Hex representation is:

In the TransForMer I have:

Convert(CHAR(10),"",DSLink3.BigCol)

When I run the job, I was expecting that the 0A line feed characters would be replaced with blanks. Alas, they are still there in the output.

Any assistance much appreciated.

Thanks.
S

ArndW · Post by **ArndW** » Wed Nov 26, 2008 8:43 am

The 0x0a characters are the line separators used by your files, so they wouldn't show up in the data.

sjordery · Post by **sjordery** » Wed Nov 26, 2008 9:02 am

Thanks.

Is there an easy way of removing these?

The actual problem I have is that I am trying to parse an XML file. One of the elements contains lots of text that is formatted on the file as:

Code: Select all

<textElement>
Lots of text starts here and goes on and on.

Then a new bit starts here, causing problems.  The lines have
0A control characters for line feed.
</textElement>

When I run this through the XMLInput stage, it parses ok, but the output (sequential) file shows the rows:

Code: Select all

Row1:Lots of text starts here and goes on and on.
Row2:
Row3:Then a new bit starts here, causing problems.  The lines have
Row4:0A control characters for line feed.

What I want is:

Code: Select all

Row1: Lots of text starts here and goes on and on. Then a new bit starts here, causing problems.  The lines have 0A control characters for line feed.

So... my idea was to use convert to remove the control characters from the input XML...

Thanks again.

Edit to add:

When I run the convert on the small test file, but use CHAR(13) rather than CHAR(10), it does remove the 0D CR characters..

ArndW · Post by **ArndW** » Wed Nov 26, 2008 9:21 am

Ok, that can be done. Keep your input file definitions as they are. If I recall correctly, 0x0a is "whitespace" in XML, so you could add a transform stage:

Code: Select all

SvarNewLine = IF INDEX(In.BigColumn,'>',1) > 0 THEN 1 ELSE 0
SvarOutString = IF SvarNewLine=1 THEN SvarDataString:In.BigColumn ELSE ''
SvarDataString = IF SvarNewLine=1 THEN '' ELSE SvarDataString:In.BigColumn

Constraint: SvarNewLine=1

this would only output a new data line when a ">" character is found, which usually would mean the end of an end tag. This simple logic would remove most carriage returns.

Basically you have a problem with VarChar() lines. If you were to declare a fixed line length of 100 and no terminators, the job would read 100 characters including the 0x0a and you could use your convert to remove them. The last line in the file would cause a warning, since it will probably not be 100 characters exactly. The question arises as to whether you need any 0x0a characters at all, if you declare your data string big enough, you could read the whole XML file as one string and then proceed to replace the linefeeds.

sjordery · Post by **sjordery** » Wed Nov 26, 2008 9:42 am

Thanks very much for that - it's certainly given me a different direction to think in.

Much appreciated.

S

sjordery · Post by **sjordery** » Wed Nov 26, 2008 10:10 am

ArndW wrote: this would only output a new data line when a ">" character is found, which usually would mean the end of an end tag...

or the end of an open tag

I'll fiddle with the logic and get it though :D

ray.wurlod · Post by **ray.wurlod** » Wed Nov 26, 2008 1:44 pm

Change the format to "no line terminators" then your Convert() function will work. Of course you will then only have one line of data.

sjordery · Post by **sjordery** » Thu Nov 27, 2008 5:00 am

ray.wurlod wrote:Change the format to "no line terminators" then your Convert() function will work. Of course you will then only have one line of data.

Thanks Ray.

I have changed the Final Delimiter on the input file to 'none', but this made no difference. Is this what you meant by "no line terminators", or have I misunderstood?

Cheers,
S

ArndW · Post by **ArndW** » Thu Nov 27, 2008 5:13 am

If you do as Ray suggests you will need to increase BigCol to hold more than 25 charactes.
The "Final Delimiter" is not the same as the "Line delimiter"

sjordery · Post by **sjordery** » Thu Nov 27, 2008 7:43 am

ArndW wrote:If you do as Ray suggests you will need to increase BigCol to hold more than 25 charactes.
The "Final Delimiter" is not the same as the "Line delimiter"

I tried this by setting Record Delimiter to 'null' - this produced one row of output in DataStage (appears as one row, seperated with square box (representation of the control chars)), but when I view the file in a HEX editor, it still has the 0D0A characters.

I am confused though as to why using CHAR(13) it clears the CR, but using CHAR(10) missed the LF. Any ideas?

Thanks again.

ArndW · Post by **ArndW** » Thu Nov 27, 2008 8:34 am

Just delete the "Record Delimiter" completely. You are seeing the difference between a UNIX and DOS line termination, <CR>LF> versus <LF>.

sjordery · Post by **sjordery** » Thu Nov 27, 2008 9:09 am

I could only get it to work with Record Delimiter=null - the reason this failed before was that I didn't specify that setting on the output file, only on the input

Still, working as expected now (albeit with the 0D's still there), so will mark as resolved.

Thanks everyone for your help.

Regards,
S