CONVERT Function not performing as expected...

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

CONVERT Function not performing as expected...

Post by sjordery »

Hi All,

I have a simple test job set up with:

SF ===> TFM ===> SF

The Sequential Files have a single column (BigCol) of VarChar(25).

The input file is:

Line One
Line Two
Line Three

And the Hex representation is:

Image

In the TransForMer I have:

Convert(CHAR(10),"",DSLink3.BigCol)

When I run the job, I was expecting that the 0A line feed characters would be replaced with blanks. Alas, they are still there in the output.

Any assistance much appreciated.

Thanks.
S
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The 0x0a characters are the line separators used by your files, so they wouldn't show up in the data.
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Thanks.

Is there an easy way of removing these?

The actual problem I have is that I am trying to parse an XML file. One of the elements contains lots of text that is formatted on the file as:

Code: Select all

<textElement>
Lots of text starts here and goes on and on.

Then a new bit starts here, causing problems.  The lines have
0A control characters for line feed.
</textElement>
When I run this through the XMLInput stage, it parses ok, but the output (sequential) file shows the rows:

Code: Select all

Row1:Lots of text starts here and goes on and on.
Row2:
Row3:Then a new bit starts here, causing problems.  The lines have
Row4:0A control characters for line feed.
What I want is:

Code: Select all

Row1: Lots of text starts here and goes on and on. Then a new bit starts here, causing problems.  The lines have 0A control characters for line feed.
So... my idea was to use convert to remove the control characters from the input XML...

Thanks again.

Edit to add:

When I run the convert on the small test file, but use CHAR(13) rather than CHAR(10), it does remove the 0D CR characters..
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Ok, that can be done. Keep your input file definitions as they are. If I recall correctly, 0x0a is "whitespace" in XML, so you could add a transform stage:

Code: Select all

SvarNewLine = IF INDEX(In.BigColumn,'>',1) > 0 THEN 1 ELSE 0
SvarOutString = IF SvarNewLine=1 THEN SvarDataString:In.BigColumn ELSE ''
SvarDataString = IF SvarNewLine=1 THEN '' ELSE SvarDataString:In.BigColumn

Constraint: SvarNewLine=1
this would only output a new data line when a ">" character is found, which usually would mean the end of an end tag. This simple logic would remove most carriage returns.

Basically you have a problem with VarChar() lines. If you were to declare a fixed line length of 100 and no terminators, the job would read 100 characters including the 0x0a and you could use your convert to remove them. The last line in the file would cause a warning, since it will probably not be 100 characters exactly. The question arises as to whether you need any 0x0a characters at all, if you declare your data string big enough, you could read the whole XML file as one string and then proceed to replace the linefeeds.
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

Thanks very much for that - it's certainly given me a different direction to think in.

Much appreciated.

S
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

ArndW wrote: this would only output a new data line when a ">" character is found, which usually would mean the end of an end tag...
or the end of an open tag :wink:

I'll fiddle with the logic and get it though :D
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Change the format to "no line terminators" then your Convert() function will work. Of course you will then only have one line of data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

ray.wurlod wrote:Change the format to "no line terminators" then your Convert() function will work. Of course you will then only have one line of data.
Thanks Ray.

I have changed the Final Delimiter on the input file to 'none', but this made no difference. Is this what you meant by "no line terminators", or have I misunderstood?

Cheers,
S
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you do as Ray suggests you will need to increase BigCol to hold more than 25 charactes.
The "Final Delimiter" is not the same as the "Line delimiter"
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

ArndW wrote:If you do as Ray suggests you will need to increase BigCol to hold more than 25 charactes.
The "Final Delimiter" is not the same as the "Line delimiter"
I tried this by setting Record Delimiter to 'null' - this produced one row of output in DataStage (appears as one row, seperated with square box (representation of the control chars)), but when I view the file in a HEX editor, it still has the 0D0A characters. :(

I am confused though as to why using CHAR(13) it clears the CR, but using CHAR(10) missed the LF. Any ideas?

Thanks again.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Just delete the "Record Delimiter" completely. You are seeing the difference between a UNIX and DOS line termination, <CR>LF> versus <LF>.
sjordery
Premium Member
Premium Member
Posts: 202
Joined: Thu Jun 08, 2006 5:58 am

Post by sjordery »

I could only get it to work with Record Delimiter=null - the reason this failed before was that I didn't specify that setting on the output file, only on the input :oops:

Still, working as expected now (albeit with the 0D's still there), so will mark as resolved.

Thanks everyone for your help.

Regards,
S
Post Reply