Usage of TrimLeadingTrailing

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Usage of TrimLeadingTrailing

Post by arpitchopra »

Hi Experts !

I have a parallel job that reads from a flat file and loads into a Netezza target. It has the following layout:

File --> Transformer --> Netezza Target

The transformer has a stage variable which checks whether a given column is Numeric or not. The derivation for this stage variable is as follows:

If TrimLeadingTrailing(Lk1.Quantity)='' or IsValid('decimal[18,3]',TrimLeadingTrailing(Lk1.Quantity))=1 Then '' Else 'Quantity is not a valid decimal'

I tried the following test cases:
Quantity: abc
Var: Quantity is not a valid decimal

Quantity: 123abc
Var: Quantity is not a valid decimal

Quantity: 0123 abc
Var: ''


This last test case is something that is troubling me. Even though the field is not a Numeric, it still passes the check. I have a feeling that this is because of the Trim function (which sees the space and discards everything after that space). Please advise.
Arpit Chopra
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Re: Usage of TrimLeadingTrailing

Post by ShaneMuir »

arpitchopra wrote:I have a feeling that this is because of the Trim function (which sees the space and discards everything after that space).
Did you try outputting the return value of the trim function to see what was actually output?
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Re: Usage of TrimLeadingTrailing

Post by arpitchopra »

I did. And surprisingly, the output for the 3rd case is 123.000 !
Is this how TrimLeadingTrailing usually works ? Isn't it just supposed to remove the white spaces from the beginning and the end of the entire string ?
ShaneMuir wrote:Did you try outputting the return value of the trim function to see what was actually output?
Arpit Chopra
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Its actually the IsValid test that is not working. If you were to pass your input un trimmed into the IsValid test you would probably find that it returns TRUE.

TrimleadingTrailing is used to remove space from the beginning and end of fields and leave multiple spaces in the middle of a string if they are there, unlike Trim which would remove the multiple spaces and leave just the one.

I wouldn't be surprised if the IsValid test only looks until the space and drops the rest of the line item.

As to why that would be the case - I do not know.
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

ShaneMuir wrote:Its actually the IsValid test that is not working. If you were to pass your input un trimmed into the IsValid test you would probably find that it returns TRUE.
I don't want to do that as my file may contain those junk spaces.
Consider a scenario where the input is " 12" (or "<space><space><space>12"). I want this to pass the test and be stored as 12.000
Won't IsValid without a Trim fail this case ?
ShaneMuir wrote:I wouldn't be surprised if the IsValid test only looks until the space and drops the rest of the line item.
As to why that would be the case - I do not know.
You are right. I have applied TrimLeadingTrailing in the derivation for a few Varchar columns and it behaves as expected. It doesn't discard everything after the 1st space.




Do you have an alternative to test for this then ?
Arpit Chopra
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

I would suggest using the Num() or Alpha() functions to find those values which contain only numbers or alpha characters.
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

ShaneMuir wrote:I would suggest using the Num() or Alpha() functions to find those values which contain only numbers or alpha characters.
But this won't help me check for the precision. I need to discard everything that has precision greater than 3.

I mean we can ideally write a code that will check for the number of characters after "." but then it will be a huge effort to explain the multiple substrings and indices in the formula to someone else. I was hoping for something more subtle and simple. Like the IsValid. If only that would work. :(
Arpit Chopra
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Well you could try the NUM() function in conjunction with IsValid(). I am pretty sure NUM() will return true for 0123 and 0123.01 or 123.456, but false for 0123 abc.

Also I am pretty sure that the isvalid test won't actually test whether the input string has the relevant precision rather that it can be converted to the relevant precision. So 0123 and 0123.01 and 123.456 would all return true for the IsValid(Decimal) check.

If you want to check whether something has only 3 decimals from a string text I would use something like len(trim(input.string['.',2,1]))=3 or similar.
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

ShaneMuir wrote:Also I am pretty sure that the isvalid test won't actually test whether the input string has the relevant precision rather that it can be converted to the relevant precision. So 0123 and 0123.01 and 123.456 would all return true for the IsValid(Decimal) check
You are right. This was a mistake on my part.

However, when I write the following code as a derivation to the same Quantity column, I get 1234.000 as the output.

Code: Select all

StringToDecimal(TrimLeadingTrailing(Lk1.Quantityt),"round_inf")
I don't think it is an issue with IsValid.
Arpit Chopra
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

I suggested that IsValid had an issue, mainly because of your original post which suggested that it was the TrimLeadingTrailing giving the incorrect result, when actually your incorrect result was because the IsValid determined that the string '0123 abc' was a valid decimal and thus created an output you were not expecting.

Also, If you where to pass the string '0123 abc' to a column that is set as decimal with a specific precision, with no functions, conversions etc, i am guessing that it would convert the first part and drop the second part. Ie your output would be eg 123.000. As to my understanding this is because the IsValid test effectively tries to do a StringtoDecimal conversion, if it can, then its valid, if not then invalid. Passing the string to a decimal output would have the same result but you would get the errors written to the log.
arpitchopra
Participant
Posts: 35
Joined: Mon Nov 01, 2010 2:48 am
Location: New Delhi

Post by arpitchopra »

It is actually an issue with 2 things.
1. The first is that StringToDecimal() will automatically convert any string to a decimal (regardless of the format specified) if it CAN be converted. So, "1234 abc" will be converted to 1234
2. The second issue (and a bigger one) is with IsValid(). This function will also work in the same way. It will return true if the string CAN be converted to a decimal. It expects you to handle that conversion in the derivation of the target column.

Please read the following link for further information:
http://www-01.ibm.com/support/docview.w ... wg1JR39651
Arpit Chopra
Post Reply