Page 1 of 1

Usage of TrimLeadingTrailing

Posted: Fri Aug 31, 2012 5:26 am
by arpitchopra
Hi Experts !

I have a parallel job that reads from a flat file and loads into a Netezza target. It has the following layout:

File --> Transformer --> Netezza Target

The transformer has a stage variable which checks whether a given column is Numeric or not. The derivation for this stage variable is as follows:

If TrimLeadingTrailing(Lk1.Quantity)='' or IsValid('decimal[18,3]',TrimLeadingTrailing(Lk1.Quantity))=1 Then '' Else 'Quantity is not a valid decimal'

I tried the following test cases:
Quantity: abc
Var: Quantity is not a valid decimal

Quantity: 123abc
Var: Quantity is not a valid decimal

Quantity: 0123 abc
Var: ''


This last test case is something that is troubling me. Even though the field is not a Numeric, it still passes the check. I have a feeling that this is because of the Trim function (which sees the space and discards everything after that space). Please advise.

Re: Usage of TrimLeadingTrailing

Posted: Fri Aug 31, 2012 5:47 am
by ShaneMuir
arpitchopra wrote:I have a feeling that this is because of the Trim function (which sees the space and discards everything after that space).
Did you try outputting the return value of the trim function to see what was actually output?

Re: Usage of TrimLeadingTrailing

Posted: Fri Aug 31, 2012 6:07 am
by arpitchopra
I did. And surprisingly, the output for the 3rd case is 123.000 !
Is this how TrimLeadingTrailing usually works ? Isn't it just supposed to remove the white spaces from the beginning and the end of the entire string ?
ShaneMuir wrote:Did you try outputting the return value of the trim function to see what was actually output?

Posted: Fri Aug 31, 2012 6:20 am
by ShaneMuir
Its actually the IsValid test that is not working. If you were to pass your input un trimmed into the IsValid test you would probably find that it returns TRUE.

TrimleadingTrailing is used to remove space from the beginning and end of fields and leave multiple spaces in the middle of a string if they are there, unlike Trim which would remove the multiple spaces and leave just the one.

I wouldn't be surprised if the IsValid test only looks until the space and drops the rest of the line item.

As to why that would be the case - I do not know.

Posted: Fri Aug 31, 2012 6:35 am
by arpitchopra
ShaneMuir wrote:Its actually the IsValid test that is not working. If you were to pass your input un trimmed into the IsValid test you would probably find that it returns TRUE.
I don't want to do that as my file may contain those junk spaces.
Consider a scenario where the input is " 12" (or "<space><space><space>12"). I want this to pass the test and be stored as 12.000
Won't IsValid without a Trim fail this case ?
ShaneMuir wrote:I wouldn't be surprised if the IsValid test only looks until the space and drops the rest of the line item.
As to why that would be the case - I do not know.
You are right. I have applied TrimLeadingTrailing in the derivation for a few Varchar columns and it behaves as expected. It doesn't discard everything after the 1st space.




Do you have an alternative to test for this then ?

Posted: Fri Aug 31, 2012 6:35 am
by ShaneMuir
I would suggest using the Num() or Alpha() functions to find those values which contain only numbers or alpha characters.

Posted: Fri Aug 31, 2012 6:44 am
by arpitchopra
ShaneMuir wrote:I would suggest using the Num() or Alpha() functions to find those values which contain only numbers or alpha characters.
But this won't help me check for the precision. I need to discard everything that has precision greater than 3.

I mean we can ideally write a code that will check for the number of characters after "." but then it will be a huge effort to explain the multiple substrings and indices in the formula to someone else. I was hoping for something more subtle and simple. Like the IsValid. If only that would work. :(

Posted: Fri Aug 31, 2012 6:53 am
by ShaneMuir
Well you could try the NUM() function in conjunction with IsValid(). I am pretty sure NUM() will return true for 0123 and 0123.01 or 123.456, but false for 0123 abc.

Also I am pretty sure that the isvalid test won't actually test whether the input string has the relevant precision rather that it can be converted to the relevant precision. So 0123 and 0123.01 and 123.456 would all return true for the IsValid(Decimal) check.

If you want to check whether something has only 3 decimals from a string text I would use something like len(trim(input.string['.',2,1]))=3 or similar.

Posted: Fri Aug 31, 2012 7:01 am
by arpitchopra
ShaneMuir wrote:Also I am pretty sure that the isvalid test won't actually test whether the input string has the relevant precision rather that it can be converted to the relevant precision. So 0123 and 0123.01 and 123.456 would all return true for the IsValid(Decimal) check
You are right. This was a mistake on my part.

However, when I write the following code as a derivation to the same Quantity column, I get 1234.000 as the output.

Code: Select all

StringToDecimal(TrimLeadingTrailing(Lk1.Quantityt),"round_inf")
I don't think it is an issue with IsValid.

Posted: Fri Aug 31, 2012 7:14 am
by ShaneMuir
I suggested that IsValid had an issue, mainly because of your original post which suggested that it was the TrimLeadingTrailing giving the incorrect result, when actually your incorrect result was because the IsValid determined that the string '0123 abc' was a valid decimal and thus created an output you were not expecting.

Also, If you where to pass the string '0123 abc' to a column that is set as decimal with a specific precision, with no functions, conversions etc, i am guessing that it would convert the first part and drop the second part. Ie your output would be eg 123.000. As to my understanding this is because the IsValid test effectively tries to do a StringtoDecimal conversion, if it can, then its valid, if not then invalid. Passing the string to a decimal output would have the same result but you would get the errors written to the log.

Posted: Wed Sep 12, 2012 3:56 am
by arpitchopra
It is actually an issue with 2 things.
1. The first is that StringToDecimal() will automatically convert any string to a decimal (regardless of the format specified) if it CAN be converted. So, "1234 abc" will be converted to 1234
2. The second issue (and a bigger one) is with IsValid(). This function will also work in the same way. It will return true if the string CAN be converted to a decimal. It expects you to handle that conversion in the derivation of the target column.

Please read the following link for further information:
http://www-01.ibm.com/support/docview.w ... wg1JR39651