Page 1 of 1

Modify Stage decimal_from_string

Posted: Tue Jul 08, 2014 7:59 am
by ymadden@bmi.com
We have a DataStage job that reads multiple delimited-file formats using a sequential file stage and various schema files. We then use a modify stage to rename columns so that they line up with an interface table in our database.

We are having problems reading certain numeric fields from the delimited files when the numbers have been formatted with a comma as the thousands separator. We can read the numeric values as a string, but the result of the decimal_from_string conversion function comes out NULL if the value contains a comma. Is there any way to strip the commas in the modify stage? Or are there any options available for the decimal_from_string function?

Posted: Tue Jul 08, 2014 8:53 am
by asorrell
I don't have access to DataStage at the moment, but don't remember an option in the Modify stage to resolve this.

As a workaround, you could use a transformer Trim to strip the commas:

StringToDecimal(Trim(mylink.mycol,",","A"))

Re: Modify Stage decimal_from_string

Posted: Tue Jul 08, 2014 8:55 am
by nirdesh2
You can use convert function in transformer to convert the values like 12,000 to 12000 and then use StringToDecimal to convert into decimal.

Posted: Tue Jul 08, 2014 1:56 pm
by ymadden@bmi.com
I was hoping to avoid using a transformer. I was also hoping that since they allow the use of date format masks for date conversions in the modify spec, that they would also allow the use of number format masks.

The files that we are loading do not all contain the same columns. By using the schema files and by passing modify stage specifications as job parameters we are able to use RCP and have one single job to process various file formats. In order to operate on a column in a transformer, that column must be declared, making it required. This would reduce the reusability of this job. If there is no other option though, then we will just have to bite the bullet.

Is it possible to specify a number format on a schema file? I believe it is possible to specify a date format. I have not been able to find thorough documentation on schema file formats.

Posted: Tue Jul 08, 2014 4:11 pm
by ssnegi
you could strip the delimited file in the before-job routine by invoking shell script.
sed "s/,//g" file.txt

Posted: Wed Jul 16, 2014 7:42 am
by ymadden@bmi.com
This is essentially what we ended up doing. The jobs were already set up to allow sed commands to be passed in for pre-processing. We used a more complicated regular expression though, to avoid destroying the comma delimiters in the csv files.

We also tried specifying a c_format string, taking from this post, but couldn't get it to work.

Posted: Wed Jul 16, 2014 2:36 pm
by ray.wurlod
You could use the string_trim() function in the Modify stage to remove the comma.

Why do you wish to avoid using Transformer stage?

Posted: Mon Jul 21, 2014 1:16 pm
by ymadden@bmi.com
Hi Ray,
It seems the string_trim function only removes leading or trailing characters. It has no effect on the characters in the middle of the string.

As for avoiding a Transformer - The files that we are loading do not all contain the same columns. By using schema files and by passing modify stage specifications as job parameters we are able to use RCP and have one single job to process various file formats. In order to operate on a column in a transformer, that column must be declared, making it required. This would reduce the reusability of this job.