
Column Import stage - Implicit conversion

Posted: Wed Feb 19, 2014 12:05 pm
by Amedhyaz
A parallel job parses a source file using a schema and a Column Import stage, then delivers the records to a dataset.

62,815 records are involved, with 4 timestamp columns to be delivered.

The job logs a "Broken pipe" warning and ends up aborting on a SIGSEGV.

Could the 62,815 * 4 implicit varchar-to-timestamp conversions be the issue?

Posted: Wed Feb 19, 2014 12:32 pm
by asorrell
Yes - try using the StringToTimestamp function with an appropriate mask.
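
For example, in a Transformer derivation (the link and column names here are just placeholders; adjust the mask tokens to match how the timestamps actually appear in your file):

    StringToTimestamp(lnk_in.EVENT_TS, "%yyyy-%mm-%dd %hh:%nn:%ss")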

Re: Column Import stage - Implicit conversion

Posted: Wed Feb 19, 2014 1:32 pm
by Amedhyaz
Unfortunately, an explicit conversion won't help, as far as my understanding goes.

The whole purpose of using a Column Import stage is to parse the source text file, which is read as a single LongVarChar; to break it down into a collection of records by means of the Orchestrate schema provided; and to deliver the records to a dataset after successfully completing all needed conversions, again driven by the Orchestrate schema file. That is to say, the record structure is implicit at design time and only known at run time.

My question, therefore, is this: in your experience, could the Column Import stage have a hard time handling about 250,000 of these implicit conversions to timestamps?

A subsidiary question: is it possible to specify a format mask for a timestamp column in an Orchestrate schema file?
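
Something along these lines is what I have in mind (field names and delimiters are made up, and the field-level timestamp_format property is my guess at the syntax):

    record
    {final_delim=end, delim='|', quote=none}
    (
      order_id:   int32;
      created_ts: timestamp {timestamp_format='%yyyy-%mm-%dd %hh:%nn:%ss'};
    )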

Posted: Wed Feb 19, 2014 2:30 pm
by asorrell
Sorry, missed that distinction. I don't think it would have a problem with that many records. However, you can confirm whether it's a size problem by breaking the source file into pieces and processing them individually. If they all process as smaller chunks, then it is some sort of buffering / size issue. If not, then it is a data problem.
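
For instance, with the standard Unix split command on the engine tier (file names are just examples):

    split -l 10000 source_file.txt chunk_

That gives you chunk_aa, chunk_ab, and so on, each 10,000 lines, which you can feed through the job one at a time.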

How certain are you that all the data matches the schema? I've had SIGSEGVs when an invalid timestamp (all zeroes or all blanks) was in the data.

Can you check the relevant DSD.RUN phantom file in the project's &PH& sub-directory for your last run? Sometimes it has more detail about what caused the SIGSEGV.
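
Something like this on the server, assuming a default install path (adjust for your environment; note the quoting, since & is special to the shell):

    cd '/opt/IBM/InformationServer/Server/Projects/YourProject/&PH&'
    ls -t | head -5          # most recent phantom files first
    more <most_recent_file>  # placeholder for the name from the listing above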