Column Import stage - Implicit conversion

Amedhyaz
Participant
Posts: 6
Joined: Fri Mar 30, 2012 10:52 am

Column Import stage - Implicit conversion

Post by Amedhyaz »

A parallel job parses a source file using a schema file and a Column Import stage, and delivers the result to a dataset.

There are 62,815 records involved, with 4 timestamp columns to be delivered.

The job warns about a broken pipe and ends up aborting with a SIGSEGV.

Could the 62,815 * 4 implicit varchar-to-timestamp conversions be an issue?
Amedhyaz
DataStage & Metadata Workbench Developer
Information Server Administrator
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Yes - try using the StringToTimestamp function with an appropriate mask.
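
For example, in a Transformer derivation it would look something like this (the link and column names are placeholders, and the mask must match whatever format your source data actually uses):

    StringToTimestamp(lnk_in.EVENT_TS, "%yyyy-%mm-%dd %hh:%nn:%ss")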
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Amedhyaz
Participant
Posts: 6
Joined: Fri Mar 30, 2012 10:52 am

Re: Column Import stage - Implicit conversion

Post by Amedhyaz »

Unfortunately, an explicit conversion won't help, as far as my understanding goes.

The whole purpose of using a "Column Import" stage is to parse the source text file, which is read as a single LongVarChar; break it down into a collection of records by means of the Orchestrate schema provided; and deliver the records to a dataset after successfully completing all needed conversions, again by referring to the hints in the Orchestrate schema file. That is to say, the record structure is implicit at design time and only known at run time.

My question is, therefore, as follows: from your experience, do you think the "Column Import" stage may have a hard time handling about 250,000 implicit conversions to timestamps?

A subsidiary question would be: Is it possible to give a mask to a timestamp column in an Orchestrate schema file?
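
What I have in mind would be a per-field format property along these lines (the timestamp_format property name and the mask are my reading of the import/export formatting options, and the column names are just placeholders):

    record {final_delim=end, delim='|', quote=none}
    (
      CUST_ID: int32;
      CREATED_TS: timestamp {timestamp_format='%yyyy-%mm-%dd %hh:%nn:%ss'};
    )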
Amedhyaz
DataStage & Metadata Workbench Developer
Information Server Administrator
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Sorry, missed that distinction. I don't think it would have a problem with that many records. However, you can confirm whether it's a size problem by breaking the source file into pieces and processing them individually. If they all process as smaller chunks, then it is some sort of buffering / size issue. If not, then it is a data problem.
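
For instance, on Unix something like this would give you chunks of roughly a quarter of the file each (the file name and line count are placeholders):

    split -l 16000 source_file.txt source_chunk_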

How certain are you that all the data matches the schema? I've had SIGSEGVs when an invalid timestamp (all zeros or all blanks) was in the data.

Can you check the relevant DSD.RUN file in the &PH& sub-directory of the project for your last run? Sometimes that has more detail about what caused the SIGSEGV.
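
From the project directory, something along these lines usually gets you to the newest one (the install path here is just an example, and the exact file naming can vary by release):

    cd /opt/IBM/InformationServer/Server/Projects/YourProject/'&PH&'
    ls -t DSD.RUN_* | head -1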
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020