PX XML Input stage dropping records. No warnings or errors.

Posted: Sun Nov 23, 2014 1:53 pm
by vlis
We recently completed a DataStage upgrade from 7.5.2 to 9.1.2.

DS 7.5.2
Server: None
Parallel: None


DS 9.1.2 NLS:
Server: UTF8 (Project Default)
Parallel: ASCL_ISO8859-1 (Project Default)

Parallel Job Description:
An External Source stage provides the filename to the XML Input stage.
The XML Input stage uses the filename to parse the XML file and writes to data files.

Observations (same data file):
DataStage 7.5.2: XML stage outputs 10,088 records.
DataStage 9.1.2: XML stage outputs 4,798 records. No warnings or errors.

I found that the data file contained HTML entity codes which, when decoded, are a newline.

When I removed the HTML entities from the file, all 10,088 records were processed.
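
The cleanup described above could be done as a pre-processing step before the file reaches the XML Input stage. The following is only a sketch in Python, not what was actually run: it assumes the entities are numeric character references for the newline character (`&#10;`, `&#xA;` or `&#x0A;`), and the function name and the choice of a space as the replacement are hypothetical.

```python
import re

# Assumption: the offending entities are numeric character references
# for U+000A (newline), written in decimal (&#10;) or hex (&#xA;, &#x0A;).
# Leading zeros are tolerated in both forms.
NEWLINE_REF = re.compile(r"&#(?:0*10|[xX]0*[aA]);")


def strip_newline_refs(xml_text: str, replacement: str = " ") -> str:
    """Replace every newline character reference with `replacement`.

    Other entities and character references are left untouched.
    """
    return NEWLINE_REF.sub(replacement, xml_text)


if __name__ == "__main__":
    sample = "<note>first line&#x0A;second line</note>"
    print(strip_newline_refs(sample))
```

Replacing with a space rather than deleting outright keeps the words on either side of the entity from running together; pass `replacement=""` to delete instead.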

DataStage job contains the following: External source (sends file name) connected to XML Input connected to transform connected to sequential file.

Questions:

Why does DataStage not write errors or warnings to the log?

Is there a way to tell DataStage 9.1.2 to ignore html entities and treat them as text?

Is this an NLS issue? Is NLS = None not an option?

Any suggestions on how to resolve this?

Posted: Sun Nov 23, 2014 5:48 pm
by eostic
...what does "outputs" mean?

...outputs from the stage (formal Link Counts on that link)...or what you see in the final sequential file?

The answer to that question could be very telling in this example.

Let us know EXACTLY the different row counts for the output link coming from the Stage (without regard to the eventual target).

Ernie

Updated original post

Posted: Sun Nov 23, 2014 8:26 pm
by vlis
Clarifications:
"Output" means data written from the XML stage and sent to the transformer (and eventually to a sequential file stage).

DataStage 7.5.2 and 9.1.2 jobs are identical. Job was exported from 7.5.2 and imported into 9.1.2.

The 4,798 records eventually written by the 9.1.2 job match the data from the 7.5.2 job.

When I remove the newline entities from the source file and process the file in the 7.5.2 and 9.1.2 environments, the final files both contain 10,088 records and the data is identical.

Posted: Mon Nov 24, 2014 10:48 pm
by asorrell
Hex 0A is the UNIX newline character - I would suspect that something is interpreting it as an EOL.
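
To see the effect outside DataStage: a conforming XML parser decodes a hex 0A character reference into a real newline inside the field value, so any later step that treats newline as a record delimiter no longer sees one line per record. The sketch below uses Python's standard library. The element names and sample data are made up for illustration, and it says nothing about how the XML Input stage works internally; it only shows a record boundary shifting, not the exact drop reported in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two <row> records, the first containing a
# newline written as a hex character reference.
SAMPLE = (
    "<rows>"
    "<row><note>line one&#x0A;line two</note></row>"
    "<row><note>plain</note></row>"
    "</rows>"
)


def field_values(xml_text):
    """Return the text of every <note> field, one entry per <row>."""
    root = ET.fromstring(xml_text)
    return [row.findtext("note") for row in root.findall("row")]


values = field_values(SAMPLE)

# Write the records newline-delimited, then read them back by line,
# as a line-oriented consumer would.
flat = "\n".join(values)
lines = flat.split("\n")

print(len(values), "records parsed from the XML")
print(len(lines), "lines seen by a newline-delimited reader")
```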

Posted: Wed Nov 26, 2014 10:59 am
by eph
Hi,

I know it won't help that much, but I faced the same problem on version 8.1 two years ago (except that my job failed instead of processing partial data).
I didn't find any info on this. I should have raised a PMR, but didn't have time for it.

Here is my old topic: http://dsxchange.com/viewtopic.php?t=145488.

I don't know why it wasn't solved, since those characters are allowed by the XML standard.

Edit: found this technote on another post of mine :)

Eric