which one is more costly

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

ArunaDas_Maharana
Participant
Posts: 42
Joined: Thu Dec 11, 2008 11:07 am

which one is more costly

Post by ArunaDas_Maharana »

Hi all,

I am getting XML documents from my source with CRLF characters after each tag, and parsing them with the XML Input stage fails.

As a workaround, I added Ereplace to search for and strip the CRLF characters in the DataStage Transformer after my ORAOCI9 stage in the server job. The messages are stored as CLOBs.

Another way I tried was to use the replace function while extracting the data from Oracle, something like:
SELECT
Replace(Replace( AGGR_DATA_T.AGGR_MSG_BODY_IMG,chr(13) ,' '), chr(10),' ')
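To make the difference between the two approaches concrete, here is a minimal Python sketch of what each replacement does to a message. It is only an illustration outside DataStage and Oracle: the function name `replace_crlf` and the sample message are made up for this example. Note that the SQL above substitutes a space for each CR and LF, whereas stripping removes them entirely.

```python
def replace_crlf(text: str, replacement: str = " ") -> str:
    """Mimic Replace(Replace(col, chr(13), r), chr(10), r) on a Python string."""
    return text.replace("\r", replacement).replace("\n", replacement)


# Hypothetical sample message with a CRLF pair after each tag.
sample = "<msg>\r\n<id>1</id>\r\n</msg>"

# Same behaviour as the SQL above: each CR and each LF becomes one space.
flattened = replace_crlf(sample)

# Same idea as stripping the characters instead of replacing them.
stripped = replace_crlf(sample, "")

print(repr(flattened))
print(repr(stripped))
```

Either result is a single-line document; the replaced version simply carries two spaces wherever a CRLF pair used to be.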

I am expecting 60,000 or more records in each half-hour window (the batch will run once every half hour).

Share your thoughts on this!

Thanks,
Aruna
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

XML files? How are you reading them? If you use a Sequential File stage, then the CR/LF pairs could be an issue, but they shouldn't be if you let the XML Input stage read the files directly via the URL/File Path option.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArunaDas_Maharana

Post by ArunaDas_Maharana »

Yes, you are correct! Parsing directly while reading from the database won't give this issue.

From the database, the XML documents are landed in a sequential file.

I found one more option: if you validate your XML against a schema, it will strip the CRLF characters from the sequential file.

I can't change the design at the moment. Given the current scenario and the high volume to process, which method do you think will be best? I am planning some performance testing on the two options below:
1. With schema validation
2. With the replace function at the Oracle stage while extracting the data.

Could you please share your view?
chulett

Post by chulett »

Change the design. And I'm not suggesting reading directly from the database, I'm suggesting you let the XML Input stage read the files. I'm assuming your job starts with a Sequential File stage that feeds the XML Input stage. If that's true, then you need to replace the Sequential File stage with a Folder stage with one column in it: the filename. Then set the XML Input stage to the "URL/File Path" option and your problem should be solved.

If that's not true, please be more precise about your job design.

For reference, an entry from the excellent blog Ernie Ostic writes:

http://dsrealtime.wordpress.com/2007/12 ... -a-source/
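
As a rough analogy for the Folder stage plus "URL/File Path" design, the Python sketch below lists the files in a directory and hands each path to an XML parser, rather than reading the file line by line first. This is not DataStage code: `parse_folder`, the temporary directory, and the sample file are all assumptions made for illustration. It shows that a parser given the whole file treats CR/LF between tags as ordinary whitespace.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_folder(folder):
    """Parse every .xml file in folder by passing its path to the parser."""
    roots = {}
    for path in sorted(Path(folder).glob("*.xml")):
        # The parser opens and reads the whole file itself, so line
        # terminators inside the document never split it into records.
        roots[path.name] = ET.parse(str(path)).getroot()
    return roots


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical landed file with a CRLF pair after each tag.
    Path(tmp, "msg1.xml").write_bytes(b"<msg>\r\n<id>1</id>\r\n</msg>\r\n")
    roots = parse_folder(tmp)
    first_id = roots["msg1.xml"].findtext("id")

print(first_id)
```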
ArunaDas_Maharana

Post by ArunaDas_Maharana »

Thanks, chulett.

I will try the two options, the Folder stage and the external stage.