Format XML Output

just4geeks
Premium Member
Posts: 644
Joined: Sat Aug 26, 2006 3:59 pm
Location: Mclean, VA

Post by just4geeks »

I know that this question has been thrown around here a lot.
Current Output:

Code:

    <Code1>
902
    </Code1>
    <Code2>
19
    </Code2>
    <Code3>
9005
    </Code3>

Expected Output:

Code:

    <Code1>902</Code1>
    <Code2>19</Code2>
    <Code3>9005</Code3>

I understand the current output is good enough for any application that processes XML files. Even so, I am curious whether anyone has had any luck removing the newlines in DataStage itself. I know we can write sed commands to reformat the file, but I haven't found one that is generic, i.e., independent of tag names.

Any help will be really appreciated.
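
For what it's worth, a tag-name-independent cleanup can be sketched with a single regular expression run as a post-process on the file DataStage writes. The Python below is only a sketch, not a DataStage feature. The function name `collapse_text_elements` is made up for illustration, and it assumes the values sit in plain text-only elements (no CDATA sections, comments, or mixed content, which a regex like this could mangle).

```python
import re

# Matches: '>' + optional whitespace + a text value + optional whitespace + '<'.
# The value must start and end with a non-space character and may not contain
# '<' or '>', so whitespace-only gaps between elements (indentation) never match.
_PADDED_TEXT = re.compile(r">\s*([^<>\s](?:[^<>]*[^<>\s])?)\s*<")


def collapse_text_elements(xml_text: str) -> str:
    """Remove whitespace and line breaks surrounding element text values."""
    return _PADDED_TEXT.sub(r">\1<", xml_text)


# Sample taken from the "Current Output" above.
sample = (
    "    <Code1>\n902\n    </Code1>\n"
    "    <Code2>\n19\n    </Code2>\n"
    "    <Code3>\n9005\n    </Code3>\n"
)
print(collapse_text_elements(sample), end="")
```

Because the pattern never looks at the tag names, the same call should work for any element whose content is a single text value.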
Attitude is everything....
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

No offense meant, but this is simply not a subject that needs an answer or a technique. Valid XML, by the XML standards (see w3c.org), treats CRLFs and extra blanks as "noise". About 10 to 12 years ago, pretty formatting was useful because there weren't so many tools that understood XML. Now it should not be an issue. Readability (the only reason to consider a different appearance) is better handled by nearly every editor and browser on the market, including color coding.

DataStage is about moving and transforming data, and any extra blanks just make that more difficult when the data volumes are large. You should not select the formatting option.

If you are being asked to do this, tell whoever is asking "no", and that they need to get themselves an alternate tool for reading the XML.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First question: why format it at all? It's not needed, and if you want to see it formatted, open it in something like IE. If you do want to pursue this, in your shoes I'd look into a 'pretty printing' utility you could call from the command line as a post-process, one that outputs a formatted file more to your liking.

D'oh... too slow. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
MrBlack
Participant
Posts: 125
Joined: Wed Aug 08, 2012 8:57 am

Post by MrBlack »

This issue has bugged me too. I understand all the arguments that pretty XML isn't a requirement, but unfortunately I have to submit XML to other systems that suck and require pretty XML.

So I just wanted to post my workaround for getting pretty XML. Unfortunately it involves human intervention; maybe one day I'll figure out a way to automate it completely.

Prerequisites:
Notepad++ with the XML Tools plugin installed

1. Generate the DataStage XML output without formatting (everything on a single line)
2. Open file in Notepad++
3. Plugins > XML Tools > Pretty Print with line breaks
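
The manual Notepad++ step could probably be scripted. As a rough sketch, assuming the job writes unformatted single-line XML as in step 1, Python's standard `xml.dom.minidom` module can re-indent the document. The function name `pretty_print`, the `Root` wrapper element, and the four-space indent are arbitrary choices for illustration. This is not what the XML Tools plugin does internally, and the output has not been compared against it.

```python
from xml.dom import minidom


def pretty_print(xml_data, indent="    "):
    """Parse XML (str or bytes) and return it re-indented, one element per line."""
    document = minidom.parseString(xml_data)
    # toprettyxml() keeps an element whose only child is text on a single line,
    # e.g. <Code1>902</Code1>, and prepends an XML declaration.
    return document.toprettyxml(indent=indent)


# Hypothetical single-line input; "Root" is a made-up wrapper element.
single_line = "<Root><Code1>902</Code1><Code2>19</Code2><Code3>9005</Code3></Root>"
print(pretty_print(single_line))
```

If the input already contains line breaks between elements, minidom keeps those whitespace text nodes and the result picks up extra blank lines, so this is best fed the unformatted output.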

Post by eostic »

Cool! Nice solution for those times when it might be needed... and then get it on record that the "other" tooling needs to upgrade its parser. ;)

Post by chulett »

Trying to remember what we ended up using after job - XML Beans, perhaps? :?

Post by just4geeks »

One of the developers here wrote a Perl script that removes the newlines around element values, and it works fine.

Here is the code, for anyone who is interested. Just be sure to pass the file name as an argument.

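Code:

use strict;
use warnings;

# Strip leading and trailing whitespace from a string.
sub trim {
    (my $s = $_[0]) =~ s/^\s+|\s+$//g;
    return $s;
}

# Read the whole file into a single string.
sub get_file_contents {
    my ($path) = @_;
    local $/ = undef;
    open(my $fh, '<', $path) or die "Couldn't open file: $!";
    binmode $fh;
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Overwrite the file with the given contents.
sub write_back_to_file {
    my ($path, $contents) = @_;
    open(my $fh, '>', $path) or die "Couldn't write file: $!";
    binmode $fh;    # match the binary read so line endings round-trip unchanged
    print {$fh} $contents;
    close $fh;
}

my $file_name = shift @ARGV;
die "Usage: $0 file.xml\n" unless defined $file_name;

my $file_contents = get_file_contents($file_name);

# Collapse "<tag> value </tag>" onto one line, whatever the tag name is.
$file_contents =~ s/(<.+?>)(\s+?[0-9a-zA-Z=-]+?\s+?)(<\/.+?>)/$1.trim($2).$3/ge;

write_back_to_file($file_name, $file_contents);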
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

You can get the entire content on a single row/line by unchecking the formatted output option in the XML Output stage.
Thanks,
Prasanna

Post by just4geeks »

prasannakumarkk wrote: You can get entire content in a single row/line by unchecking the formatted output option in xml output stage

But the following format is what I expect in the XML file, rather than everything on a single row/line.

Code:

    <Code1>902</Code1> 
    <Code2>19</Code2> 
    <Code3>9005</Code3>
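
If the stage is switched to unformatted output as suggested in the quoted post, the one-element-per-line layout shown here can be approximated afterwards by inserting a line break between adjacent tags. A minimal sketch in Python, where `one_element_per_line` is a hypothetical helper name and `Root` is a made-up wrapper element. It produces no indentation, and it assumes the document has no comments or CDATA sections containing `><`.

```python
import re


def one_element_per_line(xml_text: str) -> str:
    """Put a newline between every pair of adjacent tags.

    Text-only elements such as <Code1>902</Code1> contain no '><' boundary,
    so each of them stays together on one line.
    """
    return re.sub(r">\s*<", ">\n<", xml_text)


# Hypothetical unformatted output; "Root" is a made-up wrapper element.
flat = "<Root><Code1>902</Code1><Code2>19</Code2><Code3>9005</Code3></Root>"
print(one_element_per_line(flat))
```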