My text editor of choice on the Mac is TextWrangler. It’s lightweight and it has pretty much all you need from a text editor. In particular, I like that I can SFTP into my development server.
One issue that bugged me lately was when I opened an unindented, unformatted XML file. Basically, it looked a mess and there was no way to tidy the file up so that I could read it easily.
However, I found a simple way to do this today… thanks to this and this.
Simple guide
We want to add a UNIX script to TextWrangler so it can format an XML file. Here's how:
- Open TextWrangler and open a new text file.
- Copy and paste the code below into this file.
#!/bin/sh
xmllint --c14n "$*" | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -
- Now anytime you want to format an XML file, just go to this menu and select the Tidy XML.sh script and BOOM, neat tidy XML.

This is an interesting facility to extend an already great text editor, and I will be looking into more cool scripts that can hopefully lessen my daily annoyances.
UPDATED:: Added UTF8 encoding, thanks Rolan.
UPDATED:: Added a post to format PHP code in TextWrangler.
Thank you, thank you, thank you! You’ve saved my date
And now s/date/day
Awesome! Thanks for putting this together!
OMG you just saved me so much time, you have no idea. Thank you!!
Brilliant! Not sure why this isn’t standard functionality in text editors these days but many thanks for showing us how easy it is to add!
Thanks!
Small addition that was useful for me:
If you are working with UTF-8 encoded files – then the following parameter is required:
--encode UTF-8
Thanks a lot! Works beautifully.
Thanks this was really useful!
great tips.. I have been trying to find how to implement this in TextWrangler from many sites, but yours is the easiest to understand ..
cheers!
Many thanks!
Very nice. Simple and efficient. Thanks for publishing it.
Awesome time saver!
Hi,
Great script. I have had only one problem: it doesn't properly format empty tags like this:
So for a file it looks like that:
6982760
graphic
89.000000
A20110609T092928_M_192_168_112_212_01.eps
Any chance to fix it?
Piotr
Hmm.. all xml code was just exchanged for something else. Any way to post xml inside comment?
Piotr
Hi Piotr, you could try posting the xml in the comment within the sourcecode shortcode
Thank You
Perfect, thank you.
Thanks, cool tip
Is there a way to have the script not truncate empty tags? I need the XML output to format empty tags with both the open and closing tags like so:
instead of
Please advise.
Looks like it didn’t post my XML examples. Here they are:
Desired:
“”
Currently, the script truncates to this:
“”
Hi Michael, you could try posting the xml in the comment within the sourcecode shortcode
Thank you Thank you Thank you.
Great solution!
However, there’s a big caveat: Tidy also deletes CDATA tags! If you need your XML to maintain the original text as it was, you may run into trouble.
Tidy does “the right thing” by replacing sensitive characters with their entities, e.g.:
is transformed to
This is OK for information in an HTML context, but not if you need the content for other output channels like print!
I removed the --c14n option and it left my CDATA intact.
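To see the difference for yourself, here is a minimal sketch you can run in Terminal. It assumes xmllint is on your PATH; the sample document and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare xmllint output with and without --c14n on a CDATA section.
# The sample document below is invented for this example.
sample='<note><![CDATA[a < b]]></note>'

# Canonical XML (C14N) replaces CDATA sections with escaped character data.
with_c14n=$(printf '%s' "$sample" | xmllint --c14n -)

# Plain parsing and re-serialising keeps the CDATA section as written.
without_c14n=$(printf '%s' "$sample" | xmllint -)

printf 'with --c14n:    %s\n' "$with_c14n"
printf 'without --c14n: %s\n' "$without_c14n"
```

If the first line shows an entity where the second still shows the CDATA wrapper, your content is affected and dropping --c14n is probably the safer choice.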
Thanks!!!
Dude, where can I send $1,000,000.00? Seriously, Text Wrangler is my text editor of choice but its lack of pretty printing functionality for xml was a pain. I have been looking for a good xml editor to accomplish just what this simple script does. Thanks a bunch!
Very helpful. Thank you very much for taking the time to write this up.
Thanks a lot, really useful!
I am trying to use this, but whenever I do, I get the following error:
/private/var/folders/-u/-uE-jkIdFUCr9aKmAFj+t++++TI/-Tmp-/Cleanup At Startup/Tidy XML.sh.S:16: parser error : Extra content at the end of the document
^
-:1: parser error : Document is empty
^
-:1: parser error : Start tag expected, '<' not found
^
But the document is 1. Not empty and 2. starts with a < character. I have tried using the Zap Gremlins functionality prior to using this script but this did not help. Any suggestions?
ya, it’s broken. here is the fix:
xmllint "$*" | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -
I am still getting the Document is empty error.
Is there a script that can format XML regardless of whether the namespace is valid or the namespace prefix is defined? Example:
410709522012-03-26Z
201203262232561319
Thank you.
Thankyou for your share
Trying to figure out how to do this with Text Wrangler 4.0's new Text Filters but not having much luck. Anyone else?
Anybody got this working with TextWrangler 4? It doesn’t seem to work anymore…
TextWrangler 4 passes the text to the script on stdin (see the documentation), so the working version for TextWrangler 4 is:
#!/bin/sh
xmllint --c14n /dev/stdin | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -
have fun…
Perfect!
Thanks
I’ve changed my script to Sascha Appel’s fix for TW 4.0 but I get this error:
warning: failed to load external entity “–c14n”
/dev/stdin:1: parser error : Document is empty
^
/dev/stdin:1: parser error : Start tag expected, '<' not found
^
warning: failed to load external entity "–encode"
warning: failed to load external entity "UTF-8"
warning: failed to load external entity "–format"
-:1: parser error : Document is empty
^
-:1: parser error : Start tag expected, '<' not found
^
I'm on 10.6.8 is that an issue?
Watch for evil hyphen conversion – copy/pasting the text seems to convert the hyphens into long hyphens
it’s still double dashes for the options, just replace the “$*” with /dev/stdin
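If you have already pasted a mangled copy, a rough sketch like the one below can undo the damage. The function name fix_smart_punctuation is hypothetical, and the rule it applies (a long dash directly before a letter was a double dash, any other long dash was a single hyphen) is only an assumption about how the blog converted the text, so check the result by eye.

```shell
#!/bin/sh
# Hypothetical helper: turn "smart" punctuation from a web page back into
# the plain ASCII characters a shell script needs.
fix_smart_punctuation() {
    # 1. A long dash directly before a letter is assumed to be an option prefix (--).
    # 2. Any remaining long dash is assumed to be a single hyphen.
    # 3. Curly single and double quotes become straight quotes.
    sed -e 's/–\([A-Za-z]\)/--\1/g' \
        -e 's/–/-/g' \
        -e "s/‘/'/g" \
        -e "s/’/'/g" \
        -e 's/“/"/g' \
        -e 's/”/"/g'
}

# Example: a line as it appears after the blog's automatic formatting.
printf '%s\n' 'xmllint –c14n – | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format -' |
    fix_smart_punctuation
```

To repair a saved script, something like fix_smart_punctuation < pasted.sh > fixed.sh should do, where both file names are placeholders.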
I tried this :
#!/bin/sh
xmllint --c14n /dev/stdin | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -
And it doesn’t work at all (version 4.0 (3142))
Any idea ?
I found a solution :
#!/bin/sh
xmllint --c14n - | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -
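For anyone who wants one script for both situations, here is a sketch that reads the file named as its first argument when there is one and falls back to stdin otherwise. The function name tidy_xml is made up. It leaves out --c14n because of the CDATA reports earlier in this thread, and it builds the tab with printf rather than $'\t' on the assumption that your /bin/sh may not understand that quoting.

```shell
#!/bin/sh
# Sketch: pretty-print XML from a file argument, or from stdin if none is given.
# Assumes xmllint is on PATH; tidy_xml is a hypothetical name.
tidy_xml() {
    # A literal tab, produced portably for use as the indent string.
    tab=$(printf '\t')
    # "${1:--}" expands to the first argument, or "-" (stdin) when it is missing.
    xmllint "${1:--}" | XMLLINT_INDENT="$tab" xmllint --encode UTF-8 --format -
}
```

To use it as the filter itself, end the script with a line that calls tidy_xml "$@".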
Thanks for the replies
I was copying and pasting and so I was falling foul of the automatic formatting issue on this blog’s comments.
I’ve tried various permutations and I still can’t get it to work
This time I’m getting this error: (application error code: 32)
Here are the various permutations I used and here are pastebin.com versions with the exact text formatting http://pastebin.com/DJkc7kPW
[1]
#!/bin/sh
xmllint –c14n — | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format -
[2]
#!/bin/sh
xmllint –c14n “/dev/stdin” | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format -
[3]
#!/bin/sh
xmllint –c14n – | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format -
[4]
#!/bin/sh
xmllint –c14n /dev/stdin | XMLLINT_INDENT=/dev/stdin’\t’ xmllint –encode UTF-8 –format -
[5]
#!/bin/sh
xmllint –c14n – | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format -
Just to clarify, I also re-did Sascha Appel’s fix for TW4.0 with the correct [I think!] double dash formatting but I get the error 32 for that too:
I’m trying out Eoin’s source code short cut for posting code on this blog to fix the formatting issue. If the code above is a mess or missing, it’s also on pastebin http://pastebin.com/Yj1zE7Lq
The source code short cut worked!!
For any one else who’s interested I wrapped the code in the following -except use square brackets instead of
Man, the automatic formatting really messes things up! That post was supposed to read “except use square brackets instead of greater than or less than signs”