Format XML with TextWrangler

My text editor of choice on the Mac is TextWrangler. It’s lightweight and it has pretty much all you need from a text editor. In particular, I like that I can SFTP into my development server.

One issue that bugged me lately was when I opened an unindented, unformatted XML file. Basically, it looked a mess and there was no way to tidy the file up so that I could read it easily.

However, I found a simple way to do this today… thanks to this and this and this.

Simple guide
We want to add a UNIX shell script to TextWrangler as a text filter so that it can format an XML file. The steps are:

    • Open TextWrangler and open a new text file.
    • Copy and paste the code below into this file.
        #!/bin/sh
        XMLLINT_INDENT=$'\t' xmllint --format --encode utf-8 -
    • Save the file, something like Tidy XML.sh, in the ~/Library/Application Support/TextWrangler/Text Filters/ folder.
    • Now, any time you want to format an XML file, go to the Text menu, choose Apply Text Filter and select the Tidy XML.sh script and BOOM, neat tidy XML.
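
If you would rather do the same steps from Terminal, here is a minimal sketch. The function name install_tidy_filter is made up for this example, and the optional argument is only there so you can try it against a scratch folder first. It creates the Text Filters folder if it is missing (one commenter reports having to create it by hand) and writes the two-line script from above into it. The chmod is an assumption on my part: I am not sure TextWrangler needs the executable bit, but setting it does no harm.

```shell
#!/bin/sh
# install_tidy_filter is a hypothetical helper name, not part of TextWrangler.
install_tidy_filter() {
    # Destination folder; defaults to the path given in the post.
    filter_dir=${1:-"$HOME/Library/Application Support/TextWrangler/Text Filters"}

    # Create the folder if TextWrangler has not created it yet.
    mkdir -p "$filter_dir" || return 1

    # The quoted heredoc delimiter keeps $'\t' literal in the written file.
    cat > "$filter_dir/Tidy XML.sh" <<'EOF'
#!/bin/sh
XMLLINT_INDENT=$'\t' xmllint --format --encode utf-8 -
EOF

    # Assumption: the executable bit may not be required, but it is harmless.
    chmod +x "$filter_dir/Tidy XML.sh"
}
```

Calling install_tidy_filter with no argument installs into the folder named in the steps above. You may need to restart TextWrangler before the filter shows up in the menu.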

This is an interesting facility to extend an already great text editor, and I will be looking into more cool scripts that can hopefully lessen my daily annoyances.

UPDATED:: Added UTF8 encoding, thanks Rolan.
UPDATED:: Added a post to format PHP code in TextWrangler.
UPDATED:: Updated for TextWrangler version 4.5.8.

83 comments on "Format XML with TextWrangler"

  1. Thanks!
    Small addition that was useful for me:
    If you are working with UTF-8 encoded files – then the following parameter is required:

    --encode UTF-8

  2. great tips.. I have been trying to find how to implement this in TextWrangler from many sites, but yours is the easiest to understand ..

    cheers!

  3. Hi,
    Great script. I have had only one problem: it’s not formatting empty tags properly, like this:

    So for a file it looks like that:

    6982760

    graphic
    89.000000

    A20110609T092928_M_192_168_112_212_01.eps

    Any chance to fix it?
    Piotr

  4. Great solution!

    However, there’s a big caveat: Tidy also deletes CDATA tags! If you need your XML to maintain the original text as it was, you may run into trouble.

    Tidy does "the right thing" by replacing sensitive characters with their entities, e.g.:

    <value><![CDATA[Your measurement of n was correct if n<2.5]]></value>

    is transformed to

    <value>Your measurement of n was correct if n&lt;2.5</value>

    This is OK for information in an HTML context, but not if you need the content for other output channels like print!

    1. I removed the --c14n option and it left my CDATA intact.

      xmllint "$*" | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -
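
To see the difference described in this exchange for yourself, here is a small sketch that runs the same sample through xmllint with and without the --c14n step. The sample document and the two function names are invented for the illustration. It assumes xmllint is on your PATH and skips the comparison if it is not.

```shell
#!/bin/sh
# Sample input, invented for this sketch (modelled on the comment above).
sample='<doc><value><![CDATA[n<2.5]]></value></doc>'

# format_only and canonicalize_then_format are hypothetical helper names.
format_only() {
    # Pretty-print stdin without canonicalizing it first.
    xmllint --format -
}

canonicalize_then_format() {
    # Canonicalize stdin (C14N), then pretty-print the result.
    xmllint --c14n - | xmllint --format -
}

if command -v xmllint >/dev/null 2>&1; then
    echo "--format only:"
    printf '%s\n' "$sample" | format_only
    echo "--c14n, then --format:"
    printf '%s\n' "$sample" | canonicalize_then_format
else
    echo "xmllint not found; skipping the comparison" >&2
fi
```

If keeping CDATA sections matters to you, the variant without the canonicalization step is the one to put in your text filter.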
      
  5. Dude, where can I send $1,000,000.00? Seriously, TextWrangler is my text editor of choice, but its lack of pretty-printing functionality for XML was a pain. I have been looking for a good XML editor to accomplish just what this simple script does. Thanks a bunch!

  6. I am trying to use this, but whenever I do, I get the following error:

    /private/var/folders/-u/-uE-jkIdFUCr9aKmAFj+t++++TI/-Tmp-/Cleanup At Startup/Tidy XML.sh.S:16: parser error : Extra content at the end of the document

    ^
    -:1: parser error : Document is empty
    ^
    -:1: parser error : Start tag expected, '<' not found
    ^

    But the document is 1. Not empty and 2. starts with a < character. I have tried using the Zap Gremlins functionality prior to using this script but this did not help. Any suggestions?

    1. ya, it’s broken. here is the fix:

      xmllint "$*" | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -

      1. I had this problem. The solution for me was where you run the script from: I was picking it from the Script menu, but I guess that’s the wrong way to do it. If you go to the Text menu, the very first item is Apply Text Filter; activating it from there worked great…

  7. Is there a script that can format XML regardless of whether the namespace is valid or the namespace prefix is defined? Example:

    410709522012-03-26Z
    201203262232561319

    Thank you.

  8. With TextWrangler 4, the filter reads from stdin (see the documentation), so the working version for TextWrangler 4 is:

    #!/bin/sh
    xmllint --c14n /dev/stdin | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -

    have fun…
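
Since the filter just reads the document on stdin, you can try it in Terminal before wiring it into the editor. A minimal sketch follows; run_filter is a made-up helper name, and it assumes the filter file is executable.

```shell
#!/bin/sh
# run_filter is a hypothetical helper: it feeds a file to a text filter on
# stdin, mirroring how this comment says TextWrangler 4 passes the document.
run_filter() {
    filter=$1   # path to the text filter script (must be executable)
    input=$2    # path to the file to feed it
    "$filter" < "$input"
}
```

For example, run_filter "$HOME/Library/Application Support/TextWrangler/Text Filters/Tidy XML.sh" messy.xml (where messy.xml stands in for any file of yours) should print the formatted XML, or the same parser errors you would see in TextWrangler.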

  9. I’ve changed my script to Sascha Appel’s fix for TW 4.0 but I get this error:

    warning: failed to load external entity "–c14n"
    /dev/stdin:1: parser error : Document is empty

    ^
    /dev/stdin:1: parser error : Start tag expected, '<' not found

    ^
    warning: failed to load external entity "–encode"
    warning: failed to load external entity "UTF-8"
    warning: failed to load external entity "–format"
    -:1: parser error : Document is empty

    ^
    -:1: parser error : Start tag expected, '<' not found

    ^

    I'm on 10.6.8 is that an issue?

    1. Watch for evil hyphen conversion – copy/pasting the text seems to convert the hyphens into long hyphens

      1. I tried this :

        #!/bin/sh
        xmllint –c14n /dev/stdin | XMLLINT_INDENT=$’\t‘ xmllint –encode UTF-8 –format –

        And it doesn’t work at all (version 4.0 (3142))

      2. I tried this :

        #!/bin/sh
        xmllint –c14n /dev/stdin | XMLLINT_INDENT=$’\t‘ xmllint –encode UTF-8 –format –

        And it doesn’t work at all. (version 4.0 (3142))

        Any idea ?

      3. I found a solution :

        #!/bin/sh
        xmllint –c14n – | XMLLINT_INDENT=$’\t‘ xmllint –encode UTF-8 –format –

      4. This works for 4.0:

        #!/bin/sh
        cat $STDIN | xmllint –c14n – | XMLLINT_INDENT=$’\t‘ xmllint –format –

        Put this in your Tidy.sh file in ~/Library/Application Support/TextWrangler/Text Filters

    2. Thanks for the replies 🙂

      I was copying and pasting and so I was falling foul of the automatic formatting issue on this blog’s comments.

      I’ve tried various permutations and I still can’t get it to work 😦

      This time I’m getting this error: (application error code: 32)

      Here are the various permutations I used and here are pastebin.com versions with the exact text formatting http://pastebin.com/DJkc7kPW

      [1]
      #!/bin/sh
      xmllint –c14n — | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format –

      [2]
      #!/bin/sh
      xmllint –c14n „/dev/stdin“ | XMLLINT_INDENT=$’\t‘ xmllint –encode UTF-8 –format –

      [3]
      #!/bin/sh
      xmllint –c14n – | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format –

      [4]
      #!/bin/sh
      xmllint –c14n /dev/stdin | XMLLINT_INDENT=/dev/stdin’\t’ xmllint –encode UTF-8 –format –

      [5]
      #!/bin/sh
      xmllint –c14n – | XMLLINT_INDENT=$’\t’ xmllint –encode UTF-8 –format –

      1. Just to clarify, I also re-did Sascha Appel’s fix for TW4.0 with the correct [I think!] double dash formatting but I get the error 32 for that too:

        
        [6]
        #!/bin/sh
        xmllint --c14n /dev/stdin | XMLLINT_INDENT=$’\t’ xmllint --encode UTF-8 --format -


        I’m trying out Eoin’s source code shortcut for posting code on this blog to fix the formatting issue. If the code above is a mess or missing, it’s also on pastebin http://pastebin.com/Yj1zE7Lq

  10. The source code short cut worked!!
    For any one else who’s interested I wrapped the code in the following -except use square brackets instead of

    

    <sourcecode language="text">
    code goes here
    </sourcecode>
    

    1. Man, the automatic formatting really messes things up! That post was supposed to read "except use square brackets instead of greater than or less than signs"

  11. Both epharion’s and Mitch’s commands will work for TextWrangler version 4; however, they are mangled by the automatic formatting. It took me a while to figure out what was happening, so I’ve reposted their commands below. (If you’re curious, the difference is that the long hyphen before the options should be a double dash, the long hyphen after "c14n" should be a single dash, and the single quotes need to be changed to simple straight quotes instead of curly quotes.)

    #!/bin/sh
    xmllint --c14n - | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -

    or

    #!/bin/sh
    cat $STDIN | xmllint --c14n - | XMLLINT_INDENT=$'\t' xmllint --encode UTF-8 --format -

    As Mitch said:
    Put this in your Tidy.sh file in ~/Library/Application Support/TextWrangler/Text Filters

    Also:
    - If you’re not using UTF-8 encoding, remove "--encode UTF-8".
    - If you prefer to indent with spaces instead of tabs, replace XMLLINT_INDENT=$'\t' with XMLLINT_INDENT=' ', and place the number of spaces that you want for each indentation between the single quotes.
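
If you have already pasted a mangled copy, the repair rules described in this comment can be sketched as a small sed filter. This is only a heuristic and fix_smart_punct is an invented name: it treats a long hyphen directly before a letter as a double dash, any other long hyphen as a single dash, and turns curly quotes into straight ones. Check the result by eye before trusting it.

```shell
#!/bin/sh
# fix_smart_punct is a hypothetical helper: it reads a pasted script on stdin
# and writes a copy with plain ASCII dashes and quotes to stdout.
fix_smart_punct() {
    sed \
        -e 's/–\([A-Za-z]\)/--\1/g' \
        -e 's/–/-/g' \
        -e 's/—/-/g' \
        -e "s/‘/'/g" \
        -e "s/’/'/g" \
        -e "s/‚/'/g" \
        -e 's/„/"/g' \
        -e 's/“/"/g' \
        -e 's/”/"/g'
}
```

A typical use would be fix_smart_punct < pasted.sh > "Tidy XML.sh", where pasted.sh is whatever you copied from the page. The order of the first two rules matters: option names are handled before the leftover long hyphens are reduced to single dashes.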

  12. The following thing worked for me for TextWrangler 4.0.1:

    #!/bin/sh
    cat $STDIN | xmllint - | XMLLINT_INDENT=' ' xmllint -encode UTF-8 -format -

    I placed the above text in Script.sh and put that in ~/Library/Application Support/TextWrangler/Text Filters.
    Then I restarted TextWrangler and found "script.sh" under Text > Apply Text Filter. Clicking the menu item on an unformatted XML file formatted it instantly.

    While copying the script, the hyphens kept getting replaced with long hyphens, and I don’t know why. I had to manually replace each long hyphen with a normal hyphen, which spared me a bunch of parsing errors (errors like "failed to load –format", etc.).
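
Counting spaces between two quote marks is easy to get wrong, so here is a sketch that builds the indent string from a number instead. Both function names are invented for this example, and the default of four spaces is an arbitrary choice.

```shell
#!/bin/sh
# make_indent and tidy_xml_spaces are hypothetical helper names.

# Print N spaces with no trailing newline.
make_indent() {
    printf "%${1}s" ''
}

# Reformat the XML on stdin, indenting each level with N spaces (default 4).
tidy_xml_spaces() {
    XMLLINT_INDENT=$(make_indent "${1:-4}") xmllint --format --encode UTF-8 -
}
```

To use this as a text filter, put a call such as tidy_xml_spaces 2 on the last line of the file so that it runs on the text TextWrangler passes in.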

  13. Great write-up! I noticed that the latest version did not create the Text Filters folder mentioned in the article. I had to create it manually, then put my .sh file in it and restart TextWrangler. Once I did that, the filter became available.

    Thank you!

  14. Thanks for the script! The instructions should be modified, though, to say that it goes in your home directory, not the root of the HD. I spent a while googling why I couldn’t find my TextWrangler folder in /Library/Application Support/TextWrangler/Text Filters/. Eventually I thought that maybe you meant ~/Library/Application Support/TextWrangler/Text Filters/, and it looks like you did. Now it works great!

  15. You are awesome. Very simple and straightforward instructions, and it worked like a charm. Such an important piece of functionality in a text editor these days. Thanks.
