ZestyParser
Parsing with a touch of Python.
Download
The latest version of ZestyParser is 0.7.0.
Install it by running the command easy_install -U ZestyParser (requires setuptools).
Or download a .egg for your Python version from the Python Cheese Shop and rename the file to ZestyParser.pyc; you can then import it directly, courtesy of Squisher.
The source distribution is also available there.
Here are the release notes.
ZestyParser is Free Software released under the terms of the GNU General Public License.
Warning: The current version's documentation is terrible. Please stand by for 0.8.0 or possibly something entirely different.
Executive summary
- ZestyParser is a Python package for writing data parsers. Great for "little languages"... and big languages.
- When you're writing a parser, ZestyParser lets you write a lot less code.
- With ZestyParser, the code you do have to write can be a lot more readable.
Background and philosophy
Parsing is a programming problem that has long fascinated me. There are so many packages, so many theoretical approaches, even an abundance of Python implementations, but somehow I haven't been able to find one with a fully Pythonic feel and a complete range of functionality. That's the problem I'm trying to solve with ZestyParser. Here are some of the ways in which it differs from the norms of the Python parsing world.
- ZestyParser doesn't make you define your grammar in a weird, non-Pythonic syntax. Everything is written how you normally write Python. It doesn't reinvent the wheel where Python already provides a familiar approach. You already know how to use Python's built-in re module, so, despite its limitations (which is why you're looking for a parsing package!), you can use it to construct ZestyParser tokens when appropriate; no need to reimplement everything in an overly verbose (and probably slower) manner, as do some otherwise-decent systems which shall remain nameless. Such as pyparsing.
- ZestyParser doesn't even limit you to context-free grammars. It can scale up and it can scale down. You don't need to use it lex/yacc-style, where lexical tokens and grammar are defined separately; a token can technically be any callable Python object with certain semantics, and, once invoked, it can do any analysis it needs to with the parser cursor and the data stream. ZestyParser provides built-in token classes to handle most parsing tasks, including classes for combining other tokens in various ways, but when you need something more complex, you can always write your own token, and it will still integrate perfectly with your existing ones. (However, this is rarely necessary because of the built-in token classes' support for callbacks.) If you wish, you can write your parsers with a lex/yacc-like design pattern, but ZestyParser is usually powerful enough to make such grammars more concise and Pythonic.
- ZestyParser doesn't make assumptions about the kind of syntax you're parsing. It doesn't default to, say, skipping whitespace, but when you do need to, telling it to do so is trivial. And even then, it doesn't have to skip whitespace everywhere; it gives you as much control as you need.
- ZestyParser is pure Python. No compilation is necessary. This is something of a performance disadvantage in comparison to compiled modules like DParser (which, of course, also limits you to context-free grammars defined in BNF-like docstrings), but ZestyParser's goal is not raw speed; it is to be as dynamic as Python is. (Actually, in the future I may release an optional, seamlessly compatible accelerator module written as a C extension.)
- ZestyParser does not require code generation. You don't need to write your parsing code in a separate file and then have it "compiled" into a Python module; ZestyParser tokens and parsers are first-class Python citizens.
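Several ideas from the list above (tokens as plain callables, tokens built from the re module, callbacks, opt-in whitespace skipping, and tokens that go beyond context-free grammars) can be illustrated with a small standard-library sketch. This is not ZestyParser's API: every name below (NoMatch, regex_token, sequence, skip_whitespace, length_prefixed) is hypothetical and exists only to show the general technique, under the assumption that a token takes the text and a cursor position and returns a value plus the new position.

```python
import re


class NoMatch(Exception):
    """Raised by a token that does not match at the given position."""


def regex_token(pattern, callback=None):
    """Build a token from a regular expression (hypothetical helper)."""
    compiled = re.compile(pattern)

    def token(text, pos):
        m = compiled.match(text, pos)
        if m is None:
            raise NoMatch(pos)
        # An optional callback turns the raw match into a useful value.
        value = callback(m) if callback else m.group()
        return value, m.end()

    return token


def sequence(*tokens):
    """Combine tokens: each one must match right after the previous one."""
    def token(text, pos):
        values = []
        for t in tokens:
            value, pos = t(text, pos)
            values.append(value)
        return values, pos

    return token


def skip_whitespace(inner):
    """Opt-in whitespace skipping, applied only to the token it wraps."""
    whitespace = re.compile(r"\s*")

    def token(text, pos):
        pos = whitespace.match(text, pos).end()
        return inner(text, pos)

    return token


LENGTH_PREFIX = re.compile(r"(\d+):")


def length_prefixed(text, pos):
    """Hand-written token for '<n>:<n characters>'.

    How much input it consumes depends on a number read earlier,
    which a context-free grammar cannot express directly.
    """
    m = LENGTH_PREFIX.match(text, pos)
    if m is None:
        raise NoMatch(pos)
    start = m.end()
    end = start + int(m.group(1))
    if end > len(text):
        raise NoMatch(pos)
    return text[start:end], end


# Tokens compose freely because they all share one calling convention.
integer = regex_token(r"-?\d+", lambda m: int(m.group()))
comma = regex_token(",")
pair = sequence(integer, skip_whitespace(comma), skip_whitespace(integer))
```

With these definitions, pair("12 , 34", 0) returns ([12, ',', 34], 7) and length_prefixed("5:hello world", 0) returns ('hello', 7). The hand-written token sits alongside the generated ones without any special treatment, which is the point of treating tokens as ordinary callables.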
Okay, enough about what ZestyParser isn't. What, then, is it? Well, you know it's a parser kit; it doesn't use a formal parsing algorithm like LL or LR, and as I said, it isn't even restricted to context-free grammars. If anything, it is a tool for concisely and Pythonically constructing recursive-descent-style parsers. I realize that some formal parsing algorithms are better suited to some grammars for their speed benefit, but ZestyParser is optimized for humans first, computers second. Try it out and see if it meets your parsing needs. If it doesn't, email me and let me know why; if it does, and you're using it in a released program, I'd be happy if you emailed me and let me know, so I can mention it on this page!
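For readers unfamiliar with the term, here is a minimal sketch of a hand-written recursive-descent parser for a four-function calculator, using only the standard library. It does not use ZestyParser and makes no claim about how ZestyParser is implemented; the function names (parse_expr, parse_term, parse_factor, evaluate) are illustrative assumptions. Each grammar rule becomes one function, and the functions call each other recursively while passing the cursor position along.

```python
import re

# Both patterns skip leading whitespace before the interesting part.
NUMBER = re.compile(r"\s*(\d+(?:\.\d+)?)")
SYMBOL = re.compile(r"\s*([-+*/()])")


def parse_expr(text, pos=0):
    """expr := term (('+' | '-') term)*"""
    value, pos = parse_term(text, pos)
    while True:
        m = SYMBOL.match(text, pos)
        if m is None or m.group(1) not in "+-":
            return value, pos
        rhs, pos = parse_term(text, m.end())
        value = value + rhs if m.group(1) == "+" else value - rhs


def parse_term(text, pos):
    """term := factor (('*' | '/') factor)*"""
    value, pos = parse_factor(text, pos)
    while True:
        m = SYMBOL.match(text, pos)
        if m is None or m.group(1) not in "*/":
            return value, pos
        rhs, pos = parse_factor(text, m.end())
        value = value * rhs if m.group(1) == "*" else value / rhs


def parse_factor(text, pos):
    """factor := NUMBER | '(' expr ')'"""
    m = NUMBER.match(text, pos)
    if m:
        return float(m.group(1)), m.end()
    m = SYMBOL.match(text, pos)
    if m and m.group(1) == "(":
        value, pos = parse_expr(text, m.end())
        m = SYMBOL.match(text, pos)
        if m is None or m.group(1) != ")":
            raise SyntaxError("expected ')' at position %d" % pos)
        return value, m.end()
    raise SyntaxError("expected a number or '(' at position %d" % pos)


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing input."""
    value, pos = parse_expr(text)
    if text[pos:].strip():
        raise SyntaxError("unexpected input at position %d" % pos)
    return value
```

For example, evaluate("2 + 3 * (4 - 1)") returns 11.0. The sketch deliberately omits unary minus and friendly error recovery; even so, most of its length is cursor bookkeeping, which is the kind of boilerplate a parser kit aims to remove.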
Resources
Examples
The source package includes several examples to show you what ZestyParser can do (though only the tip of the iceberg). I'm always adding more, because I love the warm fuzzy feeling you get from writing a concise and effective parser. (Um, everyone gets that feeling, right?)
- bdecode: Parses BitTorrent's bencode format.
- calcy: The token (pun intended) 4-function calculator example.
- elements: Parses chemical formulae.
- n3: A more complex and practical example. Parses the RDF Notation3 format. This is the shortest relatively complete Python-based N3 parser that I know of.
  - n3.py: Parses the format and returns an abstract parse tree.
  - n3rdflib.py: Takes the structure returned by n3.py and converts it into an RDFLib graph object.
- phpserialize: Parses PHP's serialize format (aside from objects, of course).
- plist: Parses Apple's old-style ASCII plist format. (Why did they ever replace it with that crappy XML format, instead of just doing a find-and-replace from "ASCII" to "UTF-8" in the specification?)
- sexp: Parses (most of) Rivest's version of S-expressions.
- unittests: Tests for all the included token types.
Who's Using It?
- UliPad, a flexible text editor, uses it for syntax highlighting
- A silent majority of polar lions use it to process radio transmissions they receive in their fillings
Email me if you have a program you'd like added to this list! It would be nice to have more than one real entry on it.