From: Richard Smith on
I would be interested to know the opinion of this newsgroup on whether
it is sensible to use Boost.Spirit in production code. In a project
I'm working on, I'm likely to need to produce three non-trivial C++
parsers: one for a network protocol quite similar in structure to
HTTP, one for an expression language (not unlike the language used by the
Unix tool bc, but for domain-specific data types), and one for a
particular SGML language (considerably simpler than XML, assuming
there are no unanticipated complications).

All of these components are things for which I would be comfortable
implementing hand-crafted parsers, but equally if there are better
ways of generating moderately efficient and, critically, easily
maintainable parsers, I would be keen to use them. Boost.Spirit seems
to be one of the more obvious possibilities.

However, having experimented with Boost.Spirit a bit, I have a number
of concerns about its appropriateness for use in production code and I
would be interested in others' opinions.

* Documentation. Given the size of the library, its documentation is
really fairly lightweight and I've invariably found myself reading the
code to find out how things work. Just to take two examples, where is
it documented what characters alpha_p matches, and where is it
documented which headers I should include to use it?

* Compile times. Perhaps there are implementation techniques that I'm
missing, but most of the non-trivial examples I've experimented with
take seriously long times to compile. In one case, over an hour for a
single translation unit. I prefer to work with a rapid modify-
recompile-test development cycle, but I don't see that being feasible
if I use significant Boost.Spirit components.

* Error messages. Introduce an error into the code and, frankly, the
resulting verbiage emitted by the compiler is utterly impenetrable.
This is, of course, true of many complex template libraries in C++,
and maybe when C++ (eventually) gains concepts, it will improve. But
it doesn't help with today's language.

* Poor IOStream interoperability. There are two aspects here. First,
it would be nice if, when I produced an LALR(1) parser, it would work
with InputIterators without my needing to adapt them with multi_pass.
(Admittedly, I'm not sure exactly how that could work as I cannot see
how the compiler can work out at compile time whether the grammar is
LALR(1).) Careless buffering by multi_pass could easily kill one of
the applications I have in mind. Secondly, it would be nice if there
were some easy way to keep input and output in sync, if not by having
a single function that does both (in simple cases, I've seen the %
operator overloaded to reasonable effect to implement both << and >>),
then by having similar-looking input and output functions that leave
it easy to verify by eye their compatibility. Maybe that's something
I can still build on top of Boost.Spirit, but that sounds a daunting
task.
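
To make the "single function that does both" idea concrete, here is a
minimal sketch using only the standard library. It is not Spirit code; the
names Writer, Reader, Point, io and round_trip are hypothetical and exist
only for illustration. Both archive types overload operator%, so the field
order is written down exactly once.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical output archive: operator% writes a field.
class Writer {
public:
    explicit Writer(std::ostream& os) : os_(os) {}
    template <typename T>
    Writer& operator%(const T& value) {
        os_ << value << ' ';   // fields are written space-separated
        return *this;
    }
private:
    std::ostream& os_;
};

// Hypothetical input archive: operator% reads a field.
class Reader {
public:
    explicit Reader(std::istream& is) : is_(is) {}
    template <typename T>
    Reader& operator%(T& value) {
        is_ >> value;          // fields are read back in the same order
        return *this;
    }
private:
    std::istream& is_;
};

struct Point {                 // hypothetical record type
    int x;
    int y;
    std::string label;         // must not contain whitespace in this sketch
};

// The one function that fixes the field order for both directions.
template <typename Archive>
void io(Archive& ar, Point& p) {
    ar % p.x % p.y % p.label;
}

// Writes a Point to a string and reads it back through the same io().
Point round_trip(Point p) {
    std::ostringstream out;
    Writer w(out);
    io(w, p);

    std::istringstream in(out.str());
    Reader r(in);
    Point q{};
    io(r, q);
    assert(!in.fail());        // every field was read successfully
    return q;
}
```

Because io() is the only place that lists the fields, input and output
cannot drift apart; the cost is that the format has to be simple enough to
describe as a flat sequence of fields.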

However, in other ways, I like the look of Spirit. The BNF-form of
the code is much closer to the specification I'm working to -- this
sounds like a good way of making sure the two stay in sync as the
underlying specifications evolve (which I expect them to do). To my
pleasant surprise, the object code produced by Spirit is concise and
efficient. And I'm sure that as I get more familiar with it, I'll get
better at writing correct code faster. Looking around, I also see
many quite positive comments about it.

So, what is the opinion here? Is it worth pursuing Boost.Spirit?

--
Richard Smith

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Chris Morley on
> So, what is the opinion here? Is it worth pursuing Boost.Spirit?

I would definitely recommend you use machine-written parsers and not
hand-written ones, even if the language is pretty trivial. The grammar files
are much easier to review at a later date than C++ when you look at the
system in 6, 12, or 24 months' time. You are quite right that the grammar is
essentially documentation.

I've not used Boost.Spirit; in the intro they say it is simpler than Bison
or ANTLR. I wonder what functions you might need down the line which the
likes of Bison/ANTLR have... I use Bison and it is actually very easy to use
with either a machine-written scanner (e.g. Flex) or a hand-written one.
Bison doesn't do C++ very well (its roots are C) but the skeleton is OK and
easy enough to handle. Building a parse tree or running semantic actions
directly in the grammar is equally simple. Now that I know it, I don't think
Bison is overkill for even the most trivial parsers - in fact I wish I'd used
Flex too for some trivial scanners, despite the fact that it doesn't play
brilliantly with C++.

To answer your concerns about Spirit with my Bison experience... (I'm sure
most would apply to ANTLR too)
> * Documentation
There are stacks on the internet and in paper books. Anything related to YACC
is applicable too. comp.compilers users are very helpful if you truly get
stuck with your grammar. In fact you can often find ANTLR/YACC/Bison grammars
on the internet for common languages/formats.

> * Compile times
Fast. Set up the dependency in the project; Bison turns the .y/.yy grammar
into C/C++ quickly, and you compile and link as normal. It is a matter of
seconds for Bison to process a C-complexity grammar on a modern machine.

> * Error messages
Basic errors are easy enough, but bugs in your grammar can take a bit of
learning. The debug output is there, though, to find out what you did!
Shift/reduce and reduce/reduce conflicts etc. are not complicated to learn
about but can be confusing if you're new to e.g. LALR(1) parsing (it sounds
like you aren't, though). As with most things, there are newbie pitfalls.

> * Poor IOStream interoperability
Roll your own scanner and interface it how you like. Bison eats tokens and
will reduce what it can based on the input up to that point, so it is in sync.
You can even use different scanners at runtime (wide char support? no problem!).
Parsing human input (e.g. a calculator) line by line is no problem either.
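
As a rough illustration of the hand-rolled scanner, here is a minimal
sketch that hands out one token at a time from a std::istream. The names
Token, TokenKind, next_token and scan_line are hypothetical; this is not
the interface Bison itself expects (Bison calls a function named yylex that
returns an int token code), just the pull-one-token idea in isolation.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

enum class TokenKind { Number, Identifier, Symbol, End };

struct Token {
    TokenKind kind;
    std::string text;
};

// Returns the next token, consuming only the characters that belong to it.
// peek() gives one character of lookahead, so the stream is never read
// further than the end of the current token.
Token next_token(std::istream& in) {
    // Skip leading whitespace. peek() returns EOF at end of input, for
    // which the <cctype> classifiers return false.
    while (std::isspace(in.peek())) in.get();

    const int c = in.peek();
    if (c == std::char_traits<char>::eof()) return Token{TokenKind::End, ""};

    std::string text;
    if (std::isdigit(c)) {
        while (std::isdigit(in.peek())) text += static_cast<char>(in.get());
        return Token{TokenKind::Number, text};
    }
    if (std::isalpha(c) || c == '_') {
        while (std::isalnum(in.peek()) || in.peek() == '_')
            text += static_cast<char>(in.get());
        return Token{TokenKind::Identifier, text};
    }
    text += static_cast<char>(in.get());   // any other single character
    return Token{TokenKind::Symbol, text};
}

// Scans one line of input, e.g. one line typed into a calculator.
std::vector<Token> scan_line(const std::string& line) {
    std::istringstream in(line);
    std::vector<Token> tokens;
    for (Token t = next_token(in); t.kind != TokenKind::End; t = next_token(in))
        tokens.push_back(t);
    return tokens;
}

// Small usage example: the scanner stops right after the token it returns.
void example_usage() {
    std::istringstream in("12 rest");
    Token first = next_token(in);
    assert(first.kind == TokenKind::Number && first.text == "12");
    assert(in.peek() == ' ');   // nothing past "12" has been consumed
}
```

Since the scanner never reads past the current token, the parser and the
input stream stay in step, which is the property that matters for
interactive or network input.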

So my advice boils down to: yes, definitely use a machine-generated parser.
Don't worry if it seems overkill: if your "specifications evolve" as you
suggest they might, you may save yourself a lot of effort later on.
Bison/ANTLR are well established and well used, fast and reliable. I'd
suggest you use one of those two (or similar); it sounds like you already
have too many question marks about Spirit for your application.

Chris




From: Joe on
On Nov 10, 3:26 pm, Richard Smith <rich...(a)ex-parrot.com> wrote:
> I would be interested to know the opinion of this newsgroup on whether
> it is sensible to use Boost.Spirit in production code.

I have had the same questions, but unlike you have not tested the library
yet. I am very interested in the results of this thread. BTW, was your review
of Spirit based on version 2.x of the library, which I think is about to
be released in the next version of Boost? It is supposed to be much better.

Also, have you looked at the Boost library Xpressive?

Joe



From: SeanW on
On Nov 10, 4:26 pm, Richard Smith <rich...(a)ex-parrot.com> wrote:
> So, what is the opinion here? Is it worth pursuing Boost.Spirit?

I think Spirit is very slick, but decided against
using it when I considered what would happen if
I got into trouble. It's one thing to see a 100KB
error message and try to sort it into one of a few
categories as Jeff Flinn says above, but what if
you've got to actually stick your hand in that toilet
with a debugger when you have some problem in the
field? I couldn't bear the thought, so went with
one of the old-school parser generators.

Sean



From: CornedBee on
On Nov 10, 10:26 pm, Richard Smith <rich...(a)ex-parrot.com> wrote:
>
> However, having experimented with Boost.Spirit a bit, I have a number
> of concerns about its appropriateness for use in production code and I
> would be interested in others' opinions.
>
> * Documentation. Given the size of the library, its documentation is
> really fairly lightweight and I've invariably found myself reading the
> code to find out how things work. Just to take two examples, where is
> it documented what characters alpha_p matches, and where is it
> documented which headers I should include to use it?

I agree about the headers, but other than that I found the docs to be
quite good.

>
> * Compile times. Perhaps there are implementation techniques that I'm
> missing, but most of the non-trivial examples I've experimented with
> take seriously long times to compile. In one case, over an hour for a
> single translation unit. I prefer to work with a rapid modify-
> recompile-test development cycle, but I don't see that being feasible
> if I use significant Boost.Spirit components.

If you're using GCC, upgrade to 4.4. It should have greatly increased
the speed here. But yes, Spirit is a very metaprogramming-heavy
library and takes a long time to compile. You should make sure that
you separate Spirit parsers into their own source files.
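
A sketch of what that separation might look like, with hypothetical file
and function names (request_parser.hpp, parse_request_line). The function
body below is a plain iostream stand-in so that the sketch compiles without
Boost; in a real project that one source file is where the Spirit headers
and the grammar would live.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// ---- request_parser.hpp (hypothetical header) ----
// No Spirit headers are included here, so every file that includes this
// header stays cheap to compile.
struct RequestLine {
    std::string method;
    std::string target;
    std::string version;
};

bool parse_request_line(const std::string& line, RequestLine& out);

// ---- request_parser.cpp (hypothetical source file) ----
// The only translation unit that would include the Spirit headers and
// instantiate the grammar. Stand-in implementation for this sketch:
bool parse_request_line(const std::string& line, RequestLine& out) {
    std::istringstream in(line);
    RequestLine parsed;
    if (!(in >> parsed.method >> parsed.target >> parsed.version))
        return false;                  // fewer than three fields
    std::string extra;
    if (in >> extra) return false;     // trailing junk after the version
    out = parsed;
    return true;
}

// ---- main.cpp (hypothetical caller): sees only the small header ----
void example_usage() {
    RequestLine r;
    bool ok = parse_request_line("GET /index.html HTTP/1.1", r);
    assert(ok && r.method == "GET");
    (void)ok;                          // silence unused warning with NDEBUG
}
```

With this layout the slow template instantiation is paid only when the
parser's own source file changes, not on every edit elsewhere.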

>
> * Error messages. Introduce an error into the code and, frankly, the
> resulting verbiage emitted by the compiler is utterly impenetrable.
> This is, of course, true of many complex template libraries in C++,
> and maybe when C++ (eventually) gains concepts, it will improve. But
> it doesn't help with today's language.

Yes. It's the fate of any template library in C++.

> Secondly, it would be nice if there
> were some easy way to keep input and output in sync, if not by having
> a single function that does both (in simple cases, I've seen the %
> operator overloaded to reasonable effect to implement both << and >>),
> then by having similar-looking input and output functions that leave
> it easy to verify by eye their compatibility. Maybe that's something
> I can still build on top of Boost.Spirit, but that sounds a daunting
> task.

Spirit 2 contains Karma and Qi, one for producing output, the other
for parsing. They use extremely similar syntax specifications.

Sebastian

