From: Richard Smith on
I use C++ standard IOStreams fairly extensively, but, while I use
quite a lot of the functionality for formatted output, when it comes
to input, I usually stick to unformatted input (e.g. with get / peek /
ignore and friends) or std::getline. I rarely find myself using or
writing custom operator>> on user-defined types. And I'm wondering
whether I'm missing out on something here.

To make it concrete, let me give a fairly simple example where I
recently found myself wanting to write a custom operator>>. Consider
a simple class representing an HTTP request line. (I've left out all
of the stuff that is not germane to this discussion: I probably
wouldn't have given it public data members in reality.)

#include <ostream>
#include <string>

struct request_line {
    std::string method, uri, version;
};

std::ostream& operator<<( std::ostream& os, request_line const& rl )
{
    os << rl.method << ' ' << rl.uri;
    if ( not rl.version.empty() ) os << ' ' << rl.version;
    return os << "\r\n";
}

Each of the strings method, uri and version is guaranteed not to
contain any white-space, and method and uri are guaranteed not to be
empty. Some examples of serialised output are:

"POST /form.cgi HTTP/1.1\r\n" // all three components present
"GET /index.html\r\n" // version omitted

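Restated with the headers it needs, the class and operator<< above can be checked against these two examples. This is only a sketch; serialise is a hypothetical helper added for the check:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct request_line {
    std::string method, uri, version;
};

std::ostream& operator<<(std::ostream& os, request_line const& rl)
{
    os << rl.method << ' ' << rl.uri;
    if (!rl.version.empty()) os << ' ' << rl.version;  // version is optional
    return os << "\r\n";
}

// Hypothetical helper: write one request line into a std::string.
std::string serialise(request_line const& rl)
{
    std::ostringstream os;
    os << rl;
    return os.str();
}
```

For example, serialise on {"GET", "/index.html", ""} yields "GET /index.html\r\n", with no trailing space.
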
So writing an operator>> really ought to be a simple exercise using
operator>>( std::istream&, std::string& ), as that already reads in
one white-space delimited token. And the solution is nearly, but not
quite, to do:

std::istream& operator>>( std::istream& in, request_line& rl ) {
    return in >> rl.method >> rl.uri >> rl.version >> std::ws;
}

The complication arises from three facts:

1. All three components must be on the same line.
"GET\r\n/index.html\r\n" should produce an error.
2. The version component is optional: if I reach the end of the line
without having scanned anything into rl.version, that's fine. (The same
is not true of the uri.)
3. I want to be strict about white-space checking: I want there to be
precisely one space character (not a tab, or any other sort of
white-space) between tokens, and none at the beginning or end of the line.
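A minimal sketch of why the near-solution is not enough (the struct is repeated so the snippet compiles on its own): operator>> for std::string skips any leading white-space, "\r\n" included, so a request with no version borrows its "version" from the next line, while a lone valid request with no version fails outright.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct request_line {
    std::string method, uri, version;
};

// The "nearly, but not quite" extractor from above.
std::istream& operator>>(std::istream& in, request_line& rl)
{
    return in >> rl.method >> rl.uri >> rl.version >> std::ws;
}
```

With "GET /index.html\r\nPOST /form.cgi HTTP/1.1\r\n" as input, the extraction succeeds but rl.version ends up as "POST"; with just "GET /index.html\r\n", the version extraction hits end of input and sets failbit.
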

If I were writing Perl, this would be easy: I would write something
like:

($method, $uri, $version) = /^([^\s]+) ([^\s]+)(?: ([^\s]+))?\r\n$/;

But in C++ each of these adds complexity. And ordinarily it is at this
point that I stop using formatted I/O and call std::getline and the
std::string::find_first* functions. (Or perhaps Boost.Spirit or
Boost.Regex in hairier examples.) But I'm hoping someone can suggest
how to do this elegantly with std::istream.
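For comparison, the getline-and-find approach mentioned above might look roughly like this. It is only a sketch: parse_request_line is a hypothetical name, and it reports failure through its return value rather than through the stream state.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct request_line {
    std::string method, uri, version;
};

// Hypothetical helper: read one "\r\n"-terminated request line from `in`.
bool parse_request_line(std::istream& in, request_line& rl)
{
    std::string line;
    if (!std::getline(in, line))                 // consumes and drops the '\n'
        return false;
    if (line.empty() || line[line.size() - 1] != '\r')
        return false;                            // line must end in "\r\n"
    line.erase(line.size() - 1);

    // Tokens may only be separated by ' ': no tabs or other white-space.
    if (line.find_first_of("\t\n\v\f\r") != std::string::npos)
        return false;

    std::string::size_type sp1 = line.find(' ');
    if (sp1 == 0 || sp1 == std::string::npos)    // empty method, or no uri
        return false;
    std::string::size_type sp2 = line.find(' ', sp1 + 1);

    // uri runs from just after the first space up to the second space,
    // or to the end of the line if there is no version.
    std::string::size_type uri_len =
        (sp2 == std::string::npos) ? std::string::npos : sp2 - sp1 - 1;
    std::string uri = line.substr(sp1 + 1, uri_len);
    if (uri.empty())                             // e.g. two spaces in a row
        return false;

    std::string version;
    if (sp2 != std::string::npos) {
        version = line.substr(sp2 + 1);
        if (version.empty() || version.find(' ') != std::string::npos)
            return false;                        // trailing space or 4th token
    }

    rl.method = line.substr(0, sp1);
    rl.uri = uri;
    rl.version = version;
    return true;
}
```

The output fields are only assigned once the whole line has been validated, so a failed parse leaves rl untouched.
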

--
Richard Smith

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Vidar Hasfjord on
On Jun 19, 2:28 am, Richard Smith <rich...(a)ex-parrot.com> wrote:
> [...] Consider a simple class representing an HTTP request line.
> [...] Some examples of serialised output are:
>
> "POST /form.cgi HTTP/1.1\r\n" // all three components present
> "GET /index.html\r\n" // version omitted
> [...]
> 1. All three components must be on the same line.
> "GET\r\n/index.html\r\n" should produce an error.
> 2. The version component is optional: if I reach the end of the line
> without having scanned anything into rl.version, that's fine. (The
> same is not true of the uri.)
> 3. I want to be strict about white-space checking: I want there to be
> precisely one space character (not a tab, or any other sort of
> white-space) between tokens, and none at the beginning or end of the line.
>
> If I were writing Perl, this would be easy: I would write something
> like:
>
> ($method, $uri, $version) = /^([^\s]+) ([^\s]+)(?: ([^\s]+))?\r\n$/;
>
> But in C++ each of these adds complexity. And ordinarily it is at this
> point that I stop using formatted I/O and call std::getline and the
> std::string::find_first* functions. (Or perhaps Boost.Spirit or
> Boost.Regex in hairier examples.)

The tr1::regex (Boost.Regex) library is ideally suited for this
problem. To parse the strings in this example you already need many
features of a parsing library.
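As a sketch of that suggestion, here is the Perl pattern from the original post translated to std::regex (the C++11 standardisation of tr1::regex; Boost.Regex offers a very similar interface). The helper name match_request_line is hypothetical, and it assumes the whole line, including its "\r\n", is already in a string:

```cpp
#include <cassert>
#include <regex>
#include <string>

struct request_line {
    std::string method, uri, version;
};

// Hypothetical helper. \S+ is a run of non-white-space and the optional
// non-capturing group holds " version". regex_match must match the whole
// string, so no ^ or $ anchors are needed.
bool match_request_line(std::string const& text, request_line& rl)
{
    static const std::regex pattern(R"((\S+) (\S+)(?: (\S+))?\r\n)");
    std::smatch m;
    if (!std::regex_match(text, m, pattern))
        return false;
    rl.method = m[1].str();
    rl.uri = m[2].str();
    rl.version = m[3].str();   // empty if the optional group did not match
    return true;
}
```

All three of the original constraints live in the pattern itself: the single literal spaces, the \S+ tokens and the trailing \r\n.
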

> But I'm hoping someone can suggest how to do this elegantly with std::istream.

While I recommend tr1::regex for this problem, here's an istream
solution expressing the parsing logic using ordinary C++ control flow:

istream& operator >> (istream& is, request_line& rl) {
  rl.version.clear (); // the version is optional; don't keep a stale one
  is >> resetiosflags (ios::skipws)
     >> rl.method >> accept (' ') >> rl.uri;
  if (is.peek () == ' ')
    is >> accept (' ') >> rl.version;
  return is >> accept ('\r') >> accept ('\n');
}

The "accept" manipulator used here should only allow and consume the
given character and otherwise set the failbit of the stream. I'll
leave the implementation as an exercise.
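One possible way to fill in that exercise is sketched below. The accept struct and its operator>> are an assumption about what was intended, not code from the post; the request_line extractor is the one above, fully qualified, with rl.version cleared up front so that a stale value does not survive when the version is omitted.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

struct request_line {
    std::string method, uri, version;
};

// Hypothetical "accept" manipulator: consume the next character if it
// equals `expected`, otherwise set failbit and consume nothing.
struct accept {
    explicit accept(char c) : expected(c) {}
    char expected;
};

std::istream& operator>>(std::istream& is, accept const& a)
{
    // peek() yields traits_type::eof() at end of input or on a failed
    // stream, which never compares equal to a real character.
    if (is.peek() == std::istream::traits_type::to_int_type(a.expected))
        is.get();
    else
        is.setstate(std::ios_base::failbit);
    return is;
}

std::istream& operator>>(std::istream& is, request_line& rl)
{
    rl.version.clear();
    // With skipws off, a string extraction that starts on white-space
    // extracts nothing and sets failbit, which rejects doubled spaces.
    is >> std::resetiosflags(std::ios_base::skipws)
       >> rl.method >> accept(' ') >> rl.uri;
    if (is.peek() == ' ')
        is >> accept(' ') >> rl.version;
    return is >> accept('\r') >> accept('\n');
}
```

Two design points worth noting: the extractor leaves skipws cleared on the caller's stream, and accept is also the name of a POSIX socket function, so in real code the manipulator may be better off in a namespace.
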

Regards,
Vidar Hasfjord

From: Alberto Ganesh Barbati on
Richard Smith wrote:
>
> "POST /form.cgi HTTP/1.1\r\n" // all three components present
> "GET /index.html\r\n" // version omitted
>
> So writing an operator>> really ought to be a simple exercise using
> operator>>( std::istream&, std::string& ), as that already reads in
> one white-space delimited token. And the solution is nearly, but not
> quite, to do:
>
> std::istream& operator>>( std::istream& in, request_line& rl ) {
> return in >> rl.method >> rl.uri >> rl.version >> std::ws;
> }
>
> The complication arises from three facts:
>
> 1. All three components must be on the same line.
> "GET\r\n/index.html\r\n" should produce an error.
> 2. The version component is optional: if I reach the end of the line
> without having scanned anything into rl.version, that's fine. (The
> same is not true of the uri.)
> 3. I want to be strict about white-space checking: I want there to be
> precisely one space character (not a tab, or any other sort of
> white-space) between tokens, and none at the beginning or end of the line.
>
> If I were writing Perl, this would be easy: I would write something
> like:
>
> ($method, $uri, $version) = /^([^\s]+) ([^\s]+)(?: ([^\s]+))?\r\n$/;
>
> But in C++ each of these adds complexity. And ordinarily it is at this
> point that I stop using formatted I/O and call std::getline and the
> std::string::find_first* functions. (Or perhaps Boost.Spirit or
> Boost.Regex in hairier examples.) But I'm hoping someone can suggest
> how to do this elegantly with std::istream.
>

I would go with the regex solution ;) It's much more powerful, and you're
less likely to miss a corner case you might want to check.

Anyway, since you asked, here's a solution with iostreams only:

----------------
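#include <cctype>
#include <istream>
#include <sstream>
#include <string>

std::istream& extract_space(std::istream& is)
{
    if (is.peek() != ' ')
        is.setstate(std::ios_base::failbit);
    else
        is.ignore();

    return is;
}

std::istream& assert_non_ws(std::istream& is)
{
    if (std::isspace(is.peek()))
        is.setstate(std::ios_base::failbit);
    return is;
}

std::istream& operator>>(std::istream& is, request_line& rl)
{
    std::string s;
    if (std::getline(is, s))
    {
        // getline() drops the '\n'; the request line must also end in '\r'
        if (s.empty() || s[s.size() - 1] != '\r')
        {
            is.setstate(std::ios_base::failbit);
            return is;
        }
        s.erase(s.size() - 1);

        std::istringstream line(s);
        std::string method, uri, version;
        line >> assert_non_ws >> method
             >> extract_space >> assert_non_ws >> uri
             >> extract_space >> assert_non_ws >> version;

        // the last test rejects a trailing space, which the
        // extractors above would otherwise let through
        if (!method.empty() && !uri.empty() && line.eof()
            && s[s.size() - 1] != ' ')
        {
            rl.method = method;
            rl.uri = uri;
            rl.version = version;
        }
        else
        {
            is.setstate(std::ios_base::failbit);
        }
    }
    return is;
}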
----------------

As you can see, it's a lot more work than one might expect, even for
such a simple parse. But maybe that's because it isn't actually that
simple, is it? ;)

With a little more effort you may also get rid of the intermediate
istringstream. This is left as an exercise for the reader (hint: you
need an extractor that may fail in two different ways).

HTH,

Ganesh