From: Richard Smith on 18 Jun 2008 10:28

I use C++ standard IOStreams fairly extensively, but, while I use quite a lot of the functionality for formatted output, when it comes to input, I usually stick to unformatted input (e.g. with get / peek / ignore and friends) or std::getline. I rarely find myself using or writing custom operator>> on user-defined types. And I'm wondering whether I'm missing out on something here.

To make it concrete, let me give a fairly simple example where I recently found myself wanting to write a custom operator>>. Consider a simple class representing an HTTP request line. (I've left out all of the stuff that is not germane to this discussion: I probably wouldn't have given it public data members in reality.)

    struct request_line {
      std::string method, uri, version;
    };

    std::ostream& operator<<( std::ostream& os, request_line const& rl ) {
      os << rl.method << ' ' << rl.uri;
      if ( not rl.version.empty() ) os << ' ' << rl.version;
      return os << "\r\n";
    }

Each of the strings method, uri and version is guaranteed not to contain any white-space, and method and uri are guaranteed not to be empty. Some examples of serialised output are:

    "POST /form.cgi HTTP/1.1\r\n"   // all three components present
    "GET /index.html\r\n"           // version omitted

So writing an operator>> really ought to be a simple exercise using operator>>( std::istream&, std::string& ), as that already reads in one white-space delimited token. And the solution is nearly, but not quite, to do:

    std::istream& operator>>( std::istream& in, request_line& rl ) {
      return in >> rl.method >> rl.uri >> rl.version >> std::ws;
    }

The complication arises from three facts:

1. All three components must be on the same line. "GET\r\n/index.html\r\n" should produce an error.
2. The version component is optional: if I reach the end of the line without having scanned in rl.version, that's fine. (The same is not true of the uri.)
3. I want to be strict about white-space checking: I want there to be precisely one space character (not a tab, or any other sort of white-space) between tokens, and none at the beginning or end of the line.

If I were writing perl, this would be easy: I would write something like:

    ($method, $uri, $version) = /^([^\s]+) ([^\s]+)(?: ([^\s]+))?\r\n$/;

But in C++ these all add complexity. And ordinarily it is at this point that I stop using formatted I/O and call std::getline and the std::string::find_first* functions. (Or perhaps Boost.Spirit or Boost.Regex in hairier examples.) But I'm hoping someone can suggest how to do this elegantly with std::istream.

--
Richard Smith

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
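For comparison, the std::getline / std::string::find fallback mentioned in the post might look roughly like the sketch below. It is not code from the thread: the names is_token and split_request_line are hypothetical, and the function assumes the caller hands it one complete line including the terminating "\r\n".

```cpp
#include <cassert>
#include <string>

struct request_line { std::string method, uri, version; };

// Hypothetical helper: a token is non-empty and contains no white-space.
inline bool is_token(const std::string& s)
{
    return !s.empty() && s.find_first_of(" \t\r\n\v\f") == std::string::npos;
}

// Hypothetical helper: splits "METHOD URI[ VERSION]\r\n" on single spaces.
// Returns false (leaving rl untouched) for anything else.
inline bool split_request_line(const std::string& line, request_line& rl)
{
    const std::string::size_type npos = std::string::npos;

    // The line must end in exactly "\r\n".
    if (line.size() < 2 || line.compare(line.size() - 2, 2, "\r\n") != 0)
        return false;
    const std::string body = line.substr(0, line.size() - 2);

    const std::string::size_type sp1 = body.find(' ');
    if (sp1 == npos) return false;                 // the uri is mandatory
    const std::string::size_type sp2 = body.find(' ', sp1 + 1);

    const std::string method = body.substr(0, sp1);
    const std::string uri =
        body.substr(sp1 + 1, sp2 == npos ? npos : sp2 - sp1 - 1);
    const std::string version =
        sp2 == npos ? std::string() : body.substr(sp2 + 1);

    if (!is_token(method) || !is_token(uri)) return false;
    // If a second space was seen, a well-formed version must follow it.
    if (sp2 != npos && !is_token(version)) return false;

    rl.method = method;
    rl.uri = uri;
    rl.version = version;
    return true;
}
```

The three requirements map onto the checks fairly directly: the "\r\n" test keeps everything on one line, the sp2 test makes the version optional, and is_token rejects doubled spaces, tabs, and leading or trailing white-space.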
From: Vidar Hasfjord on 18 Jun 2008 17:22

On Jun 19, 2:28 am, Richard Smith <rich...(a)ex-parrot.com> wrote:
> [...] Consider a simple class representing an HTTP request line.
> [...] Some examples of serialised output are:
>
> "POST /form.cgi HTTP/1.1\r\n" // all three components present
> "GET /index.html\r\n" // version omitted
> [...]
> 1. All three components must be on the same line. "GET\r\n/index.html\r\n"
> should produce an error.
> 2. The version component is optional: if I reach the end of the line
> without having scanned in rl.version, that's fine. (The same is
> not true of the uri.)
> 3. I want to be strict about white-space checking: I want there to be
> precisely one space character (not a tab, or any other sort of
> white-space) between tokens, and none at the beginning or end of the line.
>
> If I were writing perl, this would be easy: I would write something
> like:
>
> ($method, $uri, $version) = /^([^\s]+) ([^\s]+)(?: ([^\s]+))?\r\n$/;
>
> But in C++ these all add complexity. And ordinarily it is at this
> point that I stop using formatted I/O and call std::getline and the
> std::string::find_first* functions. (Or perhaps Boost.Spirit or
> Boost.Regex in hairier examples.)

The tr1::regex (Boost.Regex) library is ideally suited for this problem. To parse the strings in this example you already need many features of a parsing library.

> But I'm hoping someone can suggest how to do this elegantly with std::istream.

While I recommend tr1::regex for this problem, here's an istream solution expressing the parsing logic using ordinary C++ control flow:

    istream& operator >> (istream& is, request_line& rl)
    {
      is >> resetiosflags (ios::skipws)
         >> rl.method >> accept (' ') >> rl.uri;
      if (is.peek () == ' ')
        is >> accept (' ') >> rl.version;
      return is >> accept ('\r') >> accept ('\n');
    }

The "accept" manipulator used here should allow and consume only the given character, and otherwise set the failbit of the stream. I'll leave the implementation as an exercise.

Regards,
Vidar Hasfjord
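One possible way to fill in the "accept" exercise is sketched below. It is only an illustration, not Vidar's implementation: the carrier type accept_char and the helper try_parse are hypothetical names, and clearing rl.version before the optional read is an added assumption so that a reused object does not keep a stale version.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

struct request_line { std::string method, uri, version; };

// Hypothetical carrier type for the manipulator's argument (not from the post).
struct accept_char { char expected; };

// accept(c) builds the manipulator, so `is >> accept(' ')` reads naturally.
inline accept_char accept(char c) { return accept_char{c}; }

// Consume the next character only if it equals `expected`; otherwise set
// failbit and leave the character in the stream.
inline std::istream& operator>>(std::istream& is, accept_char a)
{
    typedef std::istream::traits_type traits;
    if (traits::eq_int_type(is.peek(), traits::to_int_type(a.expected)))
        is.ignore();
    else
        is.setstate(std::ios_base::failbit);
    return is;
}

// The extractor from the post, spelled out with std:: qualification.
inline std::istream& operator>>(std::istream& is, request_line& rl)
{
    // With skipws off, reading a string fails if the next char is white-space.
    is >> std::resetiosflags(std::ios::skipws)
       >> rl.method >> accept(' ') >> rl.uri;
    rl.version.clear();                  // assumption: drop any stale value
    if (is.peek() == ' ')
        is >> accept(' ') >> rl.version;
    return is >> accept('\r') >> accept('\n');
}

// Hypothetical convenience wrapper for trying the extractor on a string.
inline bool try_parse(const std::string& text, request_line& rl)
{
    std::istringstream in(text);
    in >> rl;
    return !in.fail();
}
```

Note that the extractor leaves skipws cleared on the stream it was given, and on failure rl may be partly overwritten; a more careful version would save and restore the flags and parse into temporaries.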
From: Alberto Ganesh Barbati on 19 Jun 2008 07:06
Richard Smith wrote:
>
> "POST /form.cgi HTTP/1.1\r\n" // all three components present
> "GET /index.html\r\n" // version omitted
>
> So writing an operator>> really ought to be a simple exercise using
> operator>>( std::istream&, std::string& ), as that already reads in
> one white-space delimited token. And the solution is nearly, but not
> quite, to do:
>
> std::istream& operator>>( std::istream& in, request_line& rl ) {
>   return in >> rl.method >> rl.uri >> rl.version >> std::ws;
> }
>
> The complication arises from three facts:
>
> 1. All three components must be on the same line. "GET\r\n/index.html\r\n"
> should produce an error.
> 2. The version component is optional: if I reach the end of the line
> without having scanned in rl.version, that's fine. (The same is
> not true of the uri.)
> 3. I want to be strict about white-space checking: I want there to be
> precisely one space character (not a tab, or any other sort of
> white-space) between tokens, and none at the beginning or end of the line.
>
> If I were writing perl, this would be easy: I would write something
> like:
>
> ($method, $uri, $version) = /^([^\s]+) ([^\s]+)(?: ([^\s]+))?\r\n$/;
>
> But in C++ these all add complexity. And ordinarily it is at this
> point that I stop using formatted I/O and call std::getline and the
> std::string::find_first* functions. (Or perhaps Boost.Spirit or
> Boost.Regex in hairier examples.) But I'm hoping someone can suggest
> how to do this elegantly with std::istream.
>

I would go with the regex solution ;) It's much more powerful, and you are less likely to miss a corner case you might want to check. Anyway, since you asked, here's a solution with iostreams only:

----------------
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Consumes exactly one ' '; anything else sets failbit.
std::istream& extract_space(std::istream& is)
{
    if (is.peek() != ' ')
        is.setstate(std::ios_base::failbit);
    else
        is.ignore();
    return is;
}

// Fails if the next character is white-space (nothing is consumed).
std::istream& assert_non_ws(std::istream& is)
{
    if (std::isspace(is.peek()))
        is.setstate(std::ios_base::failbit);
    return is;
}

std::istream& operator>>(std::istream& is, request_line& rl)
{
    std::string s;
    if (std::getline(is, s))
    {
        // getline strips the '\n' but leaves the '\r': require it, then drop it.
        if (s.empty() || s[s.size() - 1] != '\r')
        {
            is.setstate(std::ios_base::failbit);
            return is;
        }
        s.erase(s.size() - 1);

        std::istringstream line(s);
        std::string method, uri, version;
        line >> assert_non_ws >> method
             >> extract_space >> assert_non_ws >> uri
             >> extract_space >> assert_non_ws >> version;
        // eof() means the whole line was consumed, with or without a version.
        if (!method.empty() && !uri.empty() && line.eof())
        {
            rl.method = method;
            rl.uri = uri;
            rl.version = version;
        }
        else
        {
            is.setstate(std::ios_base::failbit);
        }
    }
    return is;
}
----------------

As you can see, it's a lot more work than one might expect, even for such a simple parse. But maybe that's because it isn't actually that simple, is it? ;)

With a little more effort you may also get rid of the intermediate istringstream. This is left as an exercise for the reader (hint: you need an extractor that may fail in two different ways).

HTH,

Ganesh
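Since both replies recommend a regex library, here is a rough sketch of that route. It assumes a C++11 compiler, where tr1::regex is available as std::regex; the function name parse_request_line is hypothetical, and the pattern is a direct transliteration of the Perl expression from the original question (with \S standing in for [^\s]).

```cpp
#include <cassert>
#include <regex>
#include <string>

struct request_line { std::string method, uri, version; };

// Hypothetical helper: returns true and fills rl only if text is exactly
// "METHOD URI[ VERSION]\r\n" with single spaces between the tokens.
inline bool parse_request_line(const std::string& text, request_line& rl)
{
    // The trailing "\r\n" is written as literal CR LF characters in the
    // pattern; regex_match anchors at both ends, so ^ and $ are not needed.
    static const std::regex pattern("(\\S+) (\\S+)(?: (\\S+))?\r\n");

    std::smatch m;
    if (!std::regex_match(text, m, pattern))
        return false;

    rl.method = m[1].str();
    rl.uri = m[2].str();
    rl.version = m[3].str();   // empty when the optional group did not match
    return true;
}
```

To use this from an operator>>, the line still has to be read from the stream first (for example with std::getline, re-appending the '\n' it strips), so the regex replaces the token parsing rather than the stream handling.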