Beast: Reuse of `parser` with `http::async_read()` gives incorrect results

Created on 18 Apr 2018 · 7 Comments · Source: boostorg/beast

Reusing a parser with http::async_read() results in parser.body() accumulating the bodies of all the responses. Calling parser.clear() before again calling http::async_read() does not fix this. The only solution appears to be to create a new parser with every http::async_read() performed.

Unlike the http::read documentation for the msg parameter, the http::async_read documentation does not mention that the parser must not contain any previous content.

The http_crawl.cpp example also happily reuses the http::response<http::string_body> res_ member of the worker class for every request. For http_crawl.cpp, this might not be problematic because it ignores the response body, but it creates the wrong expectations in the absence of documentation warning against such use.

Version of Beast

144

Steps necessary to reproduce the problem

  1. Modify http_crawl.cpp to dump res_.body() in worker::on_read(), e.g.:

    ...
    auto const code = res_.result_int();
    std::cout << "============================" << std::endl;
    std::cout << res_.body() << std::endl;
    std::cout << "----------------------------" << std::endl;
    report_.aggregate(
    ...
  2. Modify urls_large_data.cpp to fetch only google.com two times:

    ...
    urls_large_data()
    {
        static std::vector <char const*> const urls ({
            "google.com",
            "google.com"
        });
    
        return urls;
    }
    ...
    
  3. Recompile with bjam
  4. Run with 1 thread: http-crawl 1

    ============================
    <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
    <TITLE>301 Moved</TITLE></HEAD><BODY>
    <H1>301 Moved</H1>
    The document has moved
    <A HREF="http://www.google.com/">here</A>.
    </BODY></HTML>
    
    ----------------------------
    Progress: 0 of 2
    ============================
    <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
    <TITLE>301 Moved</TITLE></HEAD><BODY>
    <H1>301 Moved</H1>
    The document has moved
    <A HREF="http://www.google.com/">here</A>.
    </BODY></HTML>
    <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
    <TITLE>301 Moved</TITLE></HEAD><BODY>
    <H1>301 Moved</H1>
    The document has moved
    <A HREF="http://www.google.com/">here</A>.
    </BODY></HTML>
    
    ----------------------------
    Elapsed time:    5 seconds
    Crawl report
       Failure counts
           Timer   : 0
           Resolve : 0
           Connect : 0
           Write   : 0
           Read    : 0
           Success : 2
       Status codes
           301: 2 (Moved Permanently)
    

All relevant compiler information

gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0

Bug

All 7 comments

Hi @vinniefalco,

Is this a documentation/example or an implementation bug?

Regards,
@smipi1

Is this a documentation/example or an implementation bug?

Documentation. The parser was never meant to be reusable. Note that the message class has no "clear" member function.

Is it worthwhile reworking the examples as well to make this clear? At a minimum, a comment explaining why this is generally ill-advised, but okay for the crawler use case.

Actually that might explain why the crawler malfunctions towards the end... good find :)

This issue has been open for a while with no activity, has it been resolved?

Hi @vinniefalco,
Should I take a stab at improving the documentation, or would you like this to be accommodated with a fix for the crawler too?

The place for the documentation is in basic_parser, since it can be used without calling async_read and still exhibit the problem. If you want to try your hand at a fix for either of these problems (or both) I certainly won't mind!
