Beast: Reuse of `parser` with `http::async_read()` gives incorrect results

Created on 18 Apr 2018 · 7 Comments · Source: boostorg/beast

Reusing a parser with http::async_read() results in parser.body() accumulating the bodies of all the responses. Calling parser.clear() before again calling http::async_read() does not fix this. The only solution appears to be to create a new parser with every http::async_read() performed.

Unlike the http::read documentation on msg, the http::async_read documentation makes no mention that the parser should not have any previous content.

The http_crawl.cpp example also happily reuses the http::response<http::string_body> res_ member of the worker class for every request. For http_crawl.cpp, this might not be problematic because it ignores the response body, but it creates the wrong expectations in the absence of documentation warning against such use.

Version of Beast

144

Steps necessary to reproduce the problem

Modify http_crawl.cpp to dump res_.body() in worker::on_read(). E.g.:
...
auto const code = res_.result_int();
std::cout << "============================" << std::endl;
std::cout << res_.body() << std::endl;
std::cout << "----------------------------" << std::endl;
report_.aggregate( ...

Modify urls_large_data.cpp to fetch only google.com two times:

...
urls_large_data()
{
    static std::vector <char const*> const urls ({
        "google.com",
        "google.com"
    });

    return urls;
}
...

Recompile with bjam

Run with 1 thread: http-crawl 1

============================
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

----------------------------
Progress: 0 of 2
============================
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

----------------------------
Elapsed time:    5 seconds
Crawl report
   Failure counts
       Timer   : 0
       Resolve : 0
       Connect : 0
       Write   : 0
       Read    : 0
       Success : 2
   Status codes
       301: 2 (Moved Permanently)

All relevant compiler information

gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0

Bug

Source

smipi1

All 7 comments

Hi @vinniefalco,

Is this a documentation/example or an implementation bug?

Regards,
@smipi1

smipi1 on 18 Apr 2018

> Is this a documentation/example or an implementation bug?

Documentation. The parser was never meant to be reusable. Note that the message class has no "clear" member function.

vinniefalco on 18 Apr 2018

Is it worthwhile reworking the examples as well to make this clear? At a minimum, add a comment explaining why this is generally ill-advised but okay for the crawler use case.

smipi1 on 18 Apr 2018

Actually that might explain why the crawler malfunctions towards the end... good find :)

vinniefalco on 19 Apr 2018

This issue has been open for a while with no activity, has it been resolved?

stale[bot] on 19 May 2018

Hi @vinniefalco,
Should I take a stab at improving the documentation, or would you like this to be accompanied by a fix for the crawler too?

smipi1 on 19 May 2018

The place for the documentation is in basic_parser, since the parser can be used even without calling async_read and still exhibit the problem. If you want to try your hand at a fix for either of these problems (or both) I certainly won't mind!

vinniefalco on 20 May 2018
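
For readers who want the pattern from this thread in one place, here is a minimal sketch using only the standard library. It is not Beast code: `ToyParser`, `read_all_reused` and `read_all_fresh` are hypothetical names invented for illustration, and `ToyParser` only models the reported behaviour of a string-backed body that is appended to and never cleared. The point is the object lifetime. In real code, the equivalent would be constructing a new parser (or message) for each read, as described in the original report.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parser with a std::string body.
// It models the behaviour reported in this issue: put() appends
// to the body and nothing ever clears it.
struct ToyParser
{
    std::string body;

    void put(std::string const& chunk)
    {
        body += chunk;
    }
};

// Anti-pattern: one parser object shared by every message.
// Each recorded body also contains all earlier bodies.
std::vector<std::string>
read_all_reused(std::vector<std::string> const& messages)
{
    ToyParser parser; // lives across all iterations
    std::vector<std::string> bodies;
    for (auto const& m : messages)
    {
        parser.put(m);
        bodies.push_back(parser.body);
    }
    return bodies;
}

// Workaround from the report: construct a new parser per message,
// so no state carries over from the previous one.
std::vector<std::string>
read_all_fresh(std::vector<std::string> const& messages)
{
    std::vector<std::string> bodies;
    for (auto const& m : messages)
    {
        ToyParser parser; // fresh object for this message only
        parser.put(m);
        bodies.push_back(parser.body);
    }
    return bodies;
}
```

Feeding the same two-message input to both functions mirrors the output shown in the report: with the reused parser the second body is the two responses concatenated, while the fresh-parser version returns each body on its own.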
