Stl: <functional>: boyer_moore_searcher produces incorrect results

Created on 13 Apr 2020  Â·  8Comments  Â·  Source: microsoft/STL

Originally reported as DevCom-987357 and Microsoft-internal VSO-1101729:

C:\Temp>type meow.cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <string_view>
using namespace std;

int main() {
    const string_view text{
        "fbdhhihagdjcdibfdfdgbbhjcdifffdjdaighiaaaehigjegec"
        "jffcaecagcbiaeadhebggbijfdeihiceajbcjcjghhbjfcebge"};
    const string_view pattern{"aaa"};

    cout << "These should be identical:" << endl;

    {
        const default_searcher ds(pattern.begin(), pattern.end());

        const auto it_ds = search(text.begin(), text.end(), ds);

        if (it_ds == text.end()) {
            cout << "Not found" << endl;
        } else {
            cout << "Offset is " << it_ds - text.begin() << endl;
        }
    }

    {
        const boyer_moore_searcher bms(pattern.begin(), pattern.end());

        const auto it_bms = search(text.begin(), text.end(), bms);

        if (it_bms == text.end()) {
            cout << "Not found" << endl;
        } else {
            cout << "Offset is " << it_bms - text.begin() << endl;
        }
    }

    {
        const boyer_moore_horspool_searcher bmhs(pattern.begin(), pattern.end());

        const auto it_bmhs = search(text.begin(), text.end(), bmhs);

        if (it_bmhs == text.end()) {
            cout << "Not found" << endl;
        } else {
            cout << "Offset is " << it_bmhs - text.begin() << endl;
        }
    }
}

C:\Temp>cl /EHsc /nologo /W4 /MTd /std:c++17 meow.cpp
meow.cpp

C:\Temp>meow
These should be identical:
Offset is 38
Not found
Offset is 38
bug fixed high priority

Most helpful comment

Every time someone complains that we haven't fixed a bug they reported MONTHS ago, I will link this bug that took 40 years to fix.

All 8 comments

We implemented an algorithm published in 1977 which was incorrect! It turns out that we need the "Rytter correction" published in 1980.

If time permits I would really like to have a look here.

Am Di., 14. Apr. 2020 um 06:19 Uhr schrieb Stephan T. Lavavej <
[email protected]>:

We implemented an algorithm published in 1977 which was incorrect! It
turns out that we need the "Rytter correction" published in 1980.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/microsoft/STL/issues/713#issuecomment-613217356, or
unsubscribe
https://github.com/notifications/unsubscribe-auth/AAXATIHZ2SQKBA5MLU7YAA3RMPP4XANCNFSM4MHIIEMQ
.

I’m working on implementing and then testing the fix, but review would be greatly appreciated 😸

https://epubs.siam.org/doi/10.1137/0209037 is the "Rytter correction" paper. I actually still have access to this from my Alma Mater case.edu O_O

@StephanTLavavej the right thing to do is probably go get MSFT to request a copy of that, I think https://microsoft.sharepoint.com/sites/mslibrary/ContactUs/Pages/JournalRequestForm.aspx?WT.mc_ID=MSL_MegaMenu is the right form but you've done more of this 'company buys stuff' than I have.

"Algorithms For String Searching: A Survey" by Baeza-Yates (1989) also contains the Rytter correction, so I plan to implement that unless it's deficient in some way. (I have never actually done that thing, by the way.)

@miscco The PR is out if you want to take a look!

Every time someone complains that we haven't fixed a bug they reported MONTHS ago, I will link this bug that took 40 years to fix.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

PhilipDeegan picture PhilipDeegan  Â·  13Comments

malkia picture malkia  Â·  10Comments

StephanTLavavej picture StephanTLavavej  Â·  12Comments

cbezault picture cbezault  Â·  15Comments

StephanTLavavej picture StephanTLavavej  Â·  10Comments