Readthedocs.org: Search returns a 404 link

Created on 6 May 2019 · 14 Comments · Source: readthedocs/readthedocs.org

Details

Expected Result

Only valid entries show up in the search index. When I wipe a build, all extra search-related data should be cleared as well.

Actual Result

When you search for ButtonGroup in the top left of https://nanogui.readthedocs.io/en/latest/, you land on https://nanogui.readthedocs.io/en/latest/search.html?q=ButtonGroup&check_keywords=yes&area=default, which currently shows three links.

The tool that generates these docs used to create the now-invalid URL (class_nanogui__Button.html), but it now generates the third link (classnanogui_1_1Button.html). However, I can't get the old link out of the search results by wiping the build.

For reference, if you download the Sphinx searchindex.js, it is correct:

$ wget https://nanogui.readthedocs.io/en/latest/searchindex.js
$ cat searchindex.js | sed 's/,/,\n/g' | grep api/ | grep -i button | sort
"api/classnanogui_1_1Button",
"api/classnanogui_1_1Button.rst",
...
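The same check can be sketched in Python. This is a minimal sketch, assuming the docnames appear as double-quoted strings beginning with `api/` in searchindex.js (true for the output above, although the file's exact layout differs between Sphinx versions). `find_docnames` is a hypothetical helper, not part of Sphinx. Unlike the `grep` pipeline, it only keeps quoted strings that start with `api/`.

```python
import re
import urllib.request

# Matches double-quoted strings starting with "api/", e.g. "api/classnanogui_1_1Button"
API_ENTRY = re.compile(r'"(api/[^"]*)"')


def find_docnames(index_text: str, keyword: str) -> list[str]:
    """Return sorted, de-duplicated api/ entries containing keyword (case-insensitive)."""
    matches = API_ENTRY.findall(index_text)
    return sorted({m for m in matches if keyword.lower() in m.lower()})


if __name__ == "__main__":
    url = "https://nanogui.readthedocs.io/en/latest/searchindex.js"
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    for name in find_docnames(text, "button"):
        print(name)
```

If the stale name (class_nanogui__Button) does not show up here but does show up in the site search, the stale entry lives outside the Sphinx index.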

The main issue here is that when I wipe my project and trigger a new build, the docnames in Sphinx's searchindex.js get updated as expected, but whatever extra voodoo search magic RTD is doing is somehow hanging onto cached entries that are no longer valid.

This seems potentially related to, or maybe caused by, whatever affects #4452, but this is true for all users (it is not related to login credentials; I get the same results in a private browser window). I don't understand enough about how the extra search machinery works to offer any real insight. This may be a hard one to solve if it's not as easy as "oh yeah, delete _this_ too when a wipe is issued".

Bug replication


All 14 comments

RTD uses Elasticsearch instead of the default Sphinx search. It looks like, for some reason, that file isn't getting deleted from the search index.

I changed the title to show the exact problem. Wiping should not affect search results. Once a new build is triggered and successfully built, the search index should be updated accordingly.
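The expected behavior (after a successful build, the index matches the build) can be sketched as a set difference. This is a hypothetical in-memory model, not RTD's actual indexing code: `index` stands in for one version's entries in the search backend, and `update_version_index` is an illustrative name.

```python
def update_version_index(index: dict[str, str], built_pages: dict[str, str]) -> set[str]:
    """Bring one version's index in line with a finished build (hypothetical model).

    index:       page path -> content currently indexed for this version
    built_pages: page path -> content produced by the latest successful build
    Returns the set of stale paths that were removed.
    """
    stale = set(index) - set(built_pages)  # pages the new build no longer produces
    for path in stale:
        del index[path]
    index.update(built_pages)  # add new pages and refresh existing ones
    return stale
```

Applied to this issue, the old `class_nanogui__Button.html` path would be in `stale` after the first build that stopped producing it, with no wipe required.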

I just verified that those 3 results come from our own search API endpoint: https://readthedocs.org/api/v2/docsearch/?q=ButtonGroup&project=nanogui&version=latest&language=en
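To see which of the returned results are dead, each result link can be checked for a 404. This is a minimal sketch: it deliberately does not assume the JSON schema of the docsearch endpoint, so extracting the result URLs from the response is left to the caller. `dead_links` and `http_status` are hypothetical helpers, and `status_of` can be swapped out for testing.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str) -> int:
    """Return the HTTP status code of a HEAD request to url."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises error statuses such as 404 as HTTPError


def dead_links(urls: Iterable[str], status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs that answer with 404, in input order."""
    return [u for u in urls if status_of(u) == 404]
```

Run against the three ButtonGroup results, this should flag only the link to class_nanogui__Button.html.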

I know that we have had some issues with timeouts on ES recently when updating the indexes, so maybe re-triggering a build could solve this problem.

I retriggered it after the support ticket in ES, but it didn't work :/

Yeah. I'm still seeing connection timeouts to ES on Sentry.

I have a similar problem. I reorganized the top-level index of my page tree, and now when I search I get duplicate hits: one for the correct page and one for an incorrect (404) page.

To reproduce:

  1. Go here
  2. Search for cmfe
  3. You should see the results shown in the attached picture below
  4. Note that the first and third entries are duplicates
  5. The first of the duplicates goes to a valid page
  6. The second of the duplicates goes to a 404 page
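Duplicates like these can be spotted in a list of result URLs by grouping hits on the page's file name. This is a heuristic sketch, assuming that two hits with the same file name under different directories mean one current and one stale copy. `duplicate_hits` is a hypothetical helper; the example URLs are the two quoted later in this thread.

```python
from collections import defaultdict
from posixpath import basename
from urllib.parse import urlsplit


def duplicate_hits(urls: list[str]) -> dict[str, list[str]]:
    """Group result URLs by page file name; keep groups with more than one distinct path."""
    groups: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        path = urlsplit(url).path  # drops query strings such as ?highlight=cmfe
        groups[basename(path)].add(path)
    return {name: sorted(paths) for name, paths in groups.items() if len(paths) > 1}
```

Each group returned is a candidate pair where one path should 404.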

Wiping and making inactive all older versions had no effect.

Can you advise?


I'll try to investigate this next week

As a workaround, could users disable Elasticsearch somehow and just keep the Sphinx search tools?

I'd even be OK with this if I had confidence that, after some period of time, the old/invalid pages would stop appearing in searches because they went stale. I don't think that is happening, though, because we've been experiencing this behavior for more than 30 days.

So, I've rebuilt my docs in hopes that it would have the effect described in this comment in #5798:

Recreate and reindex search objects on each build, so we are safe from weird edge cases.
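One way to read "recreate and reindex on each build" is to tag every entry with the build that produced it and, once the new build's entries are indexed, drop everything older for that project/version. This is a hypothetical in-memory sketch of that idea, not the code merged in #5798; `Entry` and `reindex` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    project: str
    version: str
    path: str
    build_id: int


def reindex(entries: list[Entry], project: str, version: str,
            paths: list[str], build_id: int) -> list[Entry]:
    """Recreate one version's entries on a build (hypothetical model).

    Step 1 indexes every page of the new build, tagged with build_id.
    Step 2 deletes entries for the same project/version from any other build,
    so pages that were moved or removed cannot survive.
    """
    entries = entries + [Entry(project, version, p, build_id) for p in paths]
    return [e for e in entries
            if not (e.project == project and e.version == version
                    and e.build_id != build_id)]
```

Indexing before deleting keeps the version searchable while the new entries are written. Other versions of the project are left untouched, which is why a version with stale results needs its own new build.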

And I am still getting the same behavior as described above.

The correct (newer) URLs for valid RTD pages are...

https://visit-sphinx-github-user-manual.readthedocs.io/en/develop/gui_manual/Quantitative/DataLevelComparisonsWizard.html?highlight=cmfe

Note the /gui_manual part of the path.

The incorrect (older) URLs for 404 RTD pages are...

https://visit-sphinx-github-user-manual.readthedocs.io/en/develop/Quantitative/DataLevelComparisonsWizard.html?highlight=cmfe

which do not contain the gui_manual part of the path.

This is somehow related to a top-level reorganization of our documentation page tree, in which the entire older manual was pushed down one level into a directory named gui_manual and new major sections were added to the manual.

Do I need to rebuild all the other branches/versions that could have created ES index data (which I wiped and made inactive in hopes of fixing this problem) to get the old ES index entries removed?

@markcmiller86 the pull request was merged, but it hasn't been deployed yet. We hope to deploy the new version next week.

Oh, of course. What was I thinking? Thanks so much for the quick response!!

@markcmiller86 we just did a new release today. You can clear the search index by triggering a new build of the version with the incorrect results; let us know if this fixes your problem.

Yes, it is fixed!! Thanks so much

Let me try that again, @stsewd ... yes it is fixed.
