Rubygems.org: Switch search to Algolia

Created on 13 Feb 2018 · 12 Comments · Source: rubygems/rubygems.org

Hi 😃

I saw that your roadmap mentions search in the Upcoming Projects section, and that Aditya Prakash's blog post does too.

I work at Algolia, a search API. I'm a Rubyist at heart, and I would love to contribute back to the community by setting up Algolia for RubyGems.org 😍.

By switching to Algolia, you would no longer have to handle a cluster of ES instances, server load would drop, search would be faster, and we could implement an instant search experience (displaying results as you type, see screenshot below). Tweaking the ranking and relevance of the search indices is also much easier and faster.

⌛ Faster than the current search
🌎 Distributed Search Network
👍 More relevant
✏️ Typo-tolerant (see below)
⛱ No ES setup to maintain and update
💰 No server cost
﹪ Strong SLA
📈 Analytics

For example, Algolia powers yarnpkg.com and packagist.org.

This would help fix issues like these:
https://github.com/rubygems/rubygems.org/pull/1315
https://github.com/rubygems/rubygems.org/issues/1610
https://github.com/rubygems/rubygems.org/issues/1662
https://github.com/rubygems/rubygems.org/issues/1670
https://github.com/rubygems/rubygems.org/issues/1651
https://github.com/rubygems/rubygems.org/issues/1256

[Screenshot: rubygems]

Let me know if that's something you would be interested in!

Edit: Demo available here

feature search

All 12 comments

I can't speak for the rubygems.org developer team, since I am just a simple Ruby hacker who uses gem and the gem infrastructure to offer (and maintain) some gems, mostly small ones.

Better search functionality in general, not only in regards to speed (or speed-up ... or ease of use), would be, I think, appreciated by many ruby users in the sense that they may want to... well, find something that is useful to them. Find more; find better results. Actually, to me personally, better is a lot more important than speed via an interface per se - I can just let the result stay in a tab and come back later. But I understand everyone who wants things to go faster too, no worries. :)

In the past there was another site that grouped gems by theme, e.g. "gems that deal with video data" or "gems that deal with rails/web" and so on.

I had a look at the advanced search subsection at rubygems.org, e.g. at:

https://rubygems.org/search/advanced

(On a side note, perhaps some tooltips that explain the fields would be helpful there... but this is just an aside.)

There is presently, as far as I know, no way to search for a "theme" or a "group" or something like that. Perhaps that could be added one day.

Anyway, since I have digressed - I think it would help if someone from the rubygems.org team, or a contributor to the code infrastructure there, could comment. The roadmap is a bit unspecific, which, I think, may be because it is only updated every once in a while.

Perhaps in regards to Algolia, there could be an experimental API + test (including use via a browser interface) made available so that people could test it extensively before switching? But I guess it again comes down to the rubygems.org developers having to comment on this, so let's poke them with something not too sharp. :D

I like the idea of less maintenance work of hosting search, but I'm not sure that reason alone would be enough to convince the rubygems.org team that it's better. For example, many of the key points you mentioned are already possible with elasticsearch:

Faster than the current search - Benchmarks showing this on rubygems.org would be necessary to be convincing, especially under load.

Distributed Search Network - I'm sure it's easy to scale with Algolia, but ES is distributed by nature too, and depending on how the infrastructure is currently set up, expanding to new regions shouldn't be very difficult.

More relevant - The search engine itself is only one aspect in the system for searching stuff. Mappings, scoring boosters, analyzers, et al. all play into the quality of search results. The question here is whether it would be more worthwhile to restructure how rubygems.org indexes data for search in Algolia, or to continue improving the structure of data in the existing Elasticsearch setup.

Typo-tolerant (see below) - Elasticsearch has fuzzy matching based on Levenshtein edit distance. As with the first point, I think benchmarks and examples would be needed to show which handles typos better.

No ES setup to maintain and update - This is a good point; ES can be a pain to deal with sometimes.

No server cost - Also a good point, but relying on a vendor to provide a closed source product for free, forever is a bit unsettling. What happens if Algolia decides to no longer offer free search for open source projects and rubygems.org is already tightly integrated with them?

Strong SLA - Algolia promises 99.9% uptime with their essential/open source plan. I'm not sure what our current uptime is with Elasticsearch, but maybe @dwradcliffe could chime in on that.

Analytics - Not sure how important this is to the rubygems.org team, but Elasticsearch does have Kibana, which is pretty trivial to set up and maintain.
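To make the typo-tolerance point above concrete, here is a minimal sketch of plain Levenshtein edit distance and a naive fuzzy filter built on it. It is only an illustration of the idea: the function names, the `GEMS` list, and the `max_distance` threshold are assumptions made for this example, and neither Elasticsearch nor Algolia is claimed to work exactly this way internally.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query: str, names: List[str], max_distance: int = 1) -> List[str]:
    """Return the names within max_distance edits of the query (case-insensitive)."""
    q = query.lower()
    return [n for n in names if levenshtein(q, n.lower()) <= max_distance]


# Hypothetical sample data, not a real index.
GEMS = ["activerecord", "activesupport", "actionpack", "rake"]
```

With this plain variant a swapped pair of letters counts as two edits, so `fuzzy_matches("activerecrod", GEMS)` finds nothing at the default threshold and needs `max_distance=2`. That kind of detail is exactly what side-by-side examples would have to compare.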

As you can see from the number of issues and open PRs here, it can take a while before things are merged or changed on the site. Especially so for big projects like this. I can't speak on behalf of the team, but I'm sure more concrete reasons and data showing Algolia would be better than ES for the long term would help sway their opinion.

Hi @clandry94

Thanks for your detailed and honest feedback 🙌

Following your questions, I've spun up a quick demo replacing the search with Algolia so that you can experience it and get a better idea of what's possible. Please keep in mind that this is just a prototype done in my free time. We can do much better in terms of relevancy, typo tolerance, facets, etc. But if the demo suits you, we will improve it with your feedback and insights.

You can check it out here: RubyGems + Algolia

Benchmarks showing this on rubygems.org would be necessary

Definitely. Do you know how many requests per second you currently handle on average? What does the worst peak scenario look like?

Here is a simple "activerecord" query comparison:

rubygems.org: [screenshot, 2018-04-05]

Algolia: [screenshot, 2018-04-05]

[...] expanding to new regions shouldn't be very difficult

Indeed, but it requires some setup, maintenance and extra costs to achieve this. You would need to spin up new clusters in different regions all over the world, and set up DNS on top of them. To achieve a smooth search-as-you-type autocomplete, you need the infra to handle it as close to the user as possible.

I've enabled it for the demo in US-East, US-West, Europe and Japan in order to achieve avg < 100ms response times (HTTPS request + response + engine computing included).
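For anyone who wants to check response-time claims like this against their own setup, a small timing harness along these lines could be a starting point. It is a sketch under assumptions: `measure_latency_ms` and `fake_search` are made-up names, and the search callable is a placeholder you would replace with a real HTTP call to whichever endpoint you are testing.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_latency_ms(search: Callable[[str], object],
                       queries: Iterable[str],
                       repeats: int = 5) -> Dict[str, float]:
    """Time search(query) for every query, `repeats` times each, in milliseconds."""
    samples = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            search(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    if not samples:
        raise ValueError("no queries to measure")
    samples.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "count": len(samples),
        "mean_ms": statistics.fmean(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_search(query: str) -> list:
    """Placeholder backend: sleeps briefly instead of calling a real search API."""
    time.sleep(0.001)
    return []


report = measure_latency_ms(fake_search, ["activerecord", "rails"], repeats=3)
```

Measuring from several regions, and including the full HTTPS round trip rather than only engine time, would be needed for the numbers to be comparable with the figure quoted above.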

The search engine itself is only one aspect in the system for searching stuff.

Absolutely. This is hard to do right, no matter the engine.
I've started to replicate the ranking as closely as possible for the demo, and I will gladly fine-tune it with you for the best relevance.

What happens if Algolia decides to no longer offer free search for open source projects and rubygems.org is already tightly integrated with them?

My first dumb answer would be that you could always fall back to ES. The code change is light (a few lines of UI) and the indexing part is similar; you just target different endpoints.
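One way to keep that fallback cheap is to hide the engine behind a small interface, so that switching is a configuration change rather than a rewrite. The sketch below illustrates the pattern only: `SearchBackend`, `InMemoryBackend`, `BACKENDS` and `build_backend` are hypothetical names, and the in-memory class stands in for real Elasticsearch or Algolia clients, which are not shown.

```python
from typing import Callable, Dict, List, Protocol


class SearchBackend(Protocol):
    """The only surface the rest of the application is allowed to depend on."""

    def index(self, gem: dict) -> None: ...

    def search(self, query: str) -> List[dict]: ...


class InMemoryBackend:
    """Stand-in for a real engine client (hypothetical, for illustration only)."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._gems: Dict[str, dict] = {}

    def index(self, gem: dict) -> None:
        self._gems[gem["name"]] = gem

    def search(self, query: str) -> List[dict]:
        q = query.lower()
        hits = [g for g in self._gems.values() if q in g["name"].lower()]
        # Most-downloaded first, as a crude popularity ranking.
        return sorted(hits, key=lambda g: g["downloads"], reverse=True)


# Registry keyed by a configuration value; a real app would construct real clients here.
BACKENDS: Dict[str, Callable[[], SearchBackend]] = {
    "elasticsearch": lambda: InMemoryBackend("elasticsearch"),
    "hosted": lambda: InMemoryBackend("hosted"),
}


def build_backend(name: str) -> SearchBackend:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown search backend: {name!r}")
```

With this shape, falling back means changing the value passed to `build_backend` and re-running indexing. The ranking and relevance tuning would still have to be redone per engine.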

That said, I definitely understand this concern. Yarn, Packagist, React, etc. also raised this issue but I think the benefits outweigh this risk.

I really want to emphasize that we do not offer it as part of a marketing strategy that could change next year. We (the dev team) genuinely want to offer the best we can to the open source projects we love internally and use constantly. It's our way to give back to the community when we can ❤️

Finally, if that's really a big concern, I can check internally whether we can sign a contract that would settle this issue.

Hey, Tim from Algolia here. I've been working with @raphi on this demo and I might be able to give a bit more context about what we're suggesting here.

Most of the code I write is either Ruby or JavaScript. In both cases I often refer to yarn/rubygems to find a package that would do what I'm trying to achieve. Both languages have strong communities and chances are high someone already built a library that does what I'm looking for.

When I search for gems, I usually search on Google with my keywords. It gives me a few options. I then check the RubyGems and GitHub pages to compare the number of downloads, the number of GitHub stars and the author, have a look at the readme to check the API, and go through a bunch of other manual steps to assess which of the various libs is the one I should be using. I find it a trial-and-error process, and it takes time.

When I search for node modules, I go directly to their website and the experience is smoother. For a start, it's faster, but I agree with @shevegen, this is not the most important thing. What is interesting is that the displayed results already contain a bunch of useful data to help me mentally filter my choices (I can see the author, last updated date, number of downloads and license, and I have links to the official website and GitHub page if available). More importantly, results are ranked based on some of those criteria.

What we're suggesting here is a joint effort to improve the search experience of RubyGems. We are Ruby users and love it. As a company, we would be happy to help in any way we can. We have experience building search UIs (and specifically for package managers) and would be very happy to use it to improve RubyGems.

Our goal would be to collaborate to build a free Algolia index anyone could use, that contains an up-to-date list of all gems. This is something we did with the Yarn team in the past, and many amazing things happened. Because the index is public and anyone can use it, people started building IDE plugins, online editors and command-line tools that use the index. I'm confident the same things would happen here and the community as a whole would benefit from it.

Now, I'm not trying to push our solution for any commercial purposes. I genuinely believe we can make the developer experience of the website better, by improving the search experience. I also understand the points you raise; changing something as important as the search can seem risky, especially when it seems to be tied to a third party provider. I'll answer your points the best I can:

  • Speed: To give you a comparison point, you can compare with the Yarn website. Results are displayed as you type, in a few milliseconds. This is the speed we can reliably deliver. Our system is built in such a way that even under heavy load, search will never be degraded (we will delay indexing if needed, but search is a separate process and will always be up).

  • Relevance and data format: We put a lot of effort into making our relevance formula as easy as possible for humans to understand. You don't need to know about mappings, scoring boosters, analyzers, et al. when using Algolia, which makes the maintenance and evolution of the search part much easier for everyone. Typo tolerance, synonyms and plurals are handled out of the box. We've seen many junior developers able to help improve the display and relevance of their search implementation, something that typically requires ES experts.

  • Third party: This is the point I relate to the most, as I would personally not feel comfortable being too tied to a third party either, TBH. As I said earlier, the main goal of this PR is to work together in offering a publicly searchable list of gems that everyone would benefit from, starting with Rubygems. If you need legal reassurance that we'll not suddenly add a paywall, we can definitely create the paperwork for that.

Hope this clarifies our stance, and that we can continue the discussion (I'd be happy to have the POV of some of the core contributors here). Once again, we'd like to offer our help in improving the current search experience of RubyGems and are ready to put company resources (in the form of developer time, demos or free plans) into making this happen :)

sorry, that was not the button i meant to press 🙇

Hi again! @clandry94 Would you have time to give me advice/feedback on my last answer about the points you raised? If not, that's ok. I just want to keep the conversation active and make sure you're not waiting on us for something we haven't mentioned. Thanks!

If I can add my 2 bits from an organisation that uses Algolia: a negative thing about using Algolia, in my experience, is that there is no local version, which makes it very difficult to develop offline. I know that we could maintain two endpoints for search, but I would rather use my time elsewhere.

Thank you for offering your search technology for RubyGems.org. After some internal discussion, we aren't comfortable depending on a third-party vendor for our search functionality, so we're going to stick with ElasticSearch.

Alright, we understand. Let us know if we can help you in the future 👍

@raphi Do you have anything in your pipeline that allows for local versions (a dumbed down version but one that is API compatible) of Algolia? Our main issue with Algolia is for our testing and local environments because they have to talk with your servers.

@kaspergrubbe I think you should ping Algolia support for that kind of question.

@kaspergrubbe nope, we are not planning to do that, as our product is a SaaS-only API and not on-premise software.
For testing purposes, we usually mock requests or simply use the API directly, since it's fast. What would be the issue with doing so?
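
As a rough illustration of the "mock requests" approach, a test can swap the remote client for a stub so that nothing leaves the machine. Everything here is an assumption made for the example: `GemSearch`, `make_fake_client`, the `client.search(query)` call and the `{"hits": [...]}` response shape are hypothetical and do not describe the actual Algolia SDK.

```python
from typing import List
from unittest import mock


class GemSearch:
    """Application code that depends on an injected search client."""

    def __init__(self, client) -> None:
        self._client = client

    def names(self, query: str) -> List[str]:
        # Assumed response shape: {"hits": [{"name": ...}, ...]}
        response = self._client.search(query)
        return [hit["name"] for hit in response["hits"]]


def make_fake_client(hits: List[dict]) -> mock.Mock:
    """Build a stub client that returns canned hits without any network access."""
    client = mock.Mock()
    client.search.return_value = {"hits": hits}
    return client


fake = make_fake_client([{"name": "rake"}, {"name": "rack"}])
service = GemSearch(fake)
result = service.names("ra")  # runs fully offline
```

The trade-off raised earlier in the thread still applies: a stub only checks that the application calls the client as expected, and says nothing about how the real engine would rank or match results.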
