Currently we limit to only matching on the 100 most recent comments.
The request (see below) is really to have individual comments be the records that are returned as search results instead of treating a comment as a part of the post. This will require some expanded version of https://github.com/Automattic/jetpack/issues/13927 to be implemented and some controls for the end user to be able to select what is being searched against by default.
This also raises questions about how other sites would feel about how much this would increase their record count and thus their prices. Which probably means we would also need to implement https://github.com/Automattic/jetpack/issues/13295 for it to be palatable.
As an interim solution we could raise the 100 recent comment threshold, but that seems unlikely to actually help here.
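To make the two indexing models concrete, here is a minimal sketch in plain Python. It is not the actual Jetpack Search indexer: the `Post` and `Comment` classes, the record fields, and both function names are hypothetical, and only the 100-comment limit comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List

MAX_COMMENTS_PER_POST = 100  # the per-post limit described above


@dataclass
class Comment:
    comment_id: int
    text: str


@dataclass
class Post:
    post_id: int
    title: str
    comments: List[Comment] = field(default_factory=list)  # oldest first


def post_level_records(posts):
    """Current model: one record per post, with only the most recent comments folded in."""
    return [
        {
            "type": "post",
            "post_id": p.post_id,
            "title": p.title,
            "comments": [c.text for c in p.comments[-MAX_COMMENTS_PER_POST:]],
        }
        for p in posts
    ]


def comment_level_records(posts):
    """Requested model: each comment is also a searchable record of its own."""
    records = [{"type": "post", "post_id": p.post_id, "title": p.title} for p in posts]
    for p in posts:
        for c in p.comments:
            records.append(
                {"type": "comment", "post_id": p.post_id, "comment_id": c.comment_id, "text": c.text}
            )
    return records
```

In the first model, a post with 150 comments still yields a single record, and its 50 oldest comments are not searchable. In the second, the same post yields 151 records, which is where the record-count concern comes from.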
Thanks Greg. The use case we pinged you about is for https://www.techdirt.com/ where, like Reddit for example, the comments are "the content" as much as the content is "the content".
They have 1.7M comments (in total) across 72k posts... it's one of the most comment-heavy sites I've seen.
Perhaps @vishalkakadiya could pull some stats (a distribution) that gives some sense of number of comments per post at the Nth percentile or something. I know it's over 100, but not sure what the upper boundaries look like.
cc @keoshi @scottsweb fun!
> Perhaps @vishalkakadiya could pull some stats (a distribution) that gives some sense of number of comments per post at the Nth percentile or something. I know it's over 100, but not sure what the upper boundaries look like.
Sharing the comment counts of the 15 most-commented posts here. These numbers are based on the DB dump we got in Nov 2019, so they have probably increased over the last four and a half months.
```
1897
1723
1463
823
772
717
624
621
605
589
555
514
502
501
473
```
@vishalkakadiya since we really aren't going to be able to solve this issue very quickly, I've increased the number of comments indexed per post to 200. Hopefully that at least helps with the UX a bit, and it should improve the ranking overall.
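For a rough sense of what a per-post limit covers on the heaviest threads, here is a back-of-the-envelope sketch that uses only the 15 counts listed above. The function name is made up for illustration, and the result says nothing about the other posts on the site.

```python
# Comment counts for the 15 most-commented posts (Nov 2019 dump, listed above).
TOP_15_COMMENT_COUNTS = [
    1897, 1723, 1463, 823, 772, 717, 624, 621,
    605, 589, 555, 514, 502, 501, 473,
]


def indexed_share(comment_counts, per_post_limit):
    """Fraction of all comments that get indexed when each post is capped at per_post_limit."""
    total = sum(comment_counts)
    indexed = sum(min(count, per_post_limit) for count in comment_counts)
    return indexed / total


for limit in (100, 200):
    share = indexed_share(TOP_15_COMMENT_COUNTS, limit)
    print(f"limit={limit}: {share:.1%} of comments on these posts indexed")
```

These 15 posts hold 12,379 comments in total, so a limit of 100 covers about 12% of them and a limit of 200 about 24%.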
"increase their record count and thus their prices. "
I'd agree we want to be really careful with this. This also opens a pretty big can of worms about showing different types of results in the list. (Are the comments co-mingled at the same level as their parent item, etc.?) I think this should be addressed as a full design project at some point, not just a GitHub issue.
Thoughts, @folletto?
Note, we received related feedback from the Techdirt team: they feel that co-mingling the comments with the posts in the search backend while only surfacing the posts on the frontend leads to search results which don't seem as relevant (from their perspective).
Their current search does comments and posts separately, so it may be partially a case of "what is known feels better."
But, in the long run it seems that giving users discrete options in Jetpack Search (comments, posts, comments & posts) is probably the best way to go.
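As a sketch of what those three options could look like, assuming records carry a `type` field and a `text` field: the `SearchScope` enum, its option names, and the naive substring matching are all hypothetical and are not part of any existing Jetpack Search API.

```python
from enum import Enum


class SearchScope(Enum):  # hypothetical option names
    POSTS = "posts"
    COMMENTS = "comments"
    POSTS_AND_COMMENTS = "posts_and_comments"


# Which record types each scope is allowed to return.
ALLOWED_TYPES = {
    SearchScope.POSTS: {"post"},
    SearchScope.COMMENTS: {"comment"},
    SearchScope.POSTS_AND_COMMENTS: {"post", "comment"},
}


def search(records, term, scope):
    """Naive case-insensitive substring match, restricted to the types the scope allows."""
    allowed = ALLOWED_TYPES[scope]
    needle = term.lower()
    return [r for r in records if r["type"] in allowed and needle in r["text"].lower()]


# Made-up sample records, for illustration only.
records = [
    {"type": "post", "id": 1, "text": "Copyright ruling explained"},
    {"type": "comment", "id": 10, "text": "Great take on the copyright ruling"},
    {"type": "comment", "id": 11, "text": "Unrelated remark"},
]

for scope in SearchScope:
    hits = search(records, "copyright", scope)
    print(scope.value, [r["id"] for r in hits])
```

Keeping the scope as an explicit query-time option means the same index can serve all three modes, while the default is left to the site owner.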
My first reaction would be to suggest a flag along the lines of "Index Individual Comments". If turned on, the number of records in the index grows, and so does the plan cost. If off, we keep indexing 100 items.
This would allow sites that have a special requirement for comments to work, with a clear marker for the price difference (we can show the number of comments and price increase to match), while keeping things simple overall.
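A minimal sketch of the record-count side of that marker, using the Techdirt figures quoted earlier in the thread (roughly 72k posts and 1.7M comments). The function name is hypothetical, and it assumes for simplicity that only posts count as records when the flag is off. No pricing is modeled, since the thread does not give any price figures.

```python
def estimated_record_count(post_count, comment_count, index_individual_comments):
    """Records in the index: posts only, or posts plus one record per comment."""
    if index_individual_comments:
        return post_count + comment_count
    return post_count


POSTS = 72_000        # approximate, from the thread
COMMENTS = 1_700_000  # approximate, from the thread

flag_off = estimated_record_count(POSTS, COMMENTS, index_individual_comments=False)
flag_on = estimated_record_count(POSTS, COMMENTS, index_individual_comments=True)

print(f"flag off: {flag_off:,} records")
print(f"flag on:  {flag_on:,} records ({flag_on / flag_off:.1f}x)")
```

Under these assumptions the flag would take the index from 72,000 to 1,772,000 records, roughly a 24.6x increase, which is the kind of number a settings screen could surface next to the toggle.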

But: as you said Yvonne, it's also probably worth thinking this out more extensively in a new project.
Ya, if we start adding a list of what the user has in their index, that could quickly explode into making things more complicated: CPTs, tags/categories, users, etc. There is also some design overlap here between what we index vs what gets filtered out of the query: https://github.com/Automattic/jetpack/pull/16072