Hi people,
YouCompleteMe is awesome, but I'm facing some strange ranking issues. Given these strings in the document:
InternalCountingCreator
internal_counting_id
internal_counting
internal_counting_items
increment_existing_counting
If you type intecounting, the first result is internal_counting_id and the second is internal_counting_items. The internal_counting suggestion, which is the one I want, comes in fifth position, below much less obvious suggestions like increment_existing_counting, which comes in third position.
Other case:
validate_associated
validate
validates
If you type valid, the first suggestion is validate_associated instead of validate. But think about it: if I wanted validate_associated, I could type valiass for it, whereas to get validate in this case there's no other way than navigating to the second position.
Why don't you use some sort of Levenshtein distance algorithm for ranking? I believe the results would be much more accurate, eliminating the need to navigate the suggestion list to get the right suggestion.
YCM matches based on all the words; you should type the _.
InternalCounting
InternalCountingCreator
internal_counting_id
internal_counting
internal_counting_items
increment_existing_counting
In this case, if you want internal_counting, you can type i_c.
I think YCM's matcher is very good now: if the chars you type are the first char of every word, I think the first suggestion is the one you want!
I will show you some cases:
InternalCounting
InternalCountingCreator
internal_counting_id
internal_counting
internal_counting_items
increment_existing_counting
In the same list, every word is split by _.
If you want increment_existing_counting, you can type these three chars, iec, which are the beginnings of the three words.
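To make the "first char of every word" idea concrete, here is a minimal Python sketch that extracts those characters from an identifier and checks whether a query is a prefix of them. The boundary rule used here (first char, any char after `_` or `-`, any uppercase letter after a lowercase one) is an assumption for illustration, not ycmd's exact code, and `word_boundary_chars` / `matches_word_boundaries` are hypothetical names.

```python
def word_boundary_chars(identifier: str) -> str:
    """Return the chars that start a 'word' inside an identifier.

    Assumed rule (illustrative, not ycmd's exact implementation):
      * the first character;
      * any character right after '_' or '-';
      * any uppercase letter right after a lowercase letter (camelCase).
    """
    chars = []
    for i, ch in enumerate(identifier):
        if ch in "_-":
            continue  # separators themselves are never boundary chars
        prev = identifier[i - 1] if i > 0 else ""
        if i == 0 or prev in "_-" or (ch.isupper() and prev.islower()):
            chars.append(ch)
    return "".join(chars)


def matches_word_boundaries(query: str, identifier: str) -> bool:
    """True if the query is a prefix of the boundary chars (case-insensitive)."""
    return word_boundary_chars(identifier).lower().startswith(query.lower())


for name in ["increment_existing_counting", "internal_counting", "InternalCountingCreator"]:
    print(name, word_boundary_chars(name), matches_word_boundaries("iec", name))
```

With this rule, increment_existing_counting yields "iec" and InternalCountingCreator yields "ICC", which is why typing those letters singles them out.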
In this case, if you want validates:
validate_associated
validate
validates
you can type vds: the beginning, middle, and end of the word you want.
If you type vdd, validate_associated will be first.
i_c and vds aren't intuitive, because I need to think about which "fuzzy token" to type (in the second example I need to think about the middle of every word :scream:) to get my suggestion in first position, and with that the fuzzy search loses all meaning.
Levenshtein distance is great in this case because it works with your examples and doesn't require any thinking about the token you are typing; you just need to keep typing more of the word ("ahead details") to refine your search and ranking.
Any suggestions about how to match?
YCM doesn't know what you want; it can only show results based on what you type.
As you know, some result strings are very long, and YCM should not match the shorter one.
For example, in this case vde should put the longest one in the first position instead of the shorter one, because both of them match v\w*d\w*e:
validate_associate
validate
Here is the result for the above case: in Vim, /v\w*d\w*e matches both validate_associate and validate.
And I think YCM is a greedy matcher, so the longest one will be first.
Sorry, my mistake: YCM puts the shorter string in the first position.
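The Vim pattern above works as a subsequence test: each typed char, separated by any run of word characters. A minimal Python sketch of the same idea follows (the helper names are hypothetical). It also shows why greediness does not decide the order: the regex only says *whether* a candidate matches, and ranking is a separate step.

```python
import re


def subsequence_pattern(query: str) -> "re.Pattern[str]":
    # Builds the equivalent of Vim's /v\w*d\w*e for query "vde":
    # every query char, with any run of word chars (\w*) in between.
    # Case-sensitive, like the plain Vim search above.
    return re.compile(r"\w*".join(re.escape(ch) for ch in query))


def is_subsequence_match(query: str, candidate: str) -> bool:
    return subsequence_pattern(query).search(candidate) is not None


for word in ["validate_associate", "validate", "valid"]:
    print(word, is_subsequence_match("vde", word))
```

Both validate_associate and validate match "vde", so something other than the match itself (length, prefix, word boundaries) has to break the tie.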
I'll bring a proof of concept of something I'd like to see in YCM soon. But in general, Sublime Text and Atom, for example, do something like what I described.


_(In this case, Sublime put "validates" in first position, and I don't know why. Btw, validates looks better suited than validate_associated for the first position.)_
YCM putting the shortest string on top could resolve most cases, I believe. :+1:
So I think we have similar views. Is there an issue about this?
I don't think Levenshtein distance is well suited for this kind of thing. Take the following words:
this_is_an_example
triage
When typing tiae, you will get triage in first place because:
LD(tiae, this_is_an_example) = 14
LD(tiae, triage) = 2
where LD is the Levenshtein distance.
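The two numbers above can be reproduced with the standard dynamic-programming Levenshtein distance (insert, delete, substitute, each costing 1). This is a minimal Python sketch of that textbook algorithm, not the matcher YCM once used:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


words = ["this_is_an_example", "triage"]
print(sorted(words, key=lambda w: levenshtein("tiae", w)))
```

Since tiae is a subsequence of both words, the distance is just the number of inserted chars (18 - 4 = 14 and 6 - 4 = 2), so LD ranks triage first even though tiae spells out the word boundaries of this_is_an_example.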
Word boundaries are a far more pertinent criterion for a ranking system. Let's look at your examples:
InternalCountingCreator
internal_counting_id
internal_counting
internal_counting_items
increment_existing_counting
If you want internal_counting, you should type ic (or i_c to remove the capitalized suggestion).
validate_associated
validate
validates
You should type ve for validate, vs for validates, and va for validate_associated.
The goal of a completion system is to type the fewest characters to get your completion, and a ranking system based on Levenshtein distance goes against this idea.
@micbou, I understand your point; we use completion systems in different ways.
I'm always more inclined not to care about word boundaries, because with them I need to think about these limits while I'm typing. When I use something like Sublime-style completion, I don't need to care about boundaries; I just need to type any fragments in the right order.
Btw, I'm leaving my suggestion, but if it goes too much against YCM's philosophy, you can close this issue.
@wsdjeg I think I misunderstood you; I thought you'd submit a patch to make YCM put the shortest on top.
One question. In the example below, should typing 'be' bring the billable term to the first position, since it has b at the beginning and e at the end? (I'm using ruby as the buffer syntax.)

By looking at the code, YCM seems to only consider the first (and not the last) character of word boundaries for suggestions ranking. @Valloric should confirm this.
IIRC ycmd weights suggestions higher when the query is a prefix of the candidate, after checking word boundary chars. In this case 'be' is a prefix of 'between', so it beats 'billable'.
Understood. The previous example didn't show the problem properly.
With this on file:
billable_item
billables
billable
and wanting to get billable in the first position.
bae was getting billable in the first position. When the word because entered the file, bae started bringing because to the first position. So I needed to find another matching token; this time bbl does the job, but for how long?
These are tiny examples of what I'm getting many times every day. :/
So, I'm toying with the idea of always considering the last char of the identifier a word-boundary char. I think this works quite well in these situations, where you're sort of after the "shorter" word with equivalent subsequence match.
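The idea of treating the last char as a word-boundary char can be sketched by extending a boundary-char extractor with an `include_last` flag. This is a hypothetical Python illustration of the idea only, not the linked ycmd patch; the boundary rule (first char, char after `_`/`-`, uppercase after lowercase) is an assumption. It adds the last char at most once, the duplication issue mentioned for the prototype.

```python
def word_boundary_chars(identifier: str, include_last: bool = False) -> str:
    """Boundary chars of an identifier, optionally treating the last char as one.

    Assumed base rule (illustrative): first char, any char after '_' or '-',
    any uppercase letter after a lowercase one.
    """
    indices = []
    for i, ch in enumerate(identifier):
        if ch in "_-":
            continue
        prev = identifier[i - 1] if i > 0 else ""
        if i == 0 or prev in "_-" or (ch.isupper() and prev.islower()):
            indices.append(i)
    last = len(identifier) - 1
    # Add the final char only if it is not a separator and not already counted.
    if include_last and last >= 0 and identifier[last] not in "_-" and last not in indices:
        indices.append(last)  # last is the largest index, so order is kept
    return "".join(identifier[i] for i in indices)


for name in ["billable", "billables", "billable_item", "because", "between"]:
    print(name, word_boundary_chars(name), word_boundary_chars(name, include_last=True))
```

Under this rule 'be' becomes the full boundary sequence of billable, but also of because, so the two still tie; that is the kind of remaining ambiguity the thread goes on to discuss.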
Quick demo:

I have pushed a simple change to a ycmd branch which you can try out. Let me know feedback, etc.
Commit: https://github.com/puremourning/ycmd-1/commit/11e5308ba7c4800bf649f0e58a3d52f849143b04
https://github.com/puremourning/ycmd-1/tree/last-char-wb
Interested in thoughts from @Valloric, @micbou, @vheon, @oblitum on this before investing too much time.

For the other case at the top
BTW I know that the patch above is garbage (it can potentially add the last char multiple times, needs tests, breaks tests, etc.), but it's just a prototype to see if it feels right.
Word boundaries considering the first and last letters could improve things, but we'll still have the problem described in https://github.com/Valloric/YouCompleteMe/issues/1757#issuecomment-156714617 :confused:
Personally, I find word boundary chars way more intuitive than the proposed alternatives, particularly for the use cases YCM is designed for. And the change would be breaking for everyone's mental model of how YCM offers suggestions, so I would be surprised if we changed more than just tweaking the current approach.
Maybe a new branch should be tried by everyone.
I think YCM works well now; there's no need to change it.
@puremourning I don't think that trying to consider the last word char a word boundary char is a good idea with the current ranking system. What we have right now, while not IMO perfect, is an understandable and well-functioning mechanism with a simple guide for the user: the best way to rank something to the top is to write the first char of each word.
The real improvement would be rewriting the matching algorithm to be factor-multiplicative instead of tree-based as it is now. Basically something similar to the math behind linear regression. A rewrite of the matching algorithm so that this can be done and we can easily extend it with new factors has been on my mind for more than a year, but I haven't been able to get around to it.
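The comment only describes the direction, so the following is a hypothetical Python sketch of what "factor-multiplicative" scoring could look like; the factors, weights, and names are invented for illustration and are not ycmd's design. Each factor returns a positive number, the score is the product of `factor ** weight`, and taking logs turns it into a weighted sum, which is where the linear-regression analogy comes from.

```python
import math
from typing import Callable, Dict

# A factor maps (query, candidate) to a positive number; higher is better.
Factor = Callable[[str, str], float]


def prefix_factor(query: str, candidate: str) -> float:
    # Hypothetical factor: reward candidates that start with the query.
    return 1.0 if candidate.lower().startswith(query.lower()) else 0.5


def length_factor(query: str, candidate: str) -> float:
    # Hypothetical factor: reward shorter candidates. Assumes candidates were
    # already filtered to subsequence matches, so len(candidate) >= len(query).
    return len(query) / len(candidate)


def score(query: str, candidate: str,
          factors: Dict[str, Factor], weights: Dict[str, float]) -> float:
    # Product of factor ** weight; log(score) = sum(weight * log(factor)),
    # i.e. a linear model in log space whose weights could be fitted.
    return math.prod(f(query, candidate) ** weights[name]
                     for name, f in factors.items())


FACTORS: Dict[str, Factor] = {"prefix": prefix_factor, "length": length_factor}
WEIGHTS = {"prefix": 2.0, "length": 1.0}

candidates = ["validate_associated", "validate", "validates"]
ranked = sorted(candidates, key=lambda c: score("val", c, FACTORS, WEIGHTS), reverse=True)
print(ranked)
```

Extending such a scheme means adding one more entry to the factor table instead of inserting another level into a tie-breaking tree.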
WRT Levenshtein distance, it plain doesn't work when you have _many_ completion candidates. The net it casts is too wide and you get shitty suggestions. I know because it was the first matching algo used in YCM, even before it was released. It was terrible and I quickly replaced it.
@Valloric got it. The main pain, IMHO, is the ranking not bringing the simplest completion to the first position, which makes some completions super tricky (as I said in https://github.com/Valloric/YouCompleteMe/issues/1757#issuecomment-156714617).
About Levenshtein distance, in fact, it was a naive suggestion, sorry.
I also think just putting the shortest on top is a good idea, because even if it's not the correct word, I can just select it and add some new letters, and then it's done. For another case, like the previous example:
InternalCounting
InternalCountingCreator
internal_counting_id
internal_counting
internal_counting_items
increment_existing_counting
we can just put the two shortest at the front:
internal_counting
InternalCounting
Even if what we wanted was "InternalCountingCreator", we would only need to hit Tab twice, add another character 'C', and then complete again.
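A "shortest on top" ranking can be sketched as a case-insensitive subsequence filter followed by a sort on length. This is a hypothetical Python illustration of the proposal, not YCM's ranking; ties are broken alphabetically (an assumption) so the output is deterministic. Note that pure length puts InternalCounting (16 chars) just ahead of internal_counting (17 chars), so the order differs slightly from the one listed above.

```python
from typing import List


def is_subsequence(query: str, candidate: str) -> bool:
    # Case-insensitive: query chars must appear in candidate in order.
    remaining = iter(candidate.lower())
    return all(ch in remaining for ch in query.lower())


def rank_shortest_first(query: str, candidates: List[str]) -> List[str]:
    matches = [c for c in candidates if is_subsequence(query, c)]
    # Shortest first; equal lengths fall back to plain string order.
    return sorted(matches, key=lambda c: (len(c), c))


CANDIDATES = [
    "InternalCounting",
    "InternalCountingCreator",
    "internal_counting_id",
    "internal_counting",
    "internal_counting_items",
    "increment_existing_counting",
]
print(rank_shortest_first("intecounting", CANDIDATES))
```

The trade-off is the one raised earlier in the thread: short candidates float up regardless of how well the query lines up with their word boundaries.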
If you want InternalCountingCreator, you should just type icc and then <Tab>.
Changes to the ranking algorithm are unlikely as they break the mental model for our users.