Lsp-mode: [Discussion] project root detection logic

Created on 21 Feb 2018 · 22Comments · Source: emacs-lsp/lsp-mode

If a project has directories that are themselves projects (e.g. submodules) lsp-make-traverser may not work well to mark the project root (:rootUri).

Further, currently lsp-python uses __init__.py to mark the root of a project, which is not reliable. __init__.py may exist in subpackages while the user wants to treat the project as a whole.

(lsp-define-stdio-client lsp-python "python"
             (lsp-make-traverser #'(lambda (dir)
                         (directory-files
                          dir
                          nil
                          "\\(__init__\\|setup\\)\\.py")))
             '("pyls"))

projectile projectile-project-root-files-bottom-up or Emacs 25 project.el provides good project root detection, which can be used by lsp-mode clients. However, for projects with subprojects, the files within subprojects should be associated with the root project.

This creates different views of "project" from the perspective of projectile/project.el and from that of lsp-mode clients. We can differ from projectile/project.el by providing some sophisticated and extensible root matchers. emacs-cquery currently uses the following snippet to provide the :get-root function, which I'd like you to discuss whether it is general enough and can be adopted by other lsp-mode clients.

(defcustom cquery-projects '("~/Dev/llvm-project" "~/Dev/llvm") ".")

(cl-defun cquery-project-roots-matcher ()
  (cl-loop for root in cquery-project-roots do
           (when (string-prefix-p (expand-file-name root) buffer-file-name)
             (cl-return-from cquery--get-root root))))

(defcustom cquery-project-root-matchers
  '(cquery-project-roots-matcher projectile-project-root "compile_commands.json" ".cquery")
  "List of matchers that are used to locate the cquery project roots.
Each matcher is run in order, and the first successful (non-nil) matcher
determines the project root.

A `string' entry defines a dominating file that exists in either the
current working directory or a parent directory. cquery will traverse
upwards through the project directory structure and return the first
matching file.

A `function' entry define a callable function that either returns workspace's
root location or `nil' if another matcher should be used instead.
"
  :type '(repeat (choice (file) (function)))
  :group 'cquery)

(cl-defun cquery--get-root ()
  "Return the root directory of a cquery project."
  (cl-loop for matcher in cquery-project-root-matchers do
           (-when-let (root (cl-typecase matcher
                              (string (cquery--root-from-file matcher))
                              (function  (cquery--root-from-func matcher))))
             (cl-return-from cquery--get-root root)))
  (user-error "Could not find cquery project root"))

If this looks good, we may have macros like lsp-define-whitelist-add to definie client-specific root matchers for users to customize. And i'd like to change the cuery-* variables to use that lsp-cquery-* facility.

discussion

Source

MaskRay

Most helpful comment

@MaskRay

There are scenarios where workspaceFolders are useful and users should make decisions interactively. But the conceptual burden and complexity might make it only exposed to power users.

I tend to disagree with this statement with the following point: Adding folders interactively is the default behaviour in the mainstream editors (VSCode/Eclipse Che and so on) which do target average user. Note also, that interactive root selection is a must for multiroot enabled servers, not something that only power user will want. On the other hand if automatic root discovery fails it would be very frustrating for anyone that does not know elisp.

On the other side here are the extension activation events of VSCode(I have marked in bold the one that make sense for Emacs.)

onLanguage:${language}
onCommand:${command}
onDebug
workspaceContains:${toplevelfilename}
onFileSystem:${scheme}
onView:${viewId}
onUri

Here it is the flow that I am suggesting, note that it does not increase the complexity of using the extension:

On startup lsp-mode loads the set of workspace which could be plain list workspace name -> worspace folders. It could be stored in common location (e. g. ~/.emacs.d/.lsp-workspaces) or in dir-locals.el, etc. The workspace persistence could be implemented in various ways, this is only one of the options.
- when user opens file/directory which matches lsp extension activation point (see above) lsp-mode will do:
- when folder is part of existing workspace lsp-mode will start the corresponding server and make the opened workspace the active one:
  - when server is multi-root enabled lsp-mode will initialize the server with the folders that are part of the current workspace (and do match workspaceContains:${toplevelfilename} ?) .
  - when the server is not multi root enabled the server will be initialized with the current workspace folder.
- when file is not part of existing workspace the behaviour could be specified in the concrete extension but the default behaviour would be to do not start anything similar to what VScode do. This will solve #225.
User would be able to add/remove folders which again will fire up new server or add the folder to the existing server in case the server support multiple folders. Note that the current root detection logic could be used to find what is the root and suggest it as default value of read-directory-name so there is no overhead/increasing the complexity(similar what I do in multi-folder branch.)

This approach has several advantages:

It handles the corner cases - #225 #340 and removes the need from lsp-project-whitelist/lsp-project-blacklist and makes the project selection explicit and clear.
It will allow integration with emacs project tree explorers like treemacs which has the same notion of "workspace" as a set of folders which is the standard for most of the editor(Eclipse, Idea, VScode, Eclipse Che, Theia)
It is closer to the reference LSP client implementation - VSCode. The LSP is marketed as client agnostic but the truth is that the most of the server implementations are driven by the needs/behaviour of VSCode.

What do you think?

yyoncho on 22 Sep 2018

👍4

All 22 comments

@MaskRay , Is it possible to rely on projectile root?

myrgy on 25 Feb 2018

projectile can be an option. There should be some setting for lsp-python emacs-cquery to choose the project root detection method.

For projects with sub-projects, the user can define some known project roots to override projectile settings.

MaskRay on 25 Feb 2018

@myrgy I decide to use this for emacs-cquery

(cl-defun cquery--get-root ()
  "Return the root directory of a cquery project."
  (cl-loop for root in cquery-project-roots do
           (when (string-prefix-p (expand-file-name root) buffer-file-name)
             (cl-return-from cquery--get-root root)))
  (or
   (and (require 'projectile nil t) (projectile-project-root))
   (expand-file-name (or (locate-dominating-file default-directory "compile_commands.json")
                         (locate-dominating-file default-directory ".cquery")
                         (user-error "Could not find cquery project root")))))

If this idiom seems generic enough, I will change it to lsp-cquery-project-roots (the custom variable will be generated by some macro in lsp-mode, like lsp-cquery-blacklist-add)

MaskRay on 25 Feb 2018

Would it be possible to use emacs .dir-locals.el to set project root?
What do you think about making cquery-project-roots function or string list
In that case it would be possible to extend.

 (if cquery-project-roots
              (if (functionp cquery-project-roots)
                    // get list of project roots...
              ....

myrgy on 25 Feb 2018

Thanks for the advice.

(cl-defun cquery--get-root ()
  "Return the root directory of a cquery project."
  (when cquery-project-root-function
    (-when-let (root (funcall cquery-project-root-function))
      (cl-return-from cquery--get-root root)))
  (cl-loop for root in cquery-project-roots do
           (when (string-prefix-p (expand-file-name root) buffer-file-name)
             (cl-return-from cquery--get-root root)))
  (or
   (and (require 'projectile nil t) (ignore-errors (projectile-project-root)))
   (expand-file-name (or (locate-dominating-file default-directory "compile_commands.json")
                         (locate-dominating-file default-directory ".cquery")
                         (user-error "Could not find cquery project root")))))

But I am not sure .dir-locals.el serves as a good marker

MaskRay on 25 Feb 2018

In case of .dir-locals.el - it's possible to set some variable like cquery-project-root
e.g.
https://github.com/bbatsov/projectile/blob/master/projectile.el#L3180

myrgy on 25 Feb 2018

I also forgot to mention multiple roots of a single project. https://github.com/Microsoft/vscode-wiki/blob/master/Extension-Authoring:-Adopting-Multi-Root-Workspace-APIs.md

There should be someway not to use :rootUri to associate files with workspaces (or clients as in phst's PR). This does not work when users jump outside of the project root. This workaround is currently implemented in lsp-ui-peek.el

https://github.com/emacs-lsp/lsp-ui/blob/master/lsp-ui-peek.el#L482

            (unless lsp--cur-workspace
              (setq lsp--cur-workspace current-workspace))
            (unless lsp-mode
              (lsp-mode 1)
              (lsp-on-open))

MaskRay on 27 Feb 2018

FYI

LSP 3.6.0 has added a bunch of methods, among them there is workspace/workspaceFolders

workspace/configuration may be of interest to some clients.
textDocument/implementation and textDocument/typeDefinition are additional cross references
I have no idea what document color and its like do

MaskRay on 2 Mar 2018

👍2

Updated cquery's root detection logic in the topic. It consists of cquery-project-roots cquery-project-root-matchers

MaskRay on 5 Mar 2018

Just want to chime in and mention that, in order to compromise between @MaskRay's work on the LLVM/Clang projects and a mono-repo I use at work, cquery now provides a defcustom to allow users to tweak how cquery traverses directory.

The code isn't cquery-specific anymore and it might be nice to lift a slightly cleaned-up version of that into lsp-make-traverser, which would then allow users to match default directories with either custom predicates, locate-dominating-file (via file arguments), and hardcoded directory paths (that one isn't currently implemented cleanly). Essentially:

each language-server mode would provide an ordered list of predicates as their default traverser: string to match specific files (i.e. compile_commands.json, __init__.py, etc.), directory to hardcode specific directories as project roots, or function to run an arbitrary callable that either returns a project root or nil
users could then override and edit the traverser predicates on a per-mode basis

Hopefully the above would provide enough flexibility for all kinds of weird, exotic build systems and repo structures.

Edit: (…@MaskRay and I both managed to dump our responses at the same time, I didn't notice his new comments before I posted. Whoops.)

jaelsasser on 5 Mar 2018

Thank @jaelsasser's explanation. Further, we can make whitelist/blacklist/(lsp--should-start-p) a root matcher that may return 'blacklist to indicate lsp-mode should not be initialized, or nil if it should be initialized. The subsequent root matcher functions will be called to get the root.

Users usually add the lsp-mode client initialization logic in a mode hook:

(add-hook 'c-mode-common-hook #'lsp-cquery-enable)
(add-hook 'python-mode-hook #'lsp-python-enable)

Currently they will see (message "Not initializing project %s" root) in the echo area when a source file is blacklisted. They will have to use a wrapper to suppress the message.

We may use different symbols (e.g. 'blacklist-silent and 'blacklist) to decide whether the message should be suppressed.

MaskRay on 5 Mar 2018

the files within subprojects should be associated with the root project.

How is this supposed to work when accessing things from the standard library? Lets say

I have my project root at "~/Src/my_project"
I type "import random" in a project file
I go to definition of random, which opens /usr/lib/pythonx.x/random.py

At this point, should the project root for random.py be 'my_project'? i.e. its the context of the action that determines the root, not the file location?

jamescasbon on 18 Apr 2018

It seems projectile and project.el could not handle some subtle cases like subproject, so why not just prompt user to choose project-root directory, and then generate a file like .cquery in project-root directory to identify project-root. When there are some cases the program doesn't know handle, let user takes it instead.

ramsayleung on 13 May 2018

Is there a "current" solution for people who want to use this in monorepos? I have a project structure using yarn workspaces like...

my-monorepo/
|-- apps/
|   |-- frontend/
|       |-- src/
|       |    |-- App.js (import Header from '@myscope-header-component')
|       |-- package.json
|
|-- packages/
|   |-- header-component/
|       |-- src/
|       |    |-- Header.js
|       |-- package.json
|
|-- jsconfig.json
|-- package.json

If I import Header from '@myscope/header-component; from App.js it only seems to show import Header and type any. But if put in the relative path directly to the other file, it shows the structure of my exported function.

chris-kahn on 21 Aug 2018

👍2

@chris-kahn If you create .projectile in the root directory, due to projectile-project-root-files-bottom-up, all files will be seen as a part of the root directory.

I think get-root should be deprecated, just define an option and use that to define the project root:

projectile: (projectile-project-root)
project: (car (project-roots (project-current))) (this is what eglot uses)
get-root (current language specific make-traversal)

(define lsp-project-root-matchers '(get-root) ...) shall be a good starter if compatibility is a concern.
If not, I would suggest lsp-project-root-matchers '(project) as project.el is shipped with Emacs.

Waiting for someone to implement this...

pyls

I run into this issue today as I want to test pyls on telegramircd. It does not have any root indicator defined below:

(lsp-define-stdio-client lsp-python "python"
             (lsp-make-traverser #'(lambda (dir)
                         (directory-files
                          dir
                          nil
              "setup.py\\|Pipfile\\|setup.cfg\\|tox.ini")))
                         nil
                         :command-fn 'lsp-python--ls-command)

ccls

In ccls, I use:

(defcustom ccls-project-root-matchers
  '(".ccls-root" projectile-project-root) ...)

.ccls-root is sometimes preferable to .projectile because even if all subproject files should be visible to the language server, I want to use some projectile features confined in subprojects (e.g. I hope counsel-projectile-find-file lists just subproject files if current buffer is a subproject file)

MaskRay on 21 Sep 2018

👍1

@MaskRay IMO there shouldn't be an automatic root detection since it is not sufficient to cover all of the scenarios(e. g. multifolder scenario) but only root suggestion, you could take a look at https://github.com/emacs-lsp/lsp-mode/pull/409/files#diff-89805084d36a788cb21a777663a63538R838 . I am not VScode expert but I believe that it saves the root locations at some place and it uses them afterward while the root additions is interactive(e. g. Add folder from the menu). You could take a look at: https://github.com/emacs-lsp/lsp-mode/issues/424 too.

yyoncho on 21 Sep 2018

👍1

The detection mechanism of projectile is reasonable, IMO.
Or, get some ideas from vscode and nvim? AFAIK, the lsp client of nvim is very fast and stable.

seagle0128 on 21 Sep 2018

@yyoncho I think for most use cases a single project suffices, auto detection works fine. There are scenarios where workspaceFolders are useful and users should make decisions interactively. But the conceptual burden and complexity might make it only exposed to power users. In conclusion, I still believe auto root detection is good enough for most scenarios and should stick as the default.

I'll take a look at workspaceFolders when I have a chunk of free time.

MaskRay on 21 Sep 2018

@MaskRay

There are scenarios where workspaceFolders are useful and users should make decisions interactively. But the conceptual burden and complexity might make it only exposed to power users.

On the other side here are the extension activation events of VSCode(I have marked in bold the one that make sense for Emacs.)

onLanguage:${language}
onCommand:${command}
onDebug
workspaceContains:${toplevelfilename}
onFileSystem:${scheme}
onView:${viewId}
onUri

Here it is the flow that I am suggesting, note that it does not increase the complexity of using the extension:

On startup lsp-mode loads the set of workspace which could be plain list workspace name -> worspace folders. It could be stored in common location (e. g. ~/.emacs.d/.lsp-workspaces) or in dir-locals.el, etc. The workspace persistence could be implemented in various ways, this is only one of the options.
- when user opens file/directory which matches lsp extension activation point (see above) lsp-mode will do:
- when folder is part of existing workspace lsp-mode will start the corresponding server and make the opened workspace the active one:
  - when server is multi-root enabled lsp-mode will initialize the server with the folders that are part of the current workspace (and do match workspaceContains:${toplevelfilename} ?) .
  - when the server is not multi root enabled the server will be initialized with the current workspace folder.
- when file is not part of existing workspace the behaviour could be specified in the concrete extension but the default behaviour would be to do not start anything similar to what VScode do. This will solve #225.
User would be able to add/remove folders which again will fire up new server or add the folder to the existing server in case the server support multiple folders. Note that the current root detection logic could be used to find what is the root and suggest it as default value of read-directory-name so there is no overhead/increasing the complexity(similar what I do in multi-folder branch.)

This approach has several advantages:

It handles the corner cases - #225 #340 and removes the need from lsp-project-whitelist/lsp-project-blacklist and makes the project selection explicit and clear.
It will allow integration with emacs project tree explorers like treemacs which has the same notion of "workspace" as a set of folders which is the standard for most of the editor(Eclipse, Idea, VScode, Eclipse Che, Theia)
It is closer to the reference LSP client implementation - VSCode. The LSP is marketed as client agnostic but the truth is that the most of the server implementations are driven by the needs/behaviour of VSCode.

What do you think?

yyoncho on 22 Sep 2018

👍4

Thanks for the elaborated explanation! (which I have referred to a forum and some friends for more feedback) Overall, I think root suggestion isn't bad because as you mentioned:

On the other hand if automatic root discovery fails it would be very frustrating for anyone that does not know elisp.

If it gets wrong, it will be very difficult to change the root in the current framework.

And most people do not deal with too many projects. The root suggestion prompt will not be too annoying. (lsp-project-blacklist in my opinion is still useful because they are regex patterns that can exclude a bunch of projects)

Some thoughts about your other points

On startup lsp-mode loads the set of workspace which could be plain list workspace name -> worspace folders. It could be stored in common location (e. g. ~/.emacs.d/.lsp-workspaces

I worry a bit about the discoverability but I think it is good overall. In general, for other good designs that may introduce some compatibility issues which only affect power users, we may not need to be too cautious. I have been posting things on Emacs LSP ecosystem to a Chinese Emacs forum for a long time. My feedback received is that people can afford upgrading their configuration, but flaws that they have to work around will frustrate them more.

It handles the corner cases - #225 #340 and removes the need from lsp-project-whitelist/lsp-project-blacklist and makes the project selection explicit and clear.

225 the xref target can be:

/usr/include/{a.h,b.h,c.h}. In this case, prompts may be annoying. Populating workspace information to the new buffers work good enough, which is what I do for `lsp-ui-peek--goto-xref``

            (unless lsp--cur-workspace
              (setq lsp--cur-workspace current-workspace))
            (unless lsp-mode
              (lsp-mode 1)
              (lsp-on-open))

Other projects near the main project the user is browsing. I can imagine this may be the case for some Java, JavaScript, Python projects where the user don't want to have a global view (don't want to index third-parties) until they open a file in a third-party project. workspaceFolders should be useful in this scenario.

340 demonstrates that it is so bad to detect the root with `lsp-make-traversal` `"setup.py\\|Pipfile\\|setup.cfg\\|tox.ini"`. This should be just replaced with root detection (non-interactive) or root suggestion (interactive).

It will allow integration with emacs project tree explorers like treemacs which has the same notion of "workspace" as a set of folders which is the standard for most of the editor(Eclipse, Idea, VScode, Eclipse Che, Theia)

Nod

It is closer to the reference LSP client implementation - VSCode. The LSP is marketed as client agnostic but the truth is that the most of the server implementations are driven by the needs/behaviour of VSCode.

Asked a friend who uses VSCode and he confirmed this. This sometimes leads to unpleasant or non optimal decisions but pros overweigh cons.

MaskRay on 22 Sep 2018

👍2

I am glad that you are open for discussion and I agree with all of your points.

On startup lsp-mode loads the set of workspace which could be plain list workspace name -> worspace folders. It could be stored in common location (e. g. ~/.emacs.d/.lsp-workspaces

Just want to add that VScode has only one active workspace(by workspace I mean list of project folders, not lsp--workspace) at a time and you can load/save it in the following format:

{
    "folders": [
               {
            "path": "path1..."
        },
        {
            "path": "path2..."
        }
    ]
}

The approach with all workspaces in one file and loaded on startup is closer to projectile behaviour. I cannot tell which one would fit better and whether to support multiple workspaces at a time. This would matter only for multi root servers.

Other projects near the main project the user is browsing. I can imagine this may be the case for some Java, JavaScript, Python projects where the user don't want to have a global view (don't want to index third-parties) until they open a file in a third-party project. workspaceFolders should be useful in this scenario.

I am referencing the case when you directly navigate to library file e. g. any file in /usr/include/ or in ~/.rustup/toolchain, etc . In this case we have 3 posible actions:

Do nothing - as you have pointed out lsp-project-blacklist could handle this case.
Ask user for project root - as you have mentioned this would be annoying for CPP case so we can combine this with lsp-project-blacklist, e. g. if user cancels select root dialog to ask whether user wants to ignore the directory for the future and so on...
Fire up the default workspace - this is what I do in lsp-java with the following hack:
https://github.com/emacs-lsp/lsp-java/blob/master/lsp-java.el#L333 . In general, although it works lsp-java is full of ugly patches and workarounds which I hope will be fixed by restructoring lsp-mode.

yyoncho on 23 Sep 2018

👍1

Thanks, @yyoncho, for pulling me to this issue and for closing #397. It's highly unclear to me right now how best to help, so, please, just pull on me. I'd love to move the ball on lsp-mode in general in a way that supports the myriad of LSP server setups out there.