Right now we return the package name for imports if it comes from a known list.
This doesn't work if we want to look for new ones in the future.
Instead this should use a hash of the package name.
Some thoughts on perf (regex and hashing).

Proposal: use the symbol provider.

Impact: on document save, features in our own extension, such as format on save and on-save actions (sort imports on save), can be adversely affected. For format on save, VSC only allocates 750ms for the code to format the text and provide the text edits.

...We have to unhash them on the other side to determine whether it is a specific package or not.
https://github.com/Microsoft/vscode-python/blob/master/news/3%20Code%20Health/4852.md
Is this correct? Can we unhash them, or do we compare against hashes of known packages?
There's no unhashing. It's comparing against known hashes. Like the unit test does.
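A minimal sketch of the compare-against-known-hashes approach (the SHA-256 choice and the package names here are illustrative assumptions, not necessarily what the extension uses):

```python
import hashlib

def hash_name(pkg: str) -> str:
    """One-way hash of a package name; there is no 'unhashing'."""
    return hashlib.sha256(pkg.encode("utf-8")).hexdigest()

# Precomputed hashes of packages we care about (illustrative list).
KNOWN_HASHES = {hash_name(p) for p in ("numpy", "pandas", "django")}

def is_known(pkg: str) -> bool:
    # Compare the hash of an imported name against the known hashes.
    return hash_name(pkg) in KNOWN_HASHES
```

Unknown package names hash to values that simply never match, so nothing identifiable leaves the machine.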
We should measure the perf impact of this before we go off and design something more complicated. It only looks at the first 1K of lines.
Text is already in memory as far as I can tell, so we don't add any extra overhead by reading it.
Spawning a Python process may take more CPU time than applying a regex against 1000 lines of text. I'll measure it to find out, though.
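For scale, here is a sketch of what "applying a regex against 1000 lines" could look like (the regex and the 1K cutoff are assumptions for illustration, not the extension's actual pattern):

```python
import re

# Matches "import foo" or "from foo import ..." at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:import\s+(\w+)|from\s+(\w+)\s+import)")

def collect_imports(text: str, max_lines: int = 1000) -> set:
    """Scan at most the first `max_lines` lines for top-level imports."""
    found = set()
    for line in text.splitlines()[:max_lines]:
        m = IMPORT_RE.match(line)
        if m:
            found.add(m.group(1) or m.group(2))
    return found
```

On a modern machine this kind of scan over 1000 short strings completes in well under a millisecond, which is why it plausibly beats process spawn.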
@DonJayamanne debouncing sounds like a good idea. No need to queue up a ton of these if the file is just being saved a bunch. I'll add that too while I look at the perf.
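Debouncing here just means collapsing a burst of saves into a single scan. A sketch using a timer (the 0.5 s delay is an assumed value; the extension would pick its own):

```python
import threading

class Debouncer:
    """Run `fn` once, `delay` seconds after the *last* trigger."""

    def __init__(self, fn, delay: float = 0.5):
        self.fn = fn
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the previously queued run
            self._timer = threading.Timer(self.delay, self.fn)
            self._timer.start()
```

Each save calls `trigger()`; only the last one in a burst actually runs the scan.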
It only looks at the first 1K of lines.
You might want to use the `lineAt` method of the document object.
Spawning a python process may take more cpu time than applying a regex against 1000 lines of text
The key difference is that it's out of proc, i.e. the extension's process time isn't used. Chewing up extension process time is where we need to be careful.
I didn't mean the time to run the python code, I mean simply the time to call proc.spawn (and hook up the stdio handlers and read from them).
Iterating over 1000 lines is really damn fast. Maybe faster than the amount of code that runs when calling proc.spawn.
I'm profiling now and for a single save of a 3000K line file, the profiler doesn't even register the time for the import. I have to save 3 or 4 times to get anything to show up for the import in the profile.
Our code lens parsing is a much bigger part of on save.
And this is with debug bits. Minimized bits are likely even faster but it's hard to tell what the CPU time is from.
Here's a detailed analysis (of debug bits):

Essentially, most of the time doing this work (on 1000 lines) is spent generating the hashes, and it's not much time, just 5 ms. We could do the hash generation in the python code instead, but it doesn't seem worth it.
Reparsing codelens on average is around 24ms each save on this same file.
And it seems there's a python spawn in the profile. Spawn itself takes 9ms. So this is actually faster.
I'll do the debounce though.
I think we should still use `document.lineAt(<line number>).text` rather than reading the entire text and splitting the lines; it should be more efficient, even though the difference may be negligible.
Would be good to track the total time taken for this processing (we generally do that in most places when tracking telemetry).
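Tracking the elapsed time could be as simple as a monotonic-clock stopwatch around the scan. A sketch (the `events` list is a hypothetical stand-in for the extension's real telemetry reporter):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(events: list, name: str):
    """Append (name, elapsed_ms) to `events` when the block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        events.append((name, elapsed_ms))
```

Usage would wrap the import scan, e.g. `with timed(events, "hash_imports"): ...`, so every run reports its duration.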