Edit: Changed "shebang" to "encoding cookie" to clarify what I meant.
Because the encoding cookie is added to the file but not shown in the editor, the line numbers displayed and the line numbers executed are different.
In this simple example you can see line 4 in the editor is reported as line 5 in the "run" error.
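A rough way to reproduce the off-by-one outside Mu, as a sketch only: `COOKIE` and `error_line` are hypothetical names made up for illustration, and this is not how Mu runs scripts. It just shows that prepending one line shifts every reported line number by one.

```python
import traceback

# Hypothetical stand-in for the line injected on save (not Mu's actual code).
COOKIE = "# -*- coding: utf-8 -*-\n"


def error_line(source: str) -> int:
    """Execute source and return the line number where it raised."""
    try:
        exec(compile(source, "<script>", "exec"), {})
    except Exception as exc:
        # The innermost frame belongs to the executed script itself.
        return traceback.extract_tb(exc.__traceback__)[-1].lineno
    raise AssertionError("source did not raise")


# Four lines as the user sees them; the last one fails with a NameError.
script = "a = 1\nb = 2\nc = 3\nprint(undefined_name)\n"

shown_in_editor = error_line(script)
reported_on_run = error_line(COOKIE + script)
print(shown_in_editor, reported_on_run)
```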

@carlosperate are you referring to the encoding cookie? (Or are we also adding a #! line as well?)
Ah -- I see; we save the file temporarily when we run it, but we don't reload. Ok -- I think this answers one of my unanswered questions on the encoding work: should we visibly inject the encoding cookie in a new/loaded file? I think the answer's going to be: yes.
Yes, I meant the encoding cookie, updated the title to reflect the right nomenclature.
To be honest I'm not a big fan of the cookie. Is it possible to force the encoding in the executed script via an env var or Python setting? Not sure how users will feel about having this constantly added to their scripts.
Now that I've lived with it a couple of days (and firmly wearing my devil's advocate hat here), I'm left wondering if there's any way we can achieve the same outcome without having to tamper with users' scripts with what will appear to be confusing computer-y speak. I can guarantee kids will:
...and teachers won't know the answer.
I'm guessing we need to do something like export PYTHONIOENCODING=utf8 somewhere in a way that is both safe for Mu but doesn't interfere with other Pythonic things.
Thoughts..? (As always, I'm all ears for suggestions, critique and ideas -- my main concerns being simplicity.)
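For anyone wanting to check what that variable actually changes, here is a small sketch. `child_stdout_encoding` is a hypothetical helper, not anything in Mu. As the Python documentation describes it, `PYTHONIOENCODING` overrides the encoding of stdin/stdout/stderr in the child process; it says nothing about how the interpreter decodes the source file, so it may not replace the cookie.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env: dict) -> str:
    """Start a child Python with extra env vars and report its stdout encoding."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalise spellings such as "utf8" and "UTF-8" to the codec's canonical name.
    return codecs.lookup(result.stdout.strip()).name


encoding = child_stdout_encoding({"PYTHONIOENCODING": "utf8"})
print(encoding)
```

Setting the variable only in the `env` passed to the child keeps it out of Mu's own process and out of the user's shell.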
I don't believe that env var will do the right thing. (It's easy enough to check but I've got a lot on my plate ATM). If it does, then I agree that we should use it. If it doesn't, we've still got a couple of options:
Use this as a learning experience with a small comment block indicating what the cookie's for (not ideal; I agree that it's a tad intrusive; but it is a good idea in general to set an encoding for the source code).
Only inject the cookie when you can't get a read using the locale-default encoding. At present we always add the cookie and save as utf-8. Instead, if we get a good read with the locale encoding, we could track that, save using that, and don't add the cookie.
Never add the cookie: if we can't get a clean read when we open, then we pop up the kind of message we do now, but more often.
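The second option could look roughly like the sketch below. `read_source` and its return shape are assumptions for illustration, not Mu's actual loader. It tries the locale default first and only flags the file as needing a cookie when that read fails.

```python
import locale


def read_source(raw: bytes, default_encoding: str = ""):
    """Return (text, encoding, needs_cookie) for the raw bytes of a file.

    Hypothetical helper: default_encoding falls back to the locale's
    preferred encoding when it is not given.
    """
    default_encoding = default_encoding or locale.getpreferredencoding(False)
    try:
        # A clean read with the locale default: remember it, add no cookie.
        return raw.decode(default_encoding), default_encoding, False
    except UnicodeDecodeError:
        pass
    # Fall back to UTF-8. If this also fails, the UnicodeDecodeError
    # propagates and the caller can show a message (the third option).
    return raw.decode("utf-8"), "utf-8", True


plain = read_source(b"print(1)\n", "ascii")
accented = read_source("caf\u00e9".encode("utf-8"), "ascii")
print(plain, accented)
```

One caveat: single-byte locale encodings such as latin-1 will decode almost any byte sequence without error, so a "good read" there is not proof the guess was right.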
My 2c... quite prepared to be talked out of this and totally open to differences of opinion etc...
Given this context:
Then:
Does this movement of thought make sense? Care to kick holes in it..! Thoughts and feedback most welcome!
Little late to the party:
Currently, we have
# -*- coding: utf-8 -*- # Encoding cookie added by Mu Editor
Which, as mentioned, raises many questions: What's Encoding?, What's a cookie?, What's utf-8?
Now clearly if this is going to be present we can't do much about utf-8 but we could perhaps simplify to
# -*- coding: utf-8 -*- # Added by Mu
Which if nothing else is shorter
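For reference, the standard library can report which encoding a cookie line declares, whatever trailing comment follows it. A minimal sketch, where `declared_encoding` is a hypothetical wrapper around `tokenize.detect_encoding`:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode these source bytes."""
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


mu_cookie = b"# -*- coding: utf-8 -*- # Added by Mu\nprint('hi')\n"
latin_cookie = b"# -*- coding: latin-1 -*-\nprint('hi')\n"
no_cookie = b"print('hi')\n"

with_mu_cookie = declared_encoding(mu_cookie)
with_latin_cookie = declared_encoding(latin_cookie)
without_cookie = declared_encoding(no_cookie)
print(with_mu_cookie, with_latin_cookie, without_cookie)
```

Python 3 already assumes UTF-8 when no cookie is present, so the UTF-8 cookie and the cookie-less file come out the same here.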
Ok, I've finally found time to look at this. So clearly the cookie is not the way people want to go. I propose the following:
On import:
I understand @ntoll's point about Mu expecting to read & write its own files, but -- wearing my database developer's hat -- we have to cope with the data as we might get it, not as we'd like it to be...
In any event, I'll start to rework tests to cope with the mu default being no cookie. (I need to tweak a couple of places where a newline-count is assumed). That'll give us the flexibility to use the existing cookie, a different one, or none at all.
@tjguk you're a :star2: I concur with your proposed behaviour.
I'm sorry we appear to be changing things WRT the cookie, but I guess we couldn't have known what it "felt" like to use, until it was implemented. In any case, thank you for all the work and support for Mu..!
(Tracking for now at https://github.com/tjguk/mu/tree/rethink-cookies)
@ntoll not at all -- far better that people come out and say what they definitely feel rather than dither and compromise for fear of giving hypothetical offence. If I'd had stronger feelings about it, I'd have made them known.
I do think it's a bit of a shame, because anyone who's using non-latin characters is going to have to come across encoding cookies at some point, but I'm not holding out on the basis of a theoretical need.
Sounds like a good approach.
The only thing I would keep an eye on is how QScintilla deals with encodings, as there is a flag to force UTF-8, but I'm not sure if we are using it or what it does with other encodings when the flag is not set.
Thanks @carlosperate. I admit to being a little confused as to QScintilla's position here. I was assuming that it would deal with Unicode characters / codepoints, not encoded bytes. But @ntoll picked up an issue testing my encoding PR and specified the setUtf8 method (which we now specify twice -- I'll try to dig into that). Which suggests that QScintilla is actually handling bytes and expects you to tell it the encoding. Any which way, there seems to be solid support for UTF-8 bytes, so if we stick with Unicode and tell it to use UTF-8 we seem to be covered both ways.
Unless I'm mistaken, this is no longer a problem, right...? Closing.
(Please re-open if the issue persists, but I can't recreate it).