Mu: Automatically saved encoding cookie messes with lines listed in REPL

Created on 22 Mar 2018  ·  13 Comments  ·  Source: mu-editor/mu

Edit: Changed "shebang" for "encoding cookie" for clarification of what I meant.


Because the encoding cookie is added to the file but not shown in the editor, the lines displayed and the lines executed are different.

In this simple example you can see line 4 in the editor is reported as line 5 in the "run" error.
image
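The off-by-one can be reproduced outside Mu. The following is only a minimal sketch: `first_error_line` is a hypothetical helper, not Mu code, and the cookie string is the one quoted later in this thread. It assumes the cookie is prepended as a single extra line when the file is saved for running.

```python
import traceback

# Cookie text as quoted later in this thread; assumed to be prepended on save.
ENCODING_COOKIE = "# -*- coding: utf-8 -*- # Encoding cookie added by Mu Editor\n"


def first_error_line(source):
    """Run `source` and return the line number Python reports for the error.

    Hypothetical helper for illustration only; returns None if nothing fails.
    """
    try:
        exec(compile(source, "<script>", "exec"), {})
    except Exception as exc:
        # The last traceback entry is the frame where the error was raised.
        return traceback.extract_tb(exc.__traceback__)[-1].lineno
    return None


# What the user sees in the editor: the failing statement is on line 4.
editor_text = "a = 1\nb = 2\nc = 3\nprint(undefined_name)\n"

# What actually gets run: the same text with the cookie on top.
saved_text = ENCODING_COOKIE + editor_text

print(first_error_line(editor_text))
print(first_error_line(saved_text))
```

The second call reports a line number one higher than the first, which matches the mismatch described above.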

bug

All 13 comments

@carlosperate are you referring to the encoding cookie? (Or are we also adding a #! line as well?)

Ah -- I see; we save the file temporarily when we run it, but we don't reload. Ok -- I think this answers one of my unanswered questions on the encoding work: should we visibly inject the encoding cookie in a new/loaded file? I think the answer's going to be: yes.

Yes, I meant the encoding cookie, updated the title to reflect the right nomenclature.

To be honest I'm not a big fan of the cookie, is it possible to force the encoding in the executed script via Env var or Python setting? Not sure how users will feel having this constantly added to their scripts.

Now that I've lived with it a couple of days (and firmly wearing my devil's advocate hat here), I'm left wondering if there's any way we can achieve the same outcome without having to tamper with users' scripts by adding what will appear to be confusing computer-y speak. I can guarantee kids will:

  • want to know what it is
  • want to know what it's for
  • mess about with it

...and teachers won't know the answer.

I'm guessing we need to do something like export PYTHONIOENCODING=utf8 somewhere in a way that is both safe for Mu but doesn't interfere with other Pythonic things.
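As a rough sketch of that idea, the variable can be set only for the child process that runs the script, leaving the parent environment untouched. This uses the standard library `subprocess` module rather than whatever process runner Mu actually uses, and `run_with_utf8_io` is a hypothetical name. Note that `PYTHONIOENCODING` controls the encoding of the child's stdin/stdout/stderr; it does not change how Python decodes the source file itself.

```python
import os
import subprocess
import sys


def run_with_utf8_io(code):
    """Run `code` in a child Python with UTF-8 standard streams.

    Hypothetical helper: only the child's environment is modified,
    so nothing leaks into the parent process or the user's shell.
    """
    env = dict(os.environ)  # copy, so os.environ itself is untouched
    env["PYTHONIOENCODING"] = "utf-8"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout  # raw bytes written by the child


# The child reports the encoding it uses for stdout.
print(run_with_utf8_io("import sys; print(sys.stdout.encoding)").strip())
```

Whether this addresses the original problem depends on whether the issue is with the streams or with reading the source file, which is the question raised in the next comment.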

Thoughts..? (As always, I'm all ears for suggestions, critique and ideas -- my main concerns being simplicity.)

I don't believe that env var will do the right thing. (It's easy enough to check but I've got a lot on my plate ATM). If it does, then I agree that we should use it. If it doesn't, we've still got a couple of options:

  • Use this as a learning experience with a small comment block indicating what the cookie's for (not ideal; I agree that it's a tad intrusive; but it is a good idea in general to set an encoding for the source code).

  • Only inject the cookie when you can't get a read using the locale-default encoding. At present we always add the cookie and save as utf-8. Instead, if we get a good read with the locale encoding, we could track that, save using that, and not add the cookie.

  • Never add the cookie: if we can't get a clean read when we open, then we pop up the kind of message we do now, but more often.

My 2c... quite prepared to be talked out of this and totally open to differences of opinion etc...

Given this context:

  • Most files opened by Mu will have been created by Mu (i.e. this is supposed to be your first editor, so the chances that you're using something else are relatively small).
  • Kids, teachers and learners are likely to be confused or intimidated by the cookie, or to misunderstand it (and we want to decrease the chance of these feelings happening).
  • When we try to open a file and encounter problems, the message we show has useful and actionable information (i.e. re-save as UTF-8).

Then:

  • The "learning opportunity" of @tjguk's first option likely requires lots of context for a learner to understand what's going on in the first place and, I'd argue, at this stage of their development they probably shouldn't be bothered with such concerns (they're trying to work out how to define a function rather than have fun with unicode and different character encodings etc...) ;-)
  • Option two sounds like it'll add complexity to our code when it comes to tracking encodings. I'm anxious that we keep the code base simple and, if possible, avoid writing any code unless absolutely necessary.
  • The third and final option is, IMHO, preferable because it is both simpler in terms of Mu's code base, and only pops up in the user's face if there is a problem and, as is currently the case, describes something actionable to do. I'm assuming we're enforcing that Mu always uses UTF-8 in this case.

Does this movement of thought make sense? Care to kick holes in it..! Thoughts and feedback most welcome!

Little late to the party:

Currently, we have

# -*- coding: utf-8 -*- # Encoding cookie added by Mu Editor

Which, as mentioned, raises many questions: What's encoding? What's a cookie? What's utf-8?

Now clearly if this is going to be present we can't do much about utf-8 but we could perhaps simplify to

# -*- coding: utf-8 -*- # Added by Mu

Which, if nothing else, is shorter.

Ok, I've finally found time to look at this. So clearly the cookie is not the way people want to go. I propose the following:

On import:

  • If no cookie is present, try utf-8; then try the locale default; then give up and produce an error dialog
  • If a cookie is already present [which is good practice] then honour it when decoding and re-encoding. In my previous code (ie current master) we honoured cookies on decoding and replaced by utf-8 on encoding. I don't honestly think it's a good idea to replace an existing cookie with nothing.
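The import rules above could be sketched roughly as follows. This is not the code from the Mu branch: `find_cookie` and `read_source` are hypothetical names, the regular expression follows the pattern given in PEP 263, and the error dialog is stood in for by a raised `ValueError`. The cookie check is simplified to "either of the first two lines".

```python
import locale
import re

# Pattern based on PEP 263; written for bytes because the file is not decoded yet.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def find_cookie(raw):
    """Return the encoding named by a cookie in the first two lines, or None."""
    for line in raw.splitlines()[:2]:
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def read_source(raw, locale_encoding=None):
    """Decode file bytes, returning (text, encoding_used).

    Hypothetical sketch of the proposal: honour an existing cookie,
    otherwise try utf-8, then the locale default, then give up.
    """
    cookie = find_cookie(raw)
    if cookie:
        # An unknown codec name raises LookupError here.
        return raw.decode(cookie), cookie
    fallback = locale_encoding or locale.getpreferredencoding(False)
    for encoding in ("utf-8", fallback):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Stand-in for the error dialog mentioned in the proposal.
    raise ValueError("Could not decode file as utf-8 or " + fallback)


text, used = read_source("x = '\u00e9'\n".encode("latin-1"), locale_encoding="latin-1")
print(used)
```

Returning the encoding alongside the text is one way to let the save path re-encode with the same codec, so that an existing cookie stays truthful.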

I understand @ntoll's point about Mu expecting to read & write its own files, but -- wearing my database developer's hat -- we have to cope with the data as we might get it, not as we'd like it to be...

In any event, I'll start to rework tests to cope with the mu default being no cookie. (I need to tweak a couple of places where a newline-count is assumed). That'll give us the flexibility to use the existing cookie, a different one, or none at all.

@tjguk you're a :star2: I concur with your proposed behaviour.

I'm sorry we appear to be changing things WRT the cookie, but I guess we couldn't have known what it "felt" like to use, until it was implemented. In any case, thank you for all the work and support for Mu..!

(Tracking for now at https://github.com/tjguk/mu/tree/rethink-cookies)

@ntoll not at all -- far better that people come out and say what they definitely feel rather than dither and compromise for fear of giving hypothetical offence. If I'd had stronger feelings about it, I'd have made them known.

I do think it's a bit of a shame, because anyone who's using non-latin characters is going to have to come across encoding cookies at some point, but I'm not holding out on the basis of a theoretical need.

Sounds like a good approach.
The only thing I would keep an eye on is how QScintilla deals with encodings, as there is a flag to force UTF-8, but I'm not sure if we are using it or what it does with other encodings when the flag is not set.

Thanks @carlosperate. I admit to being a little confused as to QScintilla's position here. I was assuming that it would deal with Unicode characters / codepoints, not encoded bytes. But @ntoll picked up an issue testing my encoding PR and specified the setUtf8 method (which we now specify twice -- I'll try to dig into that). Which suggests that QScintilla is actually handling bytes and expects you to tell it the encoding. Any which way, there seems to be solid support for UTF-8 bytes so if we stick with Unicode and tell it to use UTF-8 we seem to be covered both ways.

Unless I'm mistaken, this is no longer a problem, right...? Closing.

(Please re-open if the issue persists, but I can't recreate it).
