Irssi: Implement recode for everything, not only messages

Created on 22 Jun 2014 · 10 Comments · Source: irssi/irssi

Currently recode only works for messages. It should be implemented so it can affect other things, such as channel names or hostnames.

Two examples, both of them on a network whose encoding is not utf8:

  • If you try to join #ç, two windows open: first an empty one titled #ç (presumably in utf8), then another one titled #?, which is the real channel. Irssi says -!- You have joined #? and every occurrence of the channel's name shows a ? instead of the ç.
  • If someone with a ç in their hostmask joins/parts/whatever, the ç is replaced by a ?.
enhancement
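
A minimal Python sketch (not irssi code) of the byte-level mismatch behind both symptoms. ISO-8859-1 is an assumption here; the report only says the network is not utf8. The same visible name has different bytes in the two encodings, and Latin-1 bytes are not valid UTF-8, so a client that does not recode them has to substitute a placeholder.

```python
# Hypothetical illustration; ISO-8859-1 is assumed as the network charset.
name = "#ç"

utf8_bytes = name.encode("utf-8")         # the name as UTF-8 bytes
latin1_bytes = name.encode("iso-8859-1")  # the name as the network stores it

print(utf8_bytes)    # b'#\xc3\xa7'
print(latin1_bytes)  # b'#\xe7'

# The byte strings differ, so they cannot refer to the same channel.
same_channel = utf8_bytes == latin1_bytes

# Interpreting the Latin-1 bytes as UTF-8 fails on the lone 0xE7 byte;
# with errors="replace" it becomes U+FFFD, the replacement character.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(shown)
```

Recoding channel names and hostmasks in both directions, as requested above, would keep the two byte strings consistent.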

All 10 comments

Here are the existing recode patches:
https://github.com/m1el/irssi/commit/6f8f2b45f7de1befbfa157a3a5b3d6c01bac3946?w=1
https://github.com/seokbeomKim/irssi-recode-patch/blob/master/irssi-0.8.15_recode_r6.patch

The recoding issue can take several forms:

  • Server behaves as described below (in an old ISUPPORT proposal, this was described as CHARSET=xxx)

    • Server enforces this charset on channels and nicks

      • in that case, it would be nice if /recode add <tag> charset also recoded nicks, channels, hostmasks, etc.

    • Network allows free-form channel names (this is pretty common)

      • might need a /join -recode <charset>, /join -bytes, /channel -charset, and possibly a /join -as #othername, for example if you need to join two channels with the same name, one in latin and the other in utf-8

      • channel output in /whois and other oper commands should make it easier to get at the raw bytes or to correctly detect the charset _per channel_

    • Network allows free-form nick names/hostmasks

      • now what can we do?

  • Encoding of text between users

    • Network enforces charset on privmsgs (rare)

    • Users can send any bytes they want

      • the existing /recode should semi-work for that, provided the participants of a channel can agree on a charset

      • it would be useful to be able to re-recode messages that were recoded from the wrong charset (because you didn't guess the correct encoding for a channel)

      • multiple fallbacks for a channel would be good, e.g. for mixed ISO-2022-JP/ShiftJIS chatting

      • per-user recode within a channel is sometimes needed, for example if someone joins and uses the wrong charset to ask a question, it would be nice if you could

        a) read their message

        b) temporarily reply in their charset

  • IRC Protocol tokens (PRIVMSG, NOTICE, CTCP ACTION, incoming numerics, ...) must never be recoded
  • Server side messages, MOTD, explanatory text on numerics (e.g. :is logged in as on 330)
  • Topics, Away messages, ...
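
Two of the points above (multiple fallbacks, and never recoding protocol tokens) can be sketched together in Python. This is only an illustration under stated assumptions, not how irssi implements /recode: the function names and the fallback list are hypothetical, and the parser handles just the basic `:prefix COMMAND params :trailing` shape. ISO-2022-JP is tried first because it is 7-bit, so its bytes would also pass as valid UTF-8 and come out as escape-sequence garbage.

```python
# Hypothetical sketch; names and the fallback order are assumptions.
FALLBACKS = ["iso2022_jp", "utf-8", "shift_jis"]

def decode_with_fallbacks(raw: bytes, charsets=FALLBACKS) -> str:
    """Decode raw with the first charset in the list that accepts it."""
    for charset in charsets:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # Last resort: Latin-1 maps every byte, so no data is dropped.
    return raw.decode("iso-8859-1")

def split_irc_line(line: bytes):
    """Split a raw IRC line into (prefix, command, params), all still bytes."""
    prefix = b""
    if line.startswith(b":"):
        prefix, _, line = line[1:].partition(b" ")
    head, sep, trailing = line.partition(b" :")
    parts = head.split()
    command, params = parts[0], parts[1:]
    if sep:
        params.append(trailing)
    return prefix, command, params

def recode_incoming(line: bytes):
    """Decode prefix and parameters; leave the command token untouched."""
    prefix, command, params = split_irc_line(line)
    return (
        decode_with_fallbacks(prefix),
        command.decode("ascii"),  # protocol token: never recoded
        [decode_with_fallbacks(p) for p in params],
    )

raw = b":nick!user@host PRIVMSG #chan :" + "こんにちは".encode("iso2022_jp")
print(recode_incoming(raw))
```

A real implementation would pick the charset list per server, channel, or user, as the list above suggests, instead of using one global list.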

I seem to remember that CHARSET= was removed from the last ISUPPORT draft because nobody could agree on what exactly it's supposed to mean.

A dirty hack to recode a whole server connection (so without any per-channel choice):

# requires socat version 2!
socat tcp-listen:6666,reuseaddr 'system1:"perl -CI recode.pl 1" % system1:"perl -CO recode.pl 0" | tcp:irc.friend-chat.jp:6667'

# in irssi
/connect localhost 6666

# save as recode.pl:
#!/usr/bin/perl -lp
use Encode qw(encode decode);
BEGIN { $|++; *z=shift()?*encode:*decode }
$_ = z("iso-2022-jp", $_)


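# or with automatic guessing of the incoming encoding:
#!/usr/bin/perl -lp
use Encode qw(encode decode);
use Encode::Guess;
BEGIN {
    $|++;
    my @enc = qw(euc-jp shiftjis 7bit-jis);
    *z = shift()
        # argument 1: text coming from irssi, encode it for the server
        ? sub { $_ = encode("iso-2022-jp", $_) }
        # argument 0: text coming from the server, guess its encoding and decode it
        : sub {
            my $enc = guess_encoding($_, @enc);
            # a failed guess returns an error string instead of an object;
            # retry once with UTF auto-detection switched off
            $enc = do {
                local $Encode::Guess::NoUTFAutoGuess = 1;
                guess_encoding($_, @enc)
            } unless ref $enc;
            $_ = $enc->decode($_) if ref $enc;
        };
}
z()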
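
For readers who want to see what the filter does to a single line, here is a small Python sketch of the same transformation (an illustration only; the proxy above is the Perl script). The sample text is made up. Direction `1` corresponds to the encode step, direction `0` to the decode step.

```python
# Hypothetical sample text; mirrors the encode/decode steps of recode.pl.
text = "#日本語"

wire = text.encode("iso2022_jp")   # direction 1: irssi -> server
back = wire.decode("iso2022_jp")   # direction 0: server -> irssi

# ISO-2022-JP is a 7-bit encoding: it switches character sets with
# ESC sequences instead of using bytes above 0x7F.
seven_bit = all(byte < 0x80 for byte in wire)

print(wire)
print(back == text, seven_bit)
```

Because the wire form is 7-bit, such bytes also pass as valid ASCII or UTF-8, which is one reason the encoding has to be configured or guessed rather than detected from invalid byte sequences.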

I wish all downstream bug trackers would die a painful death.

I have started a bounty for this issue on Bountysource. Been requesting this feature for over 10 years now... my friend @kmerenkov first reported this issue on the old tracker back in 2005.

bounty is now at Bountysource

Gave up on the bounty as nobody seemed interested, moved the money to a different project.

fair enough, but is this still relevant for you?

Very much so. I still use irssi every day and still can't join any non-latin1 channels on utf8 terminals because the server I'm on uses iso-2022-jp (even for channel names).
