Julia: ASCII infix XOR and rotation

Created on 26 Aug 2018 · 44Comments · Source: JuliaLang/julia

In many applications of coding theory and cryptography the most common operation is XOR on bit vectors (the second most frequent operation is cyclic rotation). Unfortunately, Julia v1.0 has no ASCII infix notation for them. Replacing our ASCII based toolchain and the software infrastructure just for Julia is not an option, therefore we have to use the xor() and circshift() functions. But now the expressions look very different from the ones in books, papers and functions written in other programing languages.

To get around the problems, I tried the following functions and methods:

BitVector(A::Array{T}) where{T <: Real} = A[:] .!= 0 # constructor, true <- nonzeros of Real array
BitVector(s::String) = Vector{Char}(s) .!= '0' # string to BitVector ('0' or not), Unicode is OK
prt(b::BitArray{1}) = (for i in b print(i ? '#' : ".") end; println()) # print line with ./# for bits
import Base.+
+(x::BitArray{1},y::BitArray{1})::BitArray{1} = xor.(x,y)   # XOR of BitVectors
import Base.*
*(b::BitArray{1},n::Integer)::BitArray{1} = circshift(b,n)  # cyclic rotations

I have encountered a couple of problems. Can anyone suggest better, yet very simple solutions (for education purposes)?

Using BitVector as type declaration of function parameters gives the error "ArgumentError: invalid type for argument b in method definition…" We have to use BitArray{1}.
Redefining the power-of operator ^ for BitVectors with integer second parameter fails at negative literals: Julia tries to invert the BitVector, which is not defined
Our attempts to use both the "where" and the output type declaration of a function (the BitVector constructor above) in single line definition failed with various error messages.

Please consider adding to Julia ASCII infix operators for XOR (e.g. "><") and rotation (e.g. "<<<" or "<>")!

Source

LaszloHars

Most helpful comment

Or even a standard-ish notation here?

It's not ASCII, but the obvious choice would be

julia> const ↻ = circshift
circshift (generic function with 3 methods)

julia> [1,2,3] ↻ 2
3-element Array{Int64,1}:
 2
 3
 1

simonbyrne on 27 Aug 2018

❤5 😄2 👍2

All 44 comments

Curiously, >>> is an operator but <<< is a syntax error. It does appear that there are << and >> left and right bit shift operators, and ⊻ is xor, though, as you say, not ASCII.

yurivish on 26 Aug 2018

Problem 1. is fixed by adding:
import Base.BitVector
If only we get a meaningful error message without it...

LaszloHars on 27 Aug 2018

A bit off to the side of the topic, but:

Don't overload Base.+(x::BitArray{1},y::BitArray{1}; or Base.*(x::BitArray{1},y::BitArray{1}.
That is type-piracy (Overloading a function from another module, using types from another module).
That will break things.

You might be happier defining your own type.

Something like

struct FriendlyBitVector
    v::BitArray{1}
end

import Base.+
Base.xor(x::FriendlyBitVector,y::FriendlyBitVector)= FriendlyBitVector(xor.(x.v, y.v))
+(x::FriendlyBitVector,y::FriendlyBitVector)= xor.(x,y)


import Base.*
Base.circshift(b::FriendlyBitVector, n) = circshift(b.v, n)
*(x::FriendlyBitVector, n)= circshift(b,n)

Base.show(io::IO, b::FriendlyBitVector) = (for i in b.v print(io, i ? '#' : ".") end; println(io))

# also define |, and & etc

(example only not tested)

"><" is a cute infix operator. It could be viable to add

oxinabox on 27 Aug 2018

Replacing our ASCII based toolchain and the software infrastructure just for Julia is not an option

Out of curiosity, what major toolchain and software infrastructure these days is restricted to ASCII?

stevengj on 27 Aug 2018

Out of curiosity, what major toolchain and software infrastructure these days is restricted to ASCII?

Our own development from years ago. Editors, version control, code-snippet database, search, code documentation... The assumption was that one can write an ASCII program in any programing language we teach.

LaszloHars on 27 Aug 2018

I think with a such a large restriction, i.e. ASCII-only, it's fair that some things just don't get infix operators and that xor and bit rotations are among them. Using ASCII syntax for these operators just doesn't seem very compelling. We already have simple enough function names and you can use normal function syntax. Infix operator syntax is something of a luxury, in this case one that an ASCII-only environment doesn't afford.

StefanKarpinski on 27 Aug 2018

👍4

We effectively do have an ASCII infix xor for boolean arrays: .!=. It's a comparison operator, though, so watch out for its precedence. Even so, I'd say that it makes much more sense than defining + or ^ to do something surprising.

Are there any languages with infix ASCII circular shift operators? Or even a standard-ish notation here?

mbauman on 27 Aug 2018

👍2

Using BitVector as type declaration of function parameters gives the error "ArgumentError: invalid type for argument b in method definition…" We have to use BitArray{1}.

Can you give a reproducible example of this? Seems to work for me.

JeffBezanson on 27 Aug 2018

Our attempts to use both the "where" and the output type declaration of a function (the BitVector constructor above) in single line definition failed with various error messages.

This requires parentheses, e.g.

(f(x::T)::T) where T = ...

AFAICT, there's no easy way to make the precedence work differently for this.

JeffBezanson on 27 Aug 2018

>< as an infix operator seem reasonable, but there are a few weird edge cases where it currently parses as an expression, e.g. (1,1).><(1,1) (h/t @omus).

simonbyrne on 27 Aug 2018

Or even a standard-ish notation here?

It's not ASCII, but the obvious choice would be

julia> const ↻ = circshift
circshift (generic function with 3 methods)

julia> [1,2,3] ↻ 2
3-element Array{Int64,1}:
 2
 3
 1

simonbyrne on 27 Aug 2018

❤5 😄2 👍2

Thank you guys, for all your responses!

it's fair that some things just don't get infix operators and that xor and bit rotations are among them

Understood. However, the proposed "><" XOR, and "<>" rotation infix operators don’t seem to interfere with anything important in the language, and would solve all our problems. I just ask to consider adding them for low budget old school environments.

Even better could be providing means for user defined infix operators for combinations of special characters, like the above two, or */, *<, etc. Unfortunately, these would require some changes in the code parser.

This requires parentheses

Thanks! Parentheses solve problem 3.

Import Base.BitVector had solved problem 1, so the only remaining difficulty (problem 2.) is that after redefining the operator ^ for any Integer exponents, Julia still tries to perform an "inverse" operation, when the exponent is a negative literal. This seems to be out of place.

LaszloHars on 27 Aug 2018

so the only remaining difficulty (problem 2.) is that after redefining the operator ^ for any Integer exponents, Julia still tries to perform an "inverse" operation, when the exponent is a negative literal. This seems to be out of place.

Julia cares very much about meanings of things. For example, if you override * to mean something that's not multiplication, you'll also break exponentiation since that depends upon repeated multiplications. We trust you that * means multiplication, so that's what we'll use in many built-in algorithms beyond just ^.

In this specific case, it's because Julia uses Base.literal_pow to allow for optimizations and better return types where the exponent is a literal. See the documentation for ^ for more details.

mbauman on 27 Aug 2018

Do I understand right that if I define the ^ operator for a new data type, I also have to define literal_pow() and inverse for this data type, even though negative exponents could be handled directly? The documentation does not seem to list all the functions to be defined, so do we have to dig into the source code of the Julia modules?

LaszloHars on 27 Aug 2018

Just defining ^ and Base.literal_pow is sufficient, but again, you're running into trouble here because you're trying to define Base.^ to do something other than exponentiation. That's the crux of the problem.

mbauman on 28 Aug 2018

Note that $ is still available as an (undefined) infix operator, so you could use that:

julia> ($)(a,b) = xor(a,b)
$ (generic function with 1 method)

julia> 1 $ 2
3

simonbyrne on 28 Aug 2018

You could also define 2-argument methods ~ (only 1-argument methods are used).

simonbyrne on 28 Aug 2018

Wonderful! Are these undocumented features, which may disappear in future Julia versions? Where can I find their precedence and associativity, assuming they are safe to use?

LaszloHars on 28 Aug 2018

They shouldn't change without due deprecation (i.e. they should stay in Julia 1.x, and should get a round with deprecation warnings if they do change).

I'm not sure all infix operators are listed anywhere except the parser itself:
https://github.com/JuliaLang/julia/blob/3ab56f19a81a1d658a02750b48abbd63950ef248/src/julia-parser.scm#L9-L30

simonbyrne on 28 Aug 2018

Do I understand right that if I define the ^ operator for a new data type, I also have to define literal_pow() and inverse for this data type

You need to define inv (since negative literal powers default to calling this), but not literal_pow. Or define literal_pow but not inv (to change the fallback for negative literal exponents). (Or both.)

stevengj on 28 Aug 2018

I'd like to see ↻ and ↺, so we may infix circularly shifting firsts->lasts and lasts->firsts.

JeffreySarnoff on 29 Aug 2018

I think, you can just define these as circshift(::BitVector). Out of curiosity, how would you enter these symbols in various Unicode enabled editors? If you need them hundreds of times in a function, I would want the editor with the least typing overhead.

LaszloHars on 29 Aug 2018

At the Julia REPL, and in most "julia mode" editors, you can use \circlearrowright[TAB] / \circlearrowleft[TAB]

simonbyrne on 29 Aug 2018

That would drive me crazy. Try to type these a few hundred times, needed for coding a complex cypher :-(

LaszloHars on 29 Aug 2018

The REPL runs in a terminal. It is likely your terminal (or OS or text editor) allows you
to map keycombinations to chars (e.g. Alt-Shift-R). I use Ctrl-C, Ctrl-V to copy/paste unicode.

On Tue, Aug 28, 2018 at 8:24 PM LaszloHars notifications@github.com wrote:

That would drive me crazy. Try to type these a few hundred times, needed
for coding a complex cypher :-(

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/28904#issuecomment-416782710,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABmqxilp7pe-G8W4roH5yJcmy7_w1Y69ks5uVd8tgaJpZM4WM62z
.

JeffreySarnoff on 29 Aug 2018

I tried to follow the advice of oxinabox (Lyndon). Thank you!
I put the (not yet complete) code in a module, with the necessary methods and constructors. Since it is my first Julia module, could someone look at it, and give me advice, how to do it better? (Especially, iterate() looks terribly slow.)

"""
Module of Bits data type and methods.
Implemented as wrapper for BitArray{1} but instead of true/false we use 1/0.

Supported constructors from var-arg numeric parameters, vectors and strings.
Bits0, Bits1, BitsR(n): n-long Bits of all 0, all 1, Random.

Usual meaning: <<, >>, |, &, ~, ==, !=, length, getindex, setindex, findall, reverse, repeat, iterate.
Bits/Integer: circshift. show and prt defined for pretty printing.

Bits can be interpreted as coefficients of a binary polynomial with +, *, /, inv, %.
"""
module BitsModule
export Bits, Bits0, Bits1, BitsR, prt
using Random

struct Bits
    v::BitArray{1}
end

Bits0(n::Integer) = Bits(falses(n))                 # Bits0(12)
Bits1(n::Integer) = Bits(trues(n))                  # Bits1(12)
                                                    # Random.seed!(::Int); may be used
BitsR(n::Integer) = Bits(bitrand(n))                # random n-long Bits from BitVector (~key)

Bits(s::String) = Bits(Vector{Char}(s) .!= '0')     # string to Bits (char[i]=='0' or not), Unicode is OK
                                                    # b = Bits("1011002-ß000")

Bits(A::Real...) = Bits([A...] .!= 0)               # constructor, bit[i]==1 iff parameter[i] != 0
                                                    # b = Bits(1,0,1,1,0,0,2,3,4.5,0,0,0)

Bits(A::Array{T}) where{T<:Real} = Bits(A[:].!=0)   # constructor, true <- nonzeros of Real array
                                                    # b = Bits([1 0 1 1 0 0 2 3 4.5 0 0 0])

Base.xor(x::Bits,y::Bits) = Bits(xor.(x.v, y.v))
import Base.+
+(x::Bits,y::Bits)::Bits = xor(x,y)                 # b + c = xor(b,c) as binary polynomials

Base.circshift(b::Bits,n::Integer) = Bits(circshift(b.v, n))
import Base./
/(b::Bits, n::Integer)= Bits(circshift(b.v,n))      # Bits rotated by ±n places

import Base.<<
<<(b::Bits, n::Integer)= Bits(b.v << n)             # Bits shifted left by ±n places

import Base.>>
>>(b::Bits, n::Integer)= Bits(b.v >> n)             # Bits shifted right by ±n places

import Base.|
|(b::Bits,c::Bits) = Bits(b.v .| c.v)               # bitwise OR of Bits

import Base.&
(&)(b::Bits,c::Bits) = Bits(b.v .& c.v)             # bitwise AND of Bits

import Base.~
~(b::Bits) = Bits(.~b.v)                            # bitwise NOT of Bits

import Base.==
==(b::Bits,c::Bits) = (b.v == c.v)                  # ~isequal. (!= is automatic)

import Base.show
show(io::IO, b::Bits) = (for i in b.v.+0 print(io,i) end; println(io))

prt(b::Bits,t='#',f='-') = (for i in b.v print(i ? t : f) end; println())

import Base.length
length(b::Bits)::Integer = length(b.v)              # number of bits

import Base.getindex
getindex(b::Bits,i::Integer) = getindex(b.v,i) ? 1 : 0  # use as b[2], b[3:6]

import Base.setindex!
setindex!(b::Bits,x,i) = setindex!(b.v,x.!=0,i)     # use as b[2] = 1, b[2:4] = [1,0,1]

import Base.findall
findall(b::Bits) = findall(b.v)                     # Int array of indeces of 1-bits

import Base.reverse
reverse(b::Bits) = Bits(reverse(b.v))               # reverse the sequence of bits

import Base.repeat                                  # repeat Bits
repeat(b::Bits, n::Integer) = Bits(repeat(b.v,n))

import Base.iterate                                 # makes iteration work, e.g. for i in MyBits...
iterate(b::Bits,a...) = iterate(b.v.+0,a...)        # b.v.+0 makes Int array of 0/1 values

end

LaszloHars on 29 Aug 2018

Before any other notes, this is better style and easier when working:

module MyModule

export this_const, that_struct, a_function

import Base: xor, circshift, <more>

# body

end # MyModule

The difference between import ModuleName: function1, function2 and using ModuleName: function1, function2 is that with import when you overload the imported function, that new dispatch path becomes available to all other packages (and Main) which access yours with using ModuleName. In this module, import is the way to go, as you chose. When you do that, then you do not need to prefix each imported function name with the module from which it is imported: import Base: circshift function circshift( rather than function Base.circshift( [which you can do if you really want to emphasize that you are overloading a function defined elsewhere, it causes no harm],

Module, function and variable naming is something the community has spent time considering; see the docs and these examples.

JeffreySarnoff on 29 Aug 2018

"Type Piracy" (taking hold of a type from Base or from another package and "improving it" without doing something to differentiate the revised API from the original -- by e.g. wrapping it in a struct, as you did) is to be avoided.

JeffreySarnoff on 29 Aug 2018

Thanks, Jeff. Since I overload +, /, and others, modifying their meaning for BitArrays could have unwanted side effects in other places. This was the reason I was advised to use a new data type, instead.

LaszloHars on 29 Aug 2018

happy to help where I can be of help -- Jeffrey

JeffreySarnoff on 29 Aug 2018

Hello. I also encounter similar issues. The additional problem with ⊻ is absence in most fonts: e.g., between 200+ Unicode fonts in my Windows distribution only 5-6 families have such symbol, but without any monospace versions. In fact, I myself in articles get used to ⊕ that seems more common in Unicode fonts as well. The Juno editor I tried may only show the symbol ⊻ in program window, but not in REPL.

Finally, using font editor I cooked a monospace font “DejaVuSansMonoExt” with the ⊻ symbol using DejaVuSansMono extended by some mathematical symbols from DejaVuSans and DejaVuSansCondensed. Even if the new font set as default in Juno from JuliaPro, I got “?” instead of ⊻, yet after addition the font to list of fonts in Windows terminal via registry editor, it is correctly visible in console REPL window of recent Julia 1.1.0 distribution after entering xor[tab].

qubeat on 16 Feb 2019

Regarding monospace font and non-ascii source: An additional pain is embedding julia code listings in latex.

I think :\ is still open (currently a syntax error). Instead of burning that for xor, we could burn it for all non-ascii at the same time: :\xor, :\cup, :\mu, etc. This way, u ⊻ v could be written as either u :\xor v or :\U22bb, and ξζ would have the alternate spellings :\xi\zeta and :\U03be\U03b6, as well as combinations. Alternate spellings already have precedent: For example, there are multiple different unicode \mu variants that parse into the same Symbol("\U03bc"). This transformation would happen at parsing time.

Of course, source code with :\U22bb is super ugly, but having an escape hatch would permit source-code transformations that are guaranteed to bring us to good old 7bit ascii. And I would consider u :\xor v very readable in source, definitely better than xor(u,v) and possibly even better than u ⊻ v (but that's a matter of taste).

chethega on 16 Feb 2019

I think :\ is still open (currently a syntax error).

It's not, it's the syntax for the backslash symbol:

julia> :\
:\

If followed by an identifier it's a syntax error though:

julia> :\xor
ERROR: syntax: extra token "xor" after end of expression

And I would consider u :\xor v very readable in source, definitely better than xor(u,v)

Oh no, xor(u,v), the horror! I cannot understand how someone could consider u :\xor v to be more readable than the simple function call syntax. De gustibus non disputandum est 🤷‍♂️

StefanKarpinski on 16 Feb 2019

Hello. I also encounter similar issues. The additional problem with ⊻ is absence in most fonts: e.g., between 200+ Unicode fonts in my Windows distribution only 5-6 families have such symbol, but without any monospace versions. In fact, I myself in articles get used to ⊕ that seems more common in Unicode fonts as well. The Juno editor I tried may only show the symbol ⊻ in program window, but not in REPL.

Finally, using font editor I cooked a monospace font “DejaVuSansMonoExt” with the ⊻ symbol using DejaVuSansMono extended by some mathematical symbols from DejaVuSans and DejaVuSansCondensed. Even if the new font set as default in Juno from JuliaPro, I got “?” instead of ⊻, yet after addition the font to list of fonts in Windows terminal via registry editor, it is correctly visible in console REPL window of recent Julia 1.1.0 distribution after entering xor[tab].

We went through all these, too. It could be nice if the Julia language did not use Unicode symbols, but allowed easy definition of Unicode aliases, in case someone wants them. Even older printers cannot handle Unicode. There are plenty of possibilities for 2-char infix operators, which should not be difficult to add to the language. (Because of the need for new tools and the nightmare of Unicode fonts, we dropped Julia from the curriculum.)

LaszloHars on 17 Feb 2019

Folks, there's a function for this: xor. If you don't like the Unicode, use the xor function.

StefanKarpinski on 17 Feb 2019

👍3

Folks, there's a function for this: xor. If you don't like the Unicode, use the xor function.

You missed the point, stated in the first post. Textbooks, publications, algorithm descriptions, program codes in other languages, electronic circuit simulators, circuit synthesizers etc. describe algorithms, protocols, functions, error correction codes, ciphers, S-boxes, electronic circuits etc. with infix XOR notations. Sure, you can convert them manually to function calls in Julia implementations, but the conversion is error-prone, hard to debug and look very different. For publications then you have to convert function calls back to infix notation. With only Unicode infix XOR and cyclic shifts Julia makes the life of our students and instructors harder than other languages do.

Don't get me wrong: I don't complain. This is the problem with free tools: the developers do what they like, not what some fringe applications need. We just have to use programming languages, which better fit these applications.

LaszloHars on 17 Feb 2019

Don't get defensive, I did not mean to be argumentative. In the thread it was explained, why the infix Unicode XOR operator is not useable in many environments (e.g. no standard monospace Windows font shows it, and older, ASCII based systems cannot even enter it). The xor() function works, but poses other difficulties at converting sources of information to this notation. Defining our own infix operators is not documented in sufficient details, and especially unclear, how to set their precedence. Try to teach coding theory, cryptography or electric circuits, when students cannot copy expressions from textbooks, and have to spend a lot of time with debugging standard formulas converted to functional form!

I meant, I did not expect the Julia developers to support "fringe applications", when they have other priorities. Julia is not "unusable", only other languages "better fit these applications".

LaszloHars on 17 Feb 2019

Try const ($) = xor. $ is parsed as infix but currently does not have a standard definition.

JeffBezanson on 17 Feb 2019

Also, do any languages have ASCII infix cyclic bit shifts? That's certainly uncommon among general purpose languages.

JeffBezanson on 17 Feb 2019

Try const ($) = xor. $ is parsed as infix but currently does not have a standard definition.

Thanks, Jeff, for the suggestion. It works for now, but I was reluctant to use an undocumented symbol, with undocumented behavior (associativity, precedence...), which may not live thru the next major Julia release. You are also correct about circular shifts, which I included in the discussion for its utility, only.

I believe that the core language should not have Unicode operators or keywords. For our applications, a mechanism for user-defined infix (and maybe prefix, postfix) operators would be perfect. It may not be easy, though, because of the need for specifying associativity and precedence.

LaszloHars on 17 Feb 2019

I would not worry about 2.0 --- it is years away. I also think it is very unlikely we would stop allowing $ as an infix operator, and even if we did the change would have to go through our normal deprecation process. We must effectively treat all syntax as "public".

We do in fact have a strict policy that any Unicode in Base needs to have ASCII aliases. Of course, that cannot extend to infix, because Unicode has hundreds of operator characters that ASCII lacks. We could simply delete the line const ⊻ = xor and then there would be no need for Unicode to come up in this discussion. It also may be possible to allow anything to be infix, i.e. x xor y, but that would likely be a significant change.

Do you have suggestions of ASCII sequences to add as operators? If there is a character sequence that would make a reasonable operator and does not parse currently, we can consider adding it.

JeffBezanson on 18 Feb 2019

There are many reasonable 2-char infix operators, some of which I mentioned already: >< for xor; and <> for circular shift. The later could also be represented by *> and <* indicating directions. The symbols, which are not unary operators can be prefixed by other symbols, like
*/
\/*
\/>
\/<...
I have also used /\ for intersection (slash-backslash), and \/ for union.

LaszloHars on 18 Feb 2019

I honestly think that having no ASCII alternative for a Unicode operator is an absolute nightmare for general users. We cannot assume that their editors support inputting such characters.

This is a cool but torturous gimmick at best.