cf. https://discourse.julialang.org/t/why-are-code-llvm-and-code-native-not-displayed-in-color/42959
Recently, the display of stack traces has become colorful (cf. issue #36026, PR #36134). I think it would also be helpful to make the output of code_llvm and code_native colorful.
The unregistered package ColorfulCodeGen.jl provides this functionality (thanks to @tkf). However, ColorfulCodeGen.jl relies on an external tool, Pygments.
I wondered whether huge keyword lists and/or complicated parsers are really needed to color automatically generated LLVM IR or native assembly code, so I tried implementing it myself. You can copy and paste the following PoC into your REPL (Julia v1.4 or later).
Edit: A slightly improved version, along with a code_native version, is now available in the ColoredLLCodes.jl package.
PoC for code_llvm:
import InteractiveUtils: _dump_function, code_llvm, code_native
const IO_ = Union{Base.AbstractPipe, Base.LibuvStream} # avoid overwriting
llstyle = Dict{Symbol, Tuple{Bool, Union{Symbol, Int}}}(
    :default     => (false, :light_black),
    :comment     => (false, :green),
    :label       => (false, :light_red),
    :instruction => ( true, :light_cyan),
    :type        => (false, :cyan),
    :number      => (false, :yellow),
    :bracket     => (false, :yellow),
    :variable    => (false, :normal),
    :keyword     => (false, :light_magenta),
    :funcname    => (false, :light_yellow),
)
const num_regex = r"^(?:\$?-?\d+|0x[0-9A-F]+|-?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)$"
function printstyled_ll(io::IO, x, s::Symbol, trailing_spaces="")
    isempty(x) || printstyled(io, x, bold=llstyle[s][1], color=llstyle[s][2])
    print(io, trailing_spaces)
end
function code_llvm(io::IO_, @nospecialize(f), @nospecialize(types),
                   raw::Bool, dump_module::Bool=false, optimize::Bool=true,
                   debuginfo::Symbol=:default)
    d = _dump_function(f, types, false, false, !raw, dump_module, :att, optimize, debuginfo)
    if get(io, :color, false)
        print_llvm(io, d)
    else
        print(io, d)
    end
end
function print_llvm(io::IO, code::String)
    buf = IOBuffer(code)
    for line in eachline(buf)
        m = match(r"^(\s*)((?:[^;]|;\")*)(.*)$", line)
        m === nothing && continue
        indent, tokens, comment = m.captures
        print(io, indent)
        print_llvm_tokens(io, tokens)
        printstyled_ll(io, comment, :comment)
        println(io)
    end
end
const llvm_types =
    r"^(?:void|half|float|double|x86_\w+|ppc_\w+|label|metadata|type|opaque|token|i\d+)$"
const llvm_cond = r"^(?:[ou]?eq|[ou]?ne|[uso][gl][te]|ord|uno|true|false)$"
function print_llvm_tokens(io, line)
    tokens = line
    m = match(r"^((?:[^\s:]+:)?)(\s*)(.*)", tokens)
    if m !== nothing
        label, spaces, tokens = m.captures
        printstyled_ll(io, label, :label, spaces)
    end
    m = match(r"^(%[^\s=]+)(\s*)=(\s*)(.*)", tokens)
    if m !== nothing
        result, spaces, spaces2, tokens = m.captures
        printstyled_ll(io, result, :variable, spaces)
        printstyled_ll(io, '=', :default, spaces2)
    end
    m = match(r"^([a-z]\w*)(\s*)(.*)", tokens)
    if m !== nothing
        inst, spaces, tokens = m.captures
        printstyled_ll(io, inst, inst == "define" ? :keyword : :instruction, spaces)
    end
    print_llvm_operands(io, tokens)
end
function print_llvm_operands(io, tokens)
    while !isempty(tokens)
        tokens = print_llvm_operand(io, tokens)
    end
    return tokens
end
function print_llvm_operand(io, tokens)
    islabel = false
    while !isempty(tokens)
        m = match(r"^,(\s*)(.*)", tokens)
        if m !== nothing
            spaces, tokens = m.captures
            printstyled_ll(io, ',', :default, spaces)
            break
        end
        m = match(r"^(\*+)(\s*)(.*)", tokens)
        if m !== nothing
            asterisks, spaces, tokens = m.captures
            printstyled_ll(io, asterisks, :default, spaces)
            continue
        end
        m = match(r"^([({\[<])(\s*)(.*)", tokens)
        if m !== nothing
            bracket, spaces, tokens = m.captures
            printstyled_ll(io, bracket, :bracket, spaces)
            tokens = print_llvm_operands(io, tokens) # enter
            continue
        end
        m = match(r"^([)}\]>])(\s*)(.*)", tokens)
        if m !== nothing
            bracket, spaces, tokens = m.captures
            printstyled_ll(io, bracket, :bracket, spaces)
            break # leave
        end
        m = match(r"^([^\s,*(){}\[\]<>]+)(\s*)(.*)", tokens)
        m === nothing && break
        token, spaces, tokens = m.captures
        if occursin(llvm_types, token)
            printstyled_ll(io, token, :type)
            islabel = token == "label"
        elseif occursin(llvm_cond, token) # condition code is instruction-level
            printstyled_ll(io, token, :instruction)
        elseif occursin(num_regex, token)
            printstyled_ll(io, token, :number)
        elseif occursin(r"^@.+$", token)
            printstyled_ll(io, token, :funcname)
        elseif occursin(r"^%.+$", token)
            printstyled_ll(io, token, islabel ? :label : :variable)
            islabel = false
        elseif occursin(r"^[a-z]\w+$", token)
            printstyled_ll(io, token, :keyword)
        else
            printstyled_ll(io, token, :default)
        end
        print(io, spaces)
    end
    return tokens
end
Note that code_llvm and code_native are defined in InteractiveUtils (stdlib), not Base. Also, please do not start a bikeshedding discussion on color schemes before we have a firm idea of where and how to implement this feature. :sweat_smile:
I can continue to improve the PoC in a separate package, but the value of such a heuristic approach (i.e. the motivation) will be diminished.
Beautiful printing with such minimal code, @kimikage!
As a general idea, it might be useful for code_llvm etc. to return a lazy object with a show method defined for text/plain. This way, we can implement syntax highlighting using external packages like Highlights.jl. Also, in frontends with a rich output system like Jupyter, you might want, e.g., text/html output with clickable links to files.
Having said that, maybe syntax highlighting for LLVM/ASM is simple enough to live in InteractiveUtils?
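To make the lazy-object idea above a bit more concrete, here is a minimal sketch. The type name CodeListing and its fields are hypothetical; nothing like this exists in InteractiveUtils, and a real design would probably keep more information than a captured string. It only shows how separate show methods per MIME type would let a plain terminal and a rich frontend render the same object differently.

```julia
# Hypothetical wrapper type (an assumption for illustration, not an existing API).
struct CodeListing
    code::String      # the generated code, captured as text
    language::Symbol  # e.g. :llvm or :asm
end

# Plain-text display: print the code as-is. An external highlighting package
# could overload or extend this method to add colors.
function Base.show(io::IO, ::MIME"text/plain", c::CodeListing)
    print(io, c.code)
end

# HTML display for rich frontends: escape the code and wrap it in <pre>.
function Base.show(io::IO, ::MIME"text/html", c::CodeListing)
    escaped = replace(c.code, "&" => "&amp;")
    escaped = replace(escaped, "<" => "&lt;")
    escaped = replace(escaped, ">" => "&gt;")
    print(io, "<pre class=\"", c.language, "\">", escaped, "</pre>")
end
```

With these definitions, the REPL would use the text/plain method, while a frontend such as Jupyter would normally pick the text/html method when displaying the object.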
BTW, as a more minimal improvement, another option is to change the color of comments only. For x86 ASM, I'm quite happy with that. For LLVM IR, however, the output is still difficult to read. :sweat_smile:
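For reference, the comments-only option could be sketched roughly as follows. The function name print_comments_colored is made up for this example, and it assumes that everything from the first ';' on a line is a comment, which is a simplification (the PoC above, for instance, treats ';"' specially).

```julia
# Hypothetical helper (not part of InteractiveUtils): color comments only.
# Assumption: a comment starts at the first ';' on each line.
function print_comments_colored(io::IO, code::AbstractString; color::Symbol=:green)
    for line in eachline(IOBuffer(code))
        idx = findfirst(isequal(';'), line)
        if idx === nothing
            println(io, line)                      # no comment on this line
        else
            print(io, line[1:prevind(line, idx)])  # code part, unstyled
            printstyled(io, line[idx:end]; color=color)
            println(io)
        end
    end
end
```

printstyled emits escape sequences only when the IO context has :color => true, so the output stays plain when it is redirected to a file or a plain buffer.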
I can't decide for now whether the highlighting feature should be in InteractiveUtils, in a separate package, or in another existing package (e.g. Cthulhu.jl).
However, to write its tests, I first created a public repository: https://github.com/kimikage/ColoredLLCodes.jl
You can see the examples in the GitHub Actions log.
LLVM IR:
https://github.com/kimikage/ColoredLLCodes.jl/runs/869771001?check_suite_focus=true#step:6:17
x86 ASM (AT&T / Intel):
https://github.com/kimikage/ColoredLLCodes.jl/runs/869771001?check_suite_focus=true#step:6:593
Also in the Travis CI log,
ARM ASM:
https://travis-ci.com/github/kimikage/ColoredLLCodes.jl/jobs/362360427#L444
(Please note that the Travis logs can sometimes be garbled.)
I welcome your issue reports.
I think it is a good design for code_llvm etc. to return a lazy object, as @tkf suggests.
However, my concern is that the internal data displayed by code_llvm etc. is not very portable. Therefore, I think we can put only "frozen" strings in the lazy object (with some metadata). That is a much smarter way than "stealing" the string via an IOBuffer, but it offers little functional benefit to end-users.
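For readers unfamiliar with the "stealing" approach, a minimal sketch looks like this. code_llvm accepts an IO as its first argument, so the printed text can be captured in an IOBuffer. The named tuple at the end is only a hypothetical stand-in for a "frozen string plus metadata" object; its field names are assumptions made for this example.

```julia
using InteractiveUtils  # provides code_llvm

# "Steal" the text that code_llvm would otherwise print to stdout.
buf = IOBuffer()
code_llvm(buf, +, (Int, Int))
llvm_text = String(take!(buf))

# Hypothetical "frozen" result: the captured string plus some metadata.
frozen = (code = llvm_text, language = :llvm, julia_version = VERSION)
```

Because the buffer has no :color property, the captured text is plain, and any highlighting has to be applied afterwards by re-parsing the string.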
Also, improvements in InteractiveUtils will rarely be backported, so such breaking changes will force some (but perhaps few :sweat_smile:) packages to handle backward compatibility themselves.
One drastic measure would be to support portable AST objects for multiple languages, but I think that is beyond the responsibility of stdlibs.
I'm still deciding what to do in this repository.
Edit:
I'm essentially in favor of returning lazy objects, but as I said above, I don't have any concrete idea about the design of the objects. Therefore, I don't plan to submit a PR on that change.
Would love to see this in InteractiveUtils.
I think anything right now is better than no syntax highlighting for these functions.
I'm essentially in favor of returning lazy objects, but as I said above, I don't have any concrete idea about the design of the objects. Therefore, I don't plan to submit a PR on that change.
Perhaps then just open a PR with the current change? That doesn't rule out future improvements and until then users can enjoy syntax highlighting 😃
I am in the process of writing test code for LLVM IR. (I wrote the minimal test sets for x86/ARM ASM.)
I will register ColoredLLCodes.jl after I finish writing the test code. (Since it is a new package, the registration will take a few extra days.)
After that, I plan to submit a PR to port the functionality to InteractiveUtils.
I seriously considered maintaining the code in Cthulhu.jl, but I decided it would be worth registering a separate package because I only load Cthulhu.jl when I need it. Of course, once the package is registered, Cthulhu.jl can use its internal functions.
ColoredLLCodes.jl has been registered, and I submitted PR #36984.