Julia: Colored printing of `code_llvm` and `code_native`

Created on 13 Jul 2020  ·  7 Comments  ·  Source: JuliaLang/julia

cf. https://discourse.julialang.org/t/why-are-code-llvm-and-code-native-not-displayed-in-color/42959

Recently, the display of stack traces has become colorful (cf. issue #36026, PR #36134). I think it would also be helpful to colorize the output of code_llvm and code_native.

The unregistered package ColorfulCodeGen.jl provides this functionality (thanks to @tkf). However, ColorfulCodeGen.jl relies on an external tool, Pygments.

I wondered whether huge keyword lists and/or complicated parsers were really needed to color automatically generated LLVM IR or native assembly code, so I tried implementing it. You can copy and paste the following PoC into your REPL (Julia v1.4 or later).
Edit: A slightly improved version, together with a code_native version, is now available in the ColoredLLCodes.jl package.


PoC for code_llvm

import InteractiveUtils: _dump_function, code_llvm, code_native

const IO_ = Union{Base.AbstractPipe, Base.LibuvStream} # avoid overwriting

llstyle = Dict{Symbol, Tuple{Bool, Union{Symbol, Int}}}(
    :default     => (false, :light_black),
    :comment     => (false, :green),
    :label       => (false, :light_red),
    :instruction => ( true, :light_cyan),
    :type        => (false, :cyan),
    :number      => (false, :yellow),
    :bracket     => (false, :yellow),
    :variable    => (false, :normal),
    :keyword     => (false, :light_magenta),
    :funcname    => (false, :light_yellow),
)

const num_regex = r"^(?:\$?-?\d+|0x[0-9A-F]+|-?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)$"

function printstyled_ll(io::IO, x, s::Symbol, trailing_spaces="")
    isempty(x) || printstyled(io, x, bold=llstyle[s][1], color=llstyle[s][2])
    print(io, trailing_spaces)
end

function code_llvm(io::IO_, @nospecialize(f), @nospecialize(types),
                   raw::Bool, dump_module::Bool=false, optimize::Bool=true,
                   debuginfo::Symbol=:default)
    d = _dump_function(f, types, false, false, !raw, dump_module, :att, optimize, debuginfo)
    if get(io, :color, false)
        print_llvm(io, d)
    else
        print(io, d)
    end
end

function print_llvm(io::IO, code::String)
    buf = IOBuffer(code)
    for line in eachline(buf)
        m = match(r"^(\s*)((?:[^;]|;\")*)(.*)$", line)
        m === nothing && continue
        indent, tokens, comment = m.captures
        print(io, indent)
        print_llvm_tokens(io, tokens)
        printstyled_ll(io, comment, :comment)
        println(io)
    end
end

const llvm_types =
    r"^(?:void|half|float|double|x86_\w+|ppc_\w+|label|metadata|type|opaque|token|i\d+)$"
const llvm_cond = r"^(?:[ou]?eq|[ou]?ne|[uso][gl][te]|ord|uno|true|false)$"

function print_llvm_tokens(io, line)
    tokens = line
    m = match(r"^((?:[^\s:]+:)?)(\s*)(.*)", tokens)
    if m !== nothing
        label, spaces, tokens = m.captures
        printstyled_ll(io, label, :label, spaces)
    end
    m = match(r"^(%[^\s=]+)(\s*)=(\s*)(.*)", tokens)
    if m !== nothing
        result, spaces, spaces2, tokens = m.captures
        printstyled_ll(io, result, :variable, spaces)
        printstyled_ll(io, '=', :default, spaces2)
    end
    m = match(r"^([a-z]\w*)(\s*)(.*)", tokens)
    if m !== nothing
        inst, spaces, tokens = m.captures
        printstyled_ll(io, inst, inst == "define" ? :keyword : :instruction, spaces)
    end

    print_llvm_operands(io, tokens)
end

function print_llvm_operands(io, tokens)
    while !isempty(tokens)
        tokens = print_llvm_operand(io, tokens)
    end
    return tokens
end

function print_llvm_operand(io, tokens)
    islabel = false
    while !isempty(tokens)
        m = match(r"^,(\s*)(.*)", tokens)
        if m !== nothing
            spaces, tokens = m.captures
            printstyled_ll(io, ',', :default, spaces)
            break
        end
        m = match(r"^(\*+)(\s*)(.*)", tokens)
        if m !== nothing
            asterisks, spaces, tokens = m.captures
            printstyled_ll(io, asterisks, :default, spaces)
            continue
        end
        m = match(r"^([({\[<])(\s*)(.*)", tokens)
        if m !== nothing
            bracket, spaces, tokens = m.captures
            printstyled_ll(io, bracket, :bracket, spaces)
            tokens = print_llvm_operands(io, tokens) # enter
            continue
        end
        m = match(r"^([)}\]>])(\s*)(.*)", tokens)
        if m !== nothing
            bracket, spaces, tokens = m.captures
            printstyled_ll(io, bracket, :bracket, spaces)
            break # leave
        end

        m = match(r"^([^\s,*(){}\[\]<>]+)(\s*)(.*)", tokens)
        m === nothing && break
        token, spaces, tokens = m.captures
        if occursin(llvm_types, token)
            printstyled_ll(io, token, :type)
            islabel = token == "label"
        elseif occursin(llvm_cond, token) # condition code is instruction-level
            printstyled_ll(io, token, :instruction)
        elseif occursin(num_regex, token)
            printstyled_ll(io, token, :number)
        elseif occursin(r"^@.+$", token)
            printstyled_ll(io, token, :funcname)
        elseif occursin(r"^%.+$", token)
            printstyled_ll(io, token, islabel ? :label : :variable)
            islabel = false
        elseif occursin(r"^[a-z]\w+$", token)
            printstyled_ll(io, token, :keyword)
        else
            printstyled_ll(io, token, :default)
        end
        print(io, spaces)
    end
    return tokens
end
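To make the heuristics in the PoC easier to follow, here is a small self-contained sketch of how one IR line is split and how a few tokens get classified. The regexes are copied from the PoC; the helper `classify` and the sample line are illustrative assumptions, and `classify` covers only a simplified subset of the checks in `print_llvm_operand`.

```julia
# Regexes copied from the PoC.
const num_regex = r"^(?:\$?-?\d+|0x[0-9A-F]+|-?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)$"
const llvm_types =
    r"^(?:void|half|float|double|x86_\w+|ppc_\w+|label|metadata|type|opaque|token|i\d+)$"

# Hypothetical helper (not in the PoC): returns the style name a token would get,
# checking types first, then numbers, then @-names and %-names.
function classify(token::AbstractString)
    occursin(llvm_types, token) && return :type
    occursin(num_regex, token) && return :number
    occursin(r"^@.+$", token) && return :funcname
    occursin(r"^%.+$", token) && return :variable
    return :default
end

# Split a sample line into indentation, code tokens, and trailing comment,
# using the same regex as `print_llvm`.
line = "  %2 = add i64 %1, 42 ; sum"
m = match(r"^(\s*)((?:[^;]|;\")*)(.*)$", line)
indent, tokens, comment = m.captures

# Classify a handful of operand-like tokens.
styles = classify.(["i64", "42", "0x3FF0000000000000", "%1", "@julia_f_123", ","])
```

No keyword list is involved: types and numbers are recognized by shape, and everything starting with `%` or `@` is treated as a name.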


Note that code_llvm and code_native are defined in InteractiveUtils (stdlib), not Base. Also, please do not start a bikeshedding discussion on color schemes before we have a firm idea of where and how to implement this feature. :sweat_smile:

I can continue to improve the PoC in a separate package, but the value of such a heuristic approach (i.e. the motivation) will be diminished.

All 7 comments

Beautiful printing with such minimal code, @kimikage!

As a general idea, it might be useful for code_llvm etc. to return a lazy object with a show method defined for text/plain. This way, we can implement syntax highlighting using external packages like Highlights.jl. Also, in frontends with a rich output system like Jupyter, you might want, e.g., text/html output with clickable links to files.

Having said that, maybe syntax highlighting for LLVM/ASM is simple enough to live in InteractiveUtils?
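A minimal sketch of the lazy-object idea, assuming a hypothetical wrapper type `LazyCode` (nothing with this name exists in InteractiveUtils). It only shows how one object can render differently per MIME type; a real design would plug a highlighter into the `text/plain` method.

```julia
# Hypothetical wrapper type; not an existing InteractiveUtils API.
struct LazyCode
    text::String      # the generated code
    language::Symbol  # e.g. :llvm or :asm
end

# Plain-text display: this is where a highlighter could be hooked in,
# for example when `get(io, :color, false)` is true.
function Base.show(io::IO, ::MIME"text/plain", c::LazyCode)
    print(io, c.text)
end

# Rich display for frontends such as Jupyter; only HTML escaping is done here.
function Base.show(io::IO, ::MIME"text/html", c::LazyCode)
    escaped = replace(replace(replace(c.text, "&" => "&amp;"), "<" => "&lt;"), ">" => "&gt;")
    print(io, "<pre class=\"", c.language, "\">", escaped, "</pre>")
end

code = LazyCode("%2 = add <2 x i64> %0, %1", :llvm)
plain = sprint(show, MIME("text/plain"), code)
html = sprint(show, MIME("text/html"), code)
```

Because display is deferred to `show`, an external package could add or override methods for other MIME types without changing what code_llvm computes.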

BTW, as a more minimal improvement, there is also the option of changing the color of comments only. For x86 ASM, I'm quite happy with that. LLVM IR, however, is still difficult to read that way. :sweat_smile:
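The comments-only option could look roughly like the sketch below. The function name `print_comments_colored` is hypothetical, and the splitting is deliberately naive: it treats every `;` as the start of a comment, which is wrong for a `;` inside a string literal and for assembly syntaxes that use another comment marker.

```julia
# Hypothetical helper: print `code`, coloring only the trailing comment of each line.
function print_comments_colored(io::IO, code::String)
    for line in eachline(IOBuffer(code))
        # Everything before the first ';' is code, the rest is the comment.
        m = match(r"^([^;]*)(.*)$", line)
        body, comment = m.captures
        print(io, body)
        # `printstyled` emits ANSI codes only if `io` has the :color property set.
        isempty(comment) || printstyled(io, comment; color=:green)
        println(io)
    end
end

sample = "; Function f\n  ret i64 %0 ; done\n"

# Without color support the text passes through unchanged.
plain = sprint(print_comments_colored, sample)

# With `:color => true` the comments are wrapped in ANSI escape sequences.
colored = sprint(print_comments_colored, sample; context = :color => true)
```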

For now, I can't decide whether the highlighting feature should live in InteractiveUtils, in a separate package, or in another existing package (e.g. Cthulhu.jl).
However, in order to write its tests, I first created a public repository: https://github.com/kimikage/ColoredLLCodes.jl

You can see the examples in the GitHub Actions log.
LLVM IR:
https://github.com/kimikage/ColoredLLCodes.jl/runs/869771001?check_suite_focus=true#step:6:17
x86 ASM (AT&T / Intel):
https://github.com/kimikage/ColoredLLCodes.jl/runs/869771001?check_suite_focus=true#step:6:593

Also in the Travis CI log,
ARM ASM:
https://travis-ci.com/github/kimikage/ColoredLLCodes.jl/jobs/362360427#L444
(Please note that the Travis logs can sometimes be garbled.)

I welcome your issue reports.

I think it is a good design for code_llvm etc. to return a lazy object, as @tkf suggests.

However, my concern is that the internal data displayed by code_llvm etc. is not very portable. Therefore, I think we can put only "frozen" strings (with some metadata) in the lazy object. That is a much smarter way than "stealing" the string via an IOBuffer, but it offers little functional benefit to end users.
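For reference, a rough sketch of both halves of that point: the current workaround of capturing the text through an in-memory buffer, and a "frozen" string carried with some metadata. The `FrozenCode` type and the `frozen_llvm` function are hypothetical names; only `code_llvm(io, f, types)` and `sprint` are existing APIs.

```julia
using InteractiveUtils

# Hypothetical container for an already-rendered ("frozen") listing plus metadata.
struct FrozenCode
    text::String      # the listing exactly as code_llvm printed it
    kind::Symbol      # :llvm here; :native would be the code_native analogue
    signature::Type   # argument types the code was generated for
end

# The workaround: "steal" the text by pointing code_llvm at an in-memory buffer.
function frozen_llvm(@nospecialize(f), @nospecialize(types))
    text = sprint(io -> code_llvm(io, f, types))
    return FrozenCode(text, :llvm, types)
end

fc = frozen_llvm(+, Tuple{Int, Int})
```

The object holds no compiler internals, only a string, so it stays valid regardless of how the listing was produced.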

Also, improvements in InteractiveUtils are rarely backported, so such breaking changes would force some (but perhaps few :sweat_smile:) packages to maintain backward compatibility.

One drastic measure would be to support portable AST objects for multiple languages, but I think that is beyond the responsibility of the stdlibs.

I'm still deciding what to do in this repository.

Edit:
I'm essentially in favor of returning lazy objects, but as I said above, I don't have any concrete idea about the design of the objects. Therefore, I don't plan to submit a PR on that change.

Would love to see this in InteractiveUtils.
I think anything right now is better than no syntax highlighting for these functions.

> I'm essentially in favor of returning lazy objects, but as I said above, I don't have any concrete idea about the design of the objects. Therefore, I don't plan to submit a PR on that change.

Perhaps then just open a PR with the current change? That doesn't rule out future improvements and until then users can enjoy syntax highlighting 😃

I am in the process of writing test code for LLVM IR. (I wrote the minimal test sets for x86/ARM ASM.)
I will register ColoredLLCodes.jl after I finish writing the test code. (Since it is a new package, the registration will take a few extra days.)

After that, I plan to submit a PR to port the functionality to InteractiveUtils.

I seriously considered maintaining the code in Cthulhu.jl, but I decided it would be worth registering a separate package because I only load Cthulhu.jl when I need it. Of course, once the package is registered, Cthulhu.jl can use its internal functions.

Edit:
https://github.com/JuliaRegistries/General/pull/19104

ColoredLLCodes.jl has been registered, and I submitted PR #36984.
