Julia: no method matching isvalid(::SubString{String})

Created on 19 Oct 2017  路  13Comments  路  Source: JuliaLang/julia

Although the docs say "Char" or "String"

Returns true if the given value is valid for that type. Types currently can be either Char or String.

SubStrings should be accepted as well, but they aren't.

isvalid(lstrip(" blabla"))
ERROR: MethodError: no method matching isvalid(::SubString{String})
Closest candidates are:
  isvalid(::SubString, ::Integer) at strings/types.jl:70
  isvalid(::AbstractString, ::Integer) at strings/basic.jl:228
  isvalid(::Char) at strings/utf8proc.jl:56
Hacktoberfest good first issue strings

Most helpful comment

Looks pretty easy to extend the current code to handle this. Just widen the signature of byte_string_classify(s::String) to

byte_string_classify(s::Union{String,SubString{String},Vector{UInt8}}) =
    ccall(:u8_isvalid, Int32, (Ptr{UInt8}, Int), s, sizeof(s))

deleting the redundant byte_string_classify(data::Vector{UInt8}) method,
and then widen the signature of isvalid(::Type{String}, s::Union{Vector{UInt8},String}) to allow SubString{String}.

All 13 comments

Looks pretty easy to extend the current code to handle this. Just widen the signature of byte_string_classify(s::String) to

byte_string_classify(s::Union{String,SubString{String},Vector{UInt8}}) =
    ccall(:u8_isvalid, Int32, (Ptr{UInt8}, Int), s, sizeof(s))

deleting the redundant byte_string_classify(data::Vector{UInt8}) method,
and then widen the signature of isvalid(::Type{String}, s::Union{Vector{UInt8},String}) to allow SubString{String}.

If you add it, please also add a test.

In general, it looks like substrings are pretty broken for invalid strings:

julia> s = String(UInt8[0xfe, 0x80, 0x80, 0x80, 0x80, 0x80])
6-byte String of invalid UTF-8 data:
 0xfe
 0x80
 0x80
 0x80
 0x80
 0x80

julia> ss = SubString(s, 1, 2)
"\U80000000"

julia> String(ss)
6-byte String of invalid UTF-8 data:
 0xfe
 0x80
 0x80
 0x80
 0x80
 0x80

@bkamins ^

@stevengj Are you testing this on latest? The result you give is present on 0.6.
Now it should be fixed and ss = SubString(s, 1, 2) should throw BoundsError (if it does not please let me know as this is the expected behavior).

Indeed, it's great that your recent changes fixed this kind of bug:

julia> s = String(UInt8[0xfe, 0x80, 0x80, 0x80, 0x80, 0x80])
6-byte String of invalid UTF-8 data:
 0xfe
 0x80
 0x80
 0x80
 0x80
 0x80

julia> ss = SubString(s, 1, 2)
ERROR: BoundsError: attempt to access 6-byte String of invalid UTF-8 data:
 0xfe
 0x80
 0x80
 0x80
 0x80
 0x80
  at index [2]
Stacktrace:
 [1] SubString{String}(::String, ::Int64, ::Int64) at ./strings/types.jl:33
 [2] SubString(::String, ::Int64, ::Int64) at ./strings/types.jl:38

Bump

You're welcome to work on it, but given that it's a small improvement which can be added in any 1.x release, it may not get attention from core developers in the short term.

But isn't this fixed by #24132? It only needs to be merged, I think.

Ah, you mean https://github.com/JuliaLang/julia/pull/25285? I've rebased it, let's see how it goes. FWIW, better bump the PR than the issue next time.

Is this issue still open? I would like to work on this issue.

@raghav9-97 You could start from https://github.com/JuliaLang/julia/pull/25285 and check what's needed to fix the failures.

Thanks for the reference @nalimilan

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tkoolen picture tkoolen  路  3Comments

omus picture omus  路  3Comments

StefanKarpinski picture StefanKarpinski  路  3Comments

helgee picture helgee  路  3Comments

sbromberger picture sbromberger  路  3Comments