Hi,
I would like to be able to retrieve a span from a doc given its character offsets. I've seen that I'm not the only one: #1050 :). In that closed issue, @honnibal, you said that there is already a Cython function that does this, but it is not exposed. Looking at the code, I think it would be quite straightforward using the functions token_by_start/token_by_end, which are already used in the doc.merge method.
So my question is: is this the right way to do it? And is it something you still want to add? If so, I would be happy to help :)
Thanks,
Thomas
spaCy v2.0.0a10 has introduced a new method, doc.char_span(start, end, label=0, vector=None), to take care of this.
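For anyone curious about what such a lookup involves, here is a minimal pure-Python sketch of the idea discussed above: map the start offset to the token that begins there and the end offset to the token that ends there. It is not spaCy's implementation. The whitespace tokenizer and the helper names (tokenize, char_span) are hypothetical and only stand in for a Doc and its token offsets. Returning None when the offsets do not line up with token boundaries is a choice made for this sketch.

```python
import re


def tokenize(text):
    """Hypothetical whitespace tokenizer: (start, end) character offsets per token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span(text, start, end):
    """Return the tokens covering text[start:end], or None if the offsets
    do not fall exactly on token boundaries (illustrative sketch only)."""
    offsets = tokenize(text)
    # Index tokens by their start and end offsets, similar in spirit to
    # looking a token up by its start or by its end character position.
    by_start = {s: i for i, (s, _) in enumerate(offsets)}
    by_end = {e: i for i, (_, e) in enumerate(offsets)}
    first = by_start.get(start)
    last = by_end.get(end)
    if first is None or last is None or last < first:
        return None
    return [text[s:e] for s, e in offsets[first:last + 1]]


text = "I like New York in autumn"
print(char_span(text, 7, 15))  # ['New', 'York']
print(char_span(text, 8, 15))  # None: offset 8 is inside the token "New"
```

With spaCy itself, the equivalent call on a Doc built from the same text would be doc.char_span(7, 15), using the method mentioned above.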
Thanks!