spaCy: Get span from character offsets

Created on 16 Aug 2017 · 2 Comments · Source: explosion/spaCy

Hi,

I would like to be able to retrieve a span from a doc given its character offsets. I've seen that I'm not the only one: #1050 :). In that closed issue, @honnibal, you said that there is already a Cython function that does this but is not exposed. Looking at the code, I think it would be quite straightforward using the token_by_start / token_by_end functions, which are already used in the doc.merge method.

So my question is: is this the right way to do it? And is it something you still want to add? If so, I would be happy to help :)

Thanks,

Thomas
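
The lookup described in the question can be sketched in plain Python. This is only an illustration of the idea, not spaCy's Cython code: the `tokenize` helper, the `(text, start_char, end_char)` tuple layout, and the `-1` "not found" return value are assumptions made for this sketch, and the two lookup functions merely borrow the names mentioned above.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical stand-in for a tokenized doc: (text, start_char, end_char).
Token = Tuple[str, int, int]


def tokenize(text: str) -> List[Token]:
    """Toy tokenizer (an assumption): words and single punctuation marks."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def token_by_start(tokens: List[Token], start_char: int) -> int:
    """Index of the token that starts at start_char, or -1 if none does."""
    for i, (_, start, _) in enumerate(tokens):
        if start == start_char:
            return i
    return -1


def token_by_end(tokens: List[Token], end_char: int) -> int:
    """Index of the token that ends at end_char, or -1 if none does."""
    for i, (_, _, end) in enumerate(tokens):
        if end == end_char:
            return i
    return -1


def char_span(tokens: List[Token], start_char: int,
              end_char: int) -> Optional[Tuple[int, int]]:
    """Map character offsets to a (start, end) token range, end exclusive.

    Returns None when either offset does not fall on a token boundary
    or when the end token precedes the start token.
    """
    start = token_by_start(tokens, start_char)
    end = token_by_end(tokens, end_char)
    if start == -1 or end == -1 or end < start:
        return None
    return (start, end + 1)


tokens = tokenize("I like New York in autumn.")
aligned = char_span(tokens, 7, 15)     # covers "New York"
misaligned = char_span(tokens, 8, 15)  # offset 8 is inside "New"
```

The linear scans keep the sketch short; a real implementation could use a binary search, since token offsets are sorted.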

enhancement

All 2 comments

spaCy v2.0.0a10 has introduced a new method, doc.char_span(start, end, label=0, vector=None) to take care of this.
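
A minimal usage sketch of that method, assuming spaCy is installed. It uses a blank English pipeline so no trained model needs to be downloaded; the example sentence and offsets are made up for illustration.

```python
import spacy

# Blank pipeline: tokenizer only, no model download required.
nlp = spacy.blank("en")
doc = nlp("I like New York in autumn.")

# Characters 7..15 cover "New York" and line up with token boundaries.
span = doc.char_span(7, 15)
print(span.text, span.start, span.end)

# Offsets that do not line up with token boundaries yield None
# rather than a Span, so the result is worth checking before use.
misaligned = doc.char_span(8, 15)
print(misaligned)
```

Because the method returns None for misaligned offsets, annotations produced with a different tokenization may need their offsets adjusted first.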

Thanks!
