spaCy: Get span from character offsets

Created on 16 Aug 2017 · 2 Comments · Source: explosion/spaCy

Hi,

I would like to retrieve a span from a doc given its character offsets. I see that I'm not the only one: #1050 :). @honnibal, in that closed issue you said there is already a Cython function that does this but is not exposed. Looking at the code, I think it would be quite straightforward using the functions token_by_start / token_by_end, which are already used in the doc.merge method.

So my question is: is this the right way to do it? And is it something you still want to add? If so, I would be happy to help :)

Thanks,

Thomas

enhancement

thomasopsomer
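To make the idea in the question concrete, here is a minimal pure-Python sketch of looking up a token range by its start and end character offsets. It does not reproduce spaCy's Cython token_by_start / token_by_end, whose internals are not shown in this thread. The helper names token_offsets and token_range_from_chars, and the whitespace tokenization, are assumptions made only for illustration.

```python
import re


def token_offsets(text):
    """Whitespace-tokenize text and return (start, end) character offsets per token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def token_range_from_chars(offsets, start_char, end_char):
    """Map character offsets to (first_token, last_token) indices.

    Returns None when either offset does not fall exactly on a token boundary.
    """
    # Hypothetical lookups: one keyed by token start, one keyed by token end.
    by_start = {start: i for i, (start, _) in enumerate(offsets)}
    by_end = {end: i for i, (_, end) in enumerate(offsets)}
    first = by_start.get(start_char)
    last = by_end.get(end_char)
    if first is None or last is None or last < first:
        return None
    return first, last


text = "I like New York in Autumn."
offsets = token_offsets(text)

# Characters 7..15 cover "New York", which starts and ends on token boundaries.
aligned = token_range_from_chars(offsets, 7, 15)

# Characters 7..13 cover "New Yo", which ends in the middle of a token.
misaligned = token_range_from_chars(offsets, 7, 13)
```

Returning None for misaligned offsets, rather than snapping to the nearest token, keeps the caller aware that the requested characters do not correspond to whole tokens.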

All 2 comments

spaCy v2.0.0a10 has introduced a new method, doc.char_span(start, end, label=0, vector=None) to take care of this.

Thanks!

honnibal on 27 Aug 2017

👍5
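For readers landing here, a short usage sketch of the doc.char_span method mentioned above. It assumes spaCy is installed and uses a blank English pipeline (spacy.blank("en")), so no trained model is needed. The example sentence and offsets are made up for illustration.

```python
import spacy

# A blank pipeline only tokenizes, which is all char_span needs.
nlp = spacy.blank("en")
text = "I like New York in Autumn."
doc = nlp(text)

# Characters 7..15 cover "New York" and line up with token boundaries.
span = doc.char_span(7, 15)

# Characters 7..13 cover "New Yo", which cuts through a token,
# so char_span returns None instead of a Span.
bad_span = doc.char_span(7, 13)

if span is not None:
    print(span.text, span.start, span.end)
```

Because the method returns None when the offsets do not map onto whole tokens, it is worth checking the result before using it.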

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 8 May 2018
