In doc.pyx at line 590:
if not self.is_parsed:
raise ValueError(Errors.E029)
I can still do a good job of chunking using only tokenization and POS tagging, without the full parse. Also, in some languages a parse isn't available. Lifting this restriction would give users more flexibility. I can comment this check out in my copy of spaCy, but every time I update spaCy to a new release, I have to change it again.
It would be great if this check could be lifted.
I think it would be fine to move this check into each individual noun chunks iterator rather than having it in Doc.noun_chunks.
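The suggested change can be sketched as follows. This is a hedged illustration, not spaCy's actual implementation: the chunking rule, the dependency labels, and the fake `Doc`/`Token` shapes are all simplified placeholders; real per-language iterators live in `lang/*/syntax_iterators.py` and build spans from `left_edge`/`right_edge`.

```python
def noun_chunks(doclike):
    # Sketch: the parse check moves into each language's iterator,
    # so Doc.noun_chunks no longer has to raise E029 globally.
    if not getattr(doclike, "is_parsed", False):
        raise ValueError(
            "noun_chunks requires the dependency parse, which "
            "requires a statistical model to be installed and loaded."
        )
    # Illustrative rule only: yield (start, end) token offsets for
    # tokens whose dependency label marks a nominal.
    np_deps = {"nsubj", "dobj", "iobj", "pobj"}
    for token in doclike:
        if token.dep_ in np_deps:
            yield token.i, token.i + 1
```

A language that cannot provide a parse could instead implement its iterator with a tag-based heuristic and skip the check entirely, which is the flexibility being asked for here.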
Would you like to submit a PR to make this change? It looks like there are 9 languages that would need to be modified.
How to submit a PR? Have never done that and thanks.
You can find some basic information here: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md#contributing-to-the-code-base
Basically, you'll have to fork (copy) the repo and build it from source. Set up a new virtual environment and use git to do this. Then you can make changes on a new branch, test them, commit them, and if you're satisfied with the final result, you can go to your branches (for me, this is https://github.com/svlandeg/spaCy/branches) and hit the "New pull request" button on the right next to your local branch (you will only see this button if you own that specific fork). This will create a PR against spaCy's master branch which we can then review.
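The workflow above roughly translates to the commands below. This is only a sketch: `<your-user>` and the branch name are placeholders, and the exact build steps may differ between spaCy versions (check the contributing guide for the current ones).

```shell
# Fork spaCy on GitHub first, then clone your fork
git clone https://github.com/<your-user>/spaCy
cd spaCy

# New virtual environment, then build from source
python -m venv .env && source .env/bin/activate
pip install -r requirements.txt
pip install -e .

# Work on a dedicated branch, commit, and push to your fork
git checkout -b fix/noun-chunks-check
# ... edit, run the test suite ...
git commit -am "Move parse check into noun chunks iterators"
git push origin fix/noun-chunks-check
# Then open the "New pull request" button on your branch page
```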
My implementation now actually depends on this recently updated release:
https://github.com/howl-anderson/Chinese_models_for_SpaCy
I am talking to the author, and we think it would be nice to make that model official in spaCy for Chinese. Then we can improve the quality of some of its models once we start using it. So I would rather wait until it's merged into the codebase. My previous version depends on the Jieba POS tagger, but I would prefer using the full-fledged Chinese model.
Hi @svlandeg and @adrianeboyd! I would like to pick this up if no one is actively working on it.
Sure, I don't think anyone is currently working on this. Moving the check to the individual noun chunks iterators, as Adriane suggested, should be straightforward to do, irrespective of potential other changes to Chinese.
I think Adriane is working on a full Chinese model release, and it would be better to work on this after that release. Jieba's POS tagging is shaky.
This change doesn't really interact with the Chinese model development, so it would be totally fine to start working on it now.
Thanks, I am working on it and will keep you posted.
Hi @adrianeboyd and @svlandeg, I have created a PR #5396 to enable noun_chunks for specific languages.
Please review and share the feedback when you get a chance. :)
Hi @adrianeboyd, Thanks for reviewing and merging the PR! 👍 Is this issue good to be closed?
Yep, this can be closed, as the PR is merged :-)
By the way @vishnupriyavr, as a small tip for next time, if you put something like Fixes #5526 in the description of the PR, the corresponding issue would close automatically when the PR gets merged ;-)
Hi @svlandeg, that's a very informative tip! Will keep it in mind for next time 😊