spaCy 2.0.5 is throwing a core dump. The same code did not produce a core dump under spaCy 1.6, 1.7.5, or 1.8.2. The crash happens whether or not I am running under the debugger.
Here is the complete code that causes the core dump:
#!/usr/bin/env python3
import os
import spacy
language = 'en'
print("Loading Language Model for '%s'..." % language)
nlp = spacy.load(language)
print("Language Model for '%s' loaded." % language)
doc = nlp('Inhalers can be used to treat persistent recurrent asthma')
c = doc[0]
p = None
if c.head != c and c.head != p:
    print('OK')
I see that if I replace "!=" with "is not" then the core dump does not happen.
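For reference, here is a minimal sketch of that workaround, using the same snippet but with identity comparisons so that None is never passed into the Cython comparison (whether "is not" is an acceptable substitute for "!=" in your use case is a separate question):

#!/usr/bin/env python3
import spacy

nlp = spacy.load('en')
doc = nlp('Inhalers can be used to treat persistent recurrent asthma')
c = doc[0]
p = None
# Identity checks avoid calling the Token rich comparison with a None operand,
# which appears to be what triggers the crash above.
if c.head is not c and c.head is not p:
    print('OK')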
Thanks! Passing None into Cython can sometimes cause problems if not caught.
@honnibal I don't think it has solved the issue, sadly.
I am using the same platform (Darwin-17.3.0-x86_64-i386-64bit, i.e. macOS), and the newly added regression test for this segmentation fault is itself causing a segmentation fault.
Also could be related:
I am getting segmentation faults at random when training new models. A crash can happen at any point, or not at all, and it always occurs in the update method in language.py. It doesn't matter whether the code comes from the model-training examples or from cli/train.py. However, the segmentation faults do seem to become more frequent with larger training sets (500+ examples).
Same problem here:
spaCy 2.0.5, macOS, Python 3.6.4.
I'm trying to teach new NER entities from 5,000 examples by calling update on the en_core_web_lg model.
It fails randomly with a segfault, usually after the second iteration (sometimes making it to the 8th).
Might be related to this (not so) old issue: https://github.com/explosion/spaCy/issues/1335
I'm experiencing a similar problem training the NER on anything but a very small set of examples. Training on anything over 1000 examples throws the following error. Is this a memory error?
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Info about spaCy
Python version: 3.6.3
spaCy version: 2.0.5
Models: en, en_core_sm
Platform: MacOS
I note that I got the same error when trying to train using each of (a) the Prodigy ner.batch-train recipe and (b) the regular spacy train_ner.py script.
Example Error messages when running prodigy:
line 1: 41665 Segmentation fault: 11 python -m prodigy "$@"
line 1: 49673 Segmentation fault: 11 python -m prodigy "$@"
I'm also experiencing the same issue when training the English NER model. When training on about 100 examples there were no problems, but with 500+ I also get the error: "Segmentation fault: 11"
The error occurs on nlp.update after 2 or 3 iterations.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # train only the NER component
    optimizer = nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(train)
        losses = {}
        for text, annotations in train:
            nlp.update(
                [text],           # batch of texts
                [annotations],    # batch of annotations
                drop=dropout,     # dropout, to make it harder to memorize the data
                sgd=optimizer,    # optimizer used to update the weights
                losses=losses)
        print(losses)
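Not necessarily a fix for the crash, but a common variation of the loop above is to update on small batches via spacy.util.minibatch instead of one example at a time. A sketch, assuming nlp, train, n_iter and dropout are defined as in the snippet above:

import random
from spacy.util import minibatch

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # train only the NER component
    optimizer = nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(train)
        losses = {}
        for batch in minibatch(train, size=8):
            # Each batch is a list of (text, annotations) pairs.
            texts, annotations = zip(*batch)
            nlp.update(list(texts), list(annotations),
                       drop=dropout, sgd=optimizer, losses=losses)
        print(losses)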