Spacy: ner.add_label to existing model causes segmentation fault: 11

Created on 17 Sep 2018 · 8 Comments · Source: explosion/spaCy

I was getting intermittent segmentation faults when training a new entity type, so I updated spaCy to see if that helped. Unfortunately, I now get a segfault every single time, not during training but when adding entity types.

How to reproduce the behaviour

Follow the spaCy/examples/training/train_new_entity_type.py example with the existing model 'en'. The segmentation fault occurs when adding a new entity label (ner.add_label(label)).

Your Environment

spaCy version: 2.1.0a1
Platform: Darwin-17.7.0-x86_64-i386-64bit
Python version: 3.7.0
Models: en

I've attached the segfault log.

segfault.txt

bug feat / ner

iperera

All 8 comments

Thanks for the report. Are you able to share the examples you used and/or the labels you're adding? And do you have a reproducible example? Segfaults like this are always tricky to debug, so the more specific examples we have, the better.

ines on 18 Sep 2018

The minimal reproducible example is the train_new_entity_type.py example script with the 'en' model loaded, with no other changes. That script adds the 'ANIMAL' entity tag. Note that this particular error is only with the nightly build.

The intermittent segmentation faults I referenced happened with other data on the release build, but that issue has been mentioned in the past and is still open - #1969

iperera on 18 Sep 2018

@iperera do you get the segfault even when just running that example file? It ran fine for me on a Mac using Python 3.7.

free-variation on 20 Sep 2018

Only when specifying an existing model to add to. If I start with a blank model, it runs fine for me.

iperera on 20 Sep 2018

I also get a segmentation fault using the standard training code when I try to add a label to the NER with ner.add_label("FEATURE").

This is on the latest nightly build.

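import random
from pathlib import Path

import spacy
from spacy.util import minibatch, compounding

# LABELS and TRAIN_DATA are defined elsewhere in the script and are not shown in this comment.

def main(model=None, new_model_name='animal', output_dir=None, n_iter=10):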
    """Set up the pipeline and entity recognizer, and train the new entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy

    print(nlp.pipe_names)
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe('ner')

    print("Adding labels")
    for label in LABELS:
        print(label)
        ner.add_label(label)   # <- Segfaults here
        print(label)

    print("Beginning training")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero out
        # existing entity types.
        optimizer = nlp.entity.create_optimizer()

    # get names of other pipes to disable them during training
    print("Disabling pipes")
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(8., 64., 1.001))
            # print(f'Number of batches: {len(batches)}')
            for batch_num, batch in enumerate(batches):
                texts, annotations = zip(*batch)
                if batch_num % 1000 == 0:
                    print(f"Batch {batch_num}")
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)

    # test the trained model
    test_text = 'Do you like horses?'
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta['name'] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)

if __name__ == '__main__':
    main(model='en_core_web_md', new_model_name="feature", output_dir="./new_model", n_iter=1)

nyejon on 22 Nov 2018
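
The script above refers to LABELS and TRAIN_DATA without showing them. As a rough sketch, this is the shape spaCy's v2 training examples use: a list of (text, {"entities": [(start, end, label)]}) tuples with character offsets. The sentences below are hypothetical, and check_offsets is an illustrative helper, not part of spaCy. It only sanity-checks the annotations and is not claimed to be related to the crash.

```python
# Assumed layout, mirroring spaCy's v2 example scripts; the sentences are made up.
LABELS = ["FEATURE"]
TRAIN_DATA = [
    ("The camera has a wide aperture", {"entities": [(4, 10, "FEATURE")]}),
    ("Do you like it?", {"entities": []}),
]


def check_offsets(train_data, labels):
    """Return (text, span, problem) tuples for annotations that look wrong.

    Hypothetical helper: it checks that each label is known, that the
    character offsets fall inside the text, and that the span has no
    leading or trailing whitespace.
    """
    problems = []
    for text, annotations in train_data:
        for start, end, label in annotations.get("entities", []):
            span = text[start:end]
            if label not in labels:
                problems.append((text, span, "unknown label %r" % label))
            elif not (0 <= start < end <= len(text)):
                problems.append((text, span, "offsets out of range"))
            elif span != span.strip():
                problems.append((text, span, "span has surrounding whitespace"))
    return problems


if __name__ == "__main__":
    for text, span, problem in check_offsets(TRAIN_DATA, LABELS):
        print("%s: %r in %r" % (problem, span, text))
```

Running a check like this before training helps rule out malformed annotations as a separate source of trouble.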

@nyejon Thanks for the example! I just tested it on the very latest state of develop and can confirm the segfault.

Here's the minimal reproducible version:

import spacy

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("FEATURE")

ines on 26 Nov 2018
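
For anyone checking whether a particular spaCy build still hits this crash, one option is to run the three-line reproduction in a child process, so a segfault does not take down the calling script. The sketch below uses only the standard library. run_isolated is a hypothetical helper name, the return-code check assumes a POSIX platform (where a child killed by a signal reports the negated signal number), and capture_output needs Python 3.7+. The -X faulthandler flag asks the child interpreter to dump a Python traceback to stderr when it receives a fatal signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter and report how it exited.

    Returns (crashed, returncode, stderr), where crashed is True only if the
    child was killed by SIGSEGV (POSIX convention: returncode == -SIGSEGV).
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    crashed = proc.returncode == -signal.SIGSEGV
    return crashed, proc.returncode, proc.stderr


# The minimal reproduction from this thread; it needs spaCy and the
# en_core_web_sm model installed in the child's environment.
REPRO = (
    "import spacy\n"
    "nlp = spacy.load('en_core_web_sm')\n"
    "ner = nlp.get_pipe('ner')\n"
    "ner.add_label('FEATURE')\n"
)

if __name__ == "__main__":
    crashed, code, stderr = run_isolated(REPRO)
    if crashed:
        print("Child process segfaulted (return code %d)" % code)
        print(stderr)
    elif code != 0:
        print("Child process failed without a segfault (return code %d)" % code)
        print(stderr)
    else:
        print("No crash: add_label completed")
```

A non-zero return code that is not a segfault usually means an ordinary Python error in the child, such as spaCy or the model not being installed.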

Fixed :tada: 160b55c5729f

honnibal on 10 Dec 2018

👍2 🎉1

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 9 Jan 2019
