spaCy: Too many labels result in a crash

Created on 25 Sep 2018 · 2 comments · Source: explosion/spaCy

Hi, I'm currently trying to train a custom model with more than 125 labels, and I get the following error:

Windows 10

Process finished with exit code -1073740791 (0xC0000409)

Ubuntu 18.04

*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)

There seems to be a limit: with 125 labels or fewer it works, and with more it crashes.

How to reproduce the behaviour

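import random

import spacy


def __train_model(self, train_data, entity_types):
    # Start from an empty English pipeline (spaCy 2.0 API)
    nlp = spacy.blank("en")

    # Create the entity recognizer and add it to the pipeline
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)

    # Register every entity label before training starts
    for entity_type in entity_types:
        ner.add_label(entity_type)

    optimizer = nlp.begin_training()

    # Train for 20 passes over the data, reshuffling each time
    for _ in range(20):
        losses = {}
        random.shuffle(train_data)

        # One update per (text, {"entities": [...]}) example
        for statement, entities in train_data:
            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)

    return nlp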

Unit Test

    def test_train_with_max_supported_entity_types(self):
        train_data = TrainData()
        train_data.extend([("One sentence", {"entities": []})])
        entity_types = {i for i in range(125)}

        model = self.train_model_processor.train(train_data, entity_types)

        assert_is_not_none(model)

So in the unit test, training crashes as soon as entity_types contains more than 125 labels.
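Because the crash is a native abort (stack smashing on Ubuntu, 0xC0000409 on Windows), it takes down the whole Python process, test runner included. One way to pin down the threshold without losing the runner is to run each attempt in a child interpreter and bisect on the label count. This is a minimal sketch, assuming a hypothetical standalone script (called repro_ner.py in the usage note) that takes the label count as its first argument, trains with that many labels and exits normally on success; both function names are hypothetical too.

```python
import subprocess
import sys
from typing import Callable


def crashes_in_subprocess(script: str, n_labels: int, timeout: float = 600.0) -> bool:
    """Run `script` with n_labels as argv[1] in a fresh interpreter.

    A native abort kills only the child process, so the caller survives and
    just sees a non-zero return code. Raises subprocess.TimeoutExpired if a
    run hangs for longer than `timeout` seconds.
    """
    result = subprocess.run(
        [sys.executable, script, str(n_labels)],
        capture_output=True,  # keep the child's crash output out of the console
        timeout=timeout,
    )
    return result.returncode != 0


def first_failing_count(lo: int, hi: int, fails: Callable[[int], bool]) -> int:
    """Smallest n in [lo, hi] for which fails(n) is True.

    Assumes fails is monotone (False up to some threshold, True from then on)
    and that fails(hi) is True; otherwise the result is just `hi`.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # threshold is at mid or below
        else:
            lo = mid + 1  # mid still works, threshold is above it
    return lo
```

For example, first_failing_count(1, 1000, lambda n: crashes_in_subprocess("repro_ner.py", n)) would return 126 if the behaviour is exactly as reported. Each probe retrains from scratch, so bisection keeps the number of runs logarithmic in the search range.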

Your Environment

  • spaCy version: 2.0.12
  • Platform: Windows-10-10.0.16299-SP0
  • Python version: 3.7.0

  • Environment Information:
    16 GB RAM, CPU: i7-3630QM

Is there a limit on the number of labels? If so, shouldn't spaCy raise an error describing it instead of crashing?
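Until the library reports such problems itself, a caller can at least turn bad input into a readable exception before it reaches native code. This is a minimal sketch of a caller-side check; validate_ner_labels is a hypothetical helper, and max_labels is a hypothetical, caller-chosen cap, since the thread indicates the crash was a bug rather than an intended limit. It also rejects non-string labels, which the reproducer's unit test passes (range(125) yields integers).

```python
from typing import Iterable, List, Optional


def validate_ner_labels(labels: Iterable[object], max_labels: Optional[int] = None) -> List[str]:
    """Return a sorted, de-duplicated list of label strings.

    Raises ValueError with a descriptive message instead of letting a
    problem surface later as a native crash. `max_labels` is a hypothetical
    cap chosen by the caller, not a spaCy setting.
    """
    checked = set()
    for label in labels:
        # NER labels are expected to be strings such as "PERSON" or "ORG"
        if not isinstance(label, str) or not label:
            raise ValueError(f"NER labels must be non-empty strings, got {label!r}")
        checked.add(label)
    if max_labels is not None and len(checked) > max_labels:
        raise ValueError(f"{len(checked)} labels given, but at most {max_labels} are allowed")
    return sorted(checked)
```

The returned list can then be fed to ner.add_label one label at a time; for the unit test, something like {str(i) for i in range(125)} (or descriptive names) keeps the labels as strings.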

Labels: bug, feat / ner, training

Most helpful comment

~Trying to reproduce this now, but at first glance it looks like the problem is that your labels are integers, where they should be either strings, or the hash of those strings. The integer 125 is going to resolve to one of the reserved symbols, and I think that's what's confusing it.~

Edit: Aaah, never mind. I found a place in the code where I'd lazily used a stack-allocated array during development, and had not replaced it. Apologies for the inconvenience, and thanks for the test case.
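The comment doesn't say how large that stack array was or what it held, so the exact cut-off can't be derived from this thread. What can be illustrated is why the label count matters at all. Assuming the entity recognizer's transition system follows the plain BILUO scheme, with one B-, I-, L- and U- action per label plus a single shared O action, the number of actions (and so the size of any buffer with one slot per action) grows as 4n + 1. A small sketch of that count; biluo_actions is a hypothetical helper, and spaCy's real move set may include a few extra actions.

```python
from typing import List, Sequence


def biluo_actions(labels: Sequence[str]) -> List[str]:
    """Action names under a plain BILUO scheme (an assumption, not spaCy's
    exact move set): one shared O action plus B-, I-, L- and U- actions for
    every entity label."""
    actions = ["O"]
    for label in labels:
        actions.extend(f"{prefix}-{label}" for prefix in "BILU")
    return actions


labels = [f"LABEL_{i}" for i in range(125)]
print(len(biluo_actions(labels)))  # 501
```

So 125 labels already mean roughly 500 actions, and a fixed-length array can be outgrown purely by adding labels, without any change to the training data.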


This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
