The area of the library is the model. There is indeed documentation on how to add a new language to spaCy, but because I am quite new to spaCy, that process seems quite daunting to me. So I wonder whether there's a plan to add Finnish to spaCy's existing models, or whether it's already under development.
Yes, if others work on it as well.
The Nordic languages are definitely high on our list. The Finnish language data in spaCy is still a bit sparse, so there might be a few things that need to be improved before we can train a model.
The process requires the following steps and components:
- NOUN and optional morphological features.
- spacy convert command that takes .conllu files and outputs spaCy's JSON format. See here for an example of a training pipeline with data conversion. Corpora can have very subtle formatting differences, so it's important to check that they can be converted correctly.
- spacy train to train a new model.

With our new internal model training infrastructure, it's now much easier for us to integrate new pipelines and train new models. In order to train and distribute "official" spaCy models, we need to be able to integrate and reproduce the full training pipeline whenever we release a new version of spaCy that requires new models (so we can't just upload a model trained by someone else).
But this also means that users can contribute by sharing their data conversion and training commands. So if you end up experimenting with the Finnish Universal Dependencies treebank and find an approach that works, that'd be super cool 🙂
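As a rough illustration of the "check that they can be converted correctly" point above, here is a minimal sketch of a sanity check you could run on a .conllu file before handing it to the converter. It is not part of spaCy: the function name `check_conllu` and the two sample strings are made up for this example, and it only looks at the basic CoNLL-U layout (10 tab-separated fields per token line), not at the linguistic content.

```python
from typing import List, Tuple


def check_conllu(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for lines that break the basic CoNLL-U layout.

    Hypothetical helper for illustration only; not a spaCy API.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Blank lines separate sentences; lines starting with '#' are comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            problems.append(
                (lineno, f"expected 10 tab-separated columns, found {len(cols)}")
            )
            continue
        # IDs are plain integers, ranges such as 1-2, or decimals such as 1.1.
        if not cols[0].replace("-", "").replace(".", "").isdigit():
            problems.append((lineno, f"unexpected token ID {cols[0]!r}"))
        # Unspecified fields should be '_' rather than empty strings.
        if any(col == "" for col in cols):
            problems.append((lineno, "empty field (use '_' for unspecified values)"))
    return problems


# Made-up sample data: one well-formed sentence, and one token line
# that uses spaces instead of tabs (a typical subtle formatting difference).
GOOD = (
    "# text = Hän lukee.\n"
    "1\tHän\thän\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tlukee\tlukea\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
BAD = "1 Hän hän PRON _ _ 2 nsubj _ _\n"

if __name__ == "__main__":
    print(check_conllu(GOOD))
    print(check_conllu(BAD))
```

A check like this will not catch everything the converter cares about, but it can flag whitespace and column-count problems early, before a conversion or training run fails further down the line.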
Thanks for your response to my question. Like I said, I am very new to models, NLP, training and pipeline terminology, but I have checked the Universal Dependencies treebank and also a demo page that shows word similarities. If I manage to get my hands on this, I'll report here or open a new issue. Otherwise, it'd be nice if someone is already working on it.
Just keeping an eye on this. Just in case an update pops up, I'll be glad to know =)
Me too :dancer:
I don't know if you are aware (I suppose you are), but there is a Universal Dependencies treebank for Finnish under a Creative Commons license. I think it was developed by the University of Turku. Could that be used?
Merging this with the master thread in #3056!