Problem: In production, gensim models are typically used as a "text embedding tool" feeding upstream models (for the target supervised task). In production it's important to use as few resources as possible (especially RAM). Unfortunately, large gensim models consume too much RAM even when used only for inference (no further training).
Idea: Decrease RAM usage, based on 2 simple things:
- drop or null attributes that are needed only for further training (e.g. `id2word`, the LDA `state`);
- cast float64 -> float32.

Implementation draft:
I propose adding a function `trim_model` to `gensim.utils` (instead of adding methods to each model); this implementation can easily be extended to other models if needed and doesn't overcomplicate the existing models (because it is fully separate).
This is just a first draft, comments are welcome.
```python
import warnings

import numpy as np

from gensim.models import TfidfModel, LdaModel, LsiModel
from gensim.models.lsimodel import Projection


def trim_model(model, dtype=None):
    """Reduce the amount of memory used by a model.

    This function mutates the passed `model` (deletes/casts/nulls attributes);
    after this operation you can no longer update the model, but its size is
    significantly smaller.

    Parameters
    ----------
    model : {:class:`~gensim.models.tfidfmodel.TfidfModel`, :class:`~gensim.models.ldamodel.LdaModel`,
             :class:`~gensim.models.lsimodel.LsiModel`}
        Supported gensim model.
    dtype : np.dtype, optional
        Datatype used for casting attributes; if None, attributes are not cast.

    """
    attr2remove = []  # delete these attributes from the model
    attr2null = []  # null these attributes, i.e. model.X = None (required for save/load compatibility)
    attr2cast = []  # cast these attributes to a lower-precision type

    if isinstance(model, TfidfModel):
        attr2remove.append("id2word")
        attr2cast.append("idfs")
    elif isinstance(model, LdaModel):
        if dtype:
            model.dtype = dtype
        attr2null.extend(["state", "id2word"])
        attr2cast.extend(["alpha", "eta", "expElogbeta"])
    elif isinstance(model, LsiModel):
        if dtype:
            model.dtype = dtype
        attr2remove.append("id2word")
        attr2cast.append("projection")
    else:
        raise RuntimeError("model {} is not supported yet".format(model))

    for attr in attr2remove:
        if not hasattr(model, attr):
            warnings.warn("Model {} has no attribute {} marked for deletion".format(model, attr))
            continue
        delattr(model, attr)

    for attr in attr2cast:
        if dtype is None:
            continue  # don't cast without an explicitly passed type
        if not hasattr(model, attr):
            warnings.warn("Model {} has no attribute {} marked for casting".format(model, attr))
            continue
        attr_mat = getattr(model, attr)
        if isinstance(attr_mat, dict):
            for k, v in attr_mat.items():
                attr_mat[k] = dtype(v)
        elif isinstance(attr_mat, np.ndarray):
            setattr(model, attr, attr_mat.astype(dtype))
        elif isinstance(attr_mat, Projection):  # special case for LSI nested attributes (model.projection.u / .s)
            attr_mat.u = attr_mat.u.astype(dtype)
            attr_mat.s = attr_mat.s.astype(dtype)
            setattr(model, attr, attr_mat)

    for attr in attr2null:
        if not hasattr(model, attr):
            warnings.warn("Model {} has no attribute {} marked for nullification".format(model, attr))
            continue
        setattr(model, attr, None)
```
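The casting loop above handles three attribute shapes: dicts (like `TfidfModel.idfs`), plain ndarrays (like LDA's `expElogbeta`), and the LSI `Projection`. A minimal numpy-only sketch of the dict/ndarray branches (the helper name `cast_attr` is mine, not part of the proposal):

```python
import numpy as np


def cast_attr(value, dtype):
    """Mirror trim_model's cast branches: dicts value-by-value, ndarrays via astype."""
    if isinstance(value, dict):
        return {k: dtype(v) for k, v in value.items()}
    if isinstance(value, np.ndarray):
        return value.astype(dtype)
    return value  # unsupported shapes are left unchanged


idfs = {0: 1.5, 1: 2.25}  # TfidfModel-style dict of per-term idf floats
print(cast_attr(idfs, np.float32)[0].dtype)          # float32
print(cast_attr(np.zeros(3), np.float32).dtype)      # float32
```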
Benchmark:
These are the results I got on my production models (I measured model size on disk for simplicity; RAM measurements are comparable).
The original model dtype is np.float64; dtype=None means we don't cast data types (they stay as-is). The `x` factor is the size reduction relative to the original.
| Model      | Original size, MB | Trimmed (dtype=None), MB | Trimmed (dtype=np.float32), MB |
|------------|-------------------|--------------------------|--------------------------------|
| TfidfModel | 85.05             | 50.67, x1.678            | 46.59, x1.825                  |
| LdaModel   | 6518.85           | 3242.96, x2.010          | 1619.86, x4.024                |
| LsiModel   | 3258.23           | 3233.25, x1.007          | 1616.63, x2.015                |
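The roughly 2x reduction from casting is what you'd expect: float32 uses half the bytes of float64, and for arrays with values in a typical inference range the precision loss is negligible. A quick numpy-only illustration (the array here is just a stand-in for something like LDA's `expElogbeta`):

```python
import numpy as np

beta64 = np.random.rand(10000, 100)   # stand-in for a large float64 model matrix
beta32 = beta64.astype(np.float32)

print(beta64.nbytes / beta32.nbytes)                 # 2.0: float32 halves array memory
print(float(np.abs(beta64 - beta32).max()) < 1e-6)   # True: tiny rounding error for values in [0, 1)
```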
I like it!
The only thing I would change is the word "compression" to something like "trim", to be more accurate, but that's really a nitpick.
@mpenkov agree, `trim_model` sounds better than "compress", renamed. Anything else?
@piskvorky wdyt?
Beware: I've seen in experiments on the Word2Vec side of things indications that some of the native/optimized array routines up-convert float16 to float32 before bulk operations. So you may get RAM savings while the model is "at rest", but as soon as there's a big operation against the full arrays, (1) a temporary larger copy is made, momentarily turning the memory advantage into a net disadvantage; (2) operations may be slower due to the extra internal conversions.
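One place this silent up-conversion is easy to see is NumPy's type promotion: mixing a float16 array with a float32 one produces a float32 result (and a float32-sized temporary), so the "at rest" savings disappear during the operation. A minimal sketch:

```python
import numpy as np

a16 = np.ones(1000, dtype=np.float16)   # half the memory of float32 while "at rest"
b32 = np.ones(1000, dtype=np.float32)

out = a16 + b32   # promotion: the result (and intermediate work) is float32
print(out.dtype)  # float32
```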