Problem: In production, gensim models are typically used as a "text embedding tool" feeding upstream models (for the target supervised task). In production it's important to use as few resources as possible (especially RAM). Unfortunately, large gensim models consume too much RAM even when used only for inference (no further training).
Idea: Decrease RAM usage, based on 2 simple things:
- drop or null attributes that are needed only for further training (e.g. `id2word`, the LDA `state`);
- cast float64 -> float32.

Implementation draft:
I propose adding a function `trim_model` to `gensim.utils` (instead of adding methods to each model); this implementation can easily be extended to other models if needed and doesn't overcomplicate the existing models (because it is fully separate).
This is just a first draft, comments are welcome.
```python
import warnings

import numpy as np

from gensim.models import TfidfModel, LdaModel, LsiModel
from gensim.models.lsimodel import Projection


def trim_model(model, dtype=None):
    """Reduce the amount of memory used by a model.

    This function mutates the passed `model` (deletes/casts/nulls attributes);
    after this operation you can no longer update the model, but its size is
    significantly smaller.

    Parameters
    ----------
    model : {:class:`~gensim.models.tfidfmodel.TfidfModel`, :class:`~gensim.models.ldamodel.LdaModel`,
             :class:`~gensim.models.lsimodel.LsiModel`}
        Supported gensim model.
    dtype : np.dtype, optional
        Datatype used for casting attributes; if None, attributes are not cast.

    """
    attr2remove = []  # delete these attributes from the model
    attr2null = []  # null these attributes, i.e. model.X = None (required for save/load compatibility)
    attr2cast = []  # cast these attributes to a lower-precision type

    if isinstance(model, TfidfModel):
        attr2remove.append("id2word")
        attr2cast.append("idfs")
    elif isinstance(model, LdaModel):
        if dtype:
            model.dtype = dtype
        attr2null.extend(["state", "id2word"])
        attr2cast.extend(["alpha", "eta", "expElogbeta"])
    elif isinstance(model, LsiModel):
        if dtype:
            model.dtype = dtype
        attr2remove.append("id2word")
        attr2cast.append("projection")
    else:
        raise RuntimeError("model {} is not supported yet".format(model))

    for attr in attr2remove:
        if not hasattr(model, attr):
            warnings.warn("Model {} has no attribute {} marked for deletion".format(model, attr))
            continue
        delattr(model, attr)

    for attr in attr2cast:
        if dtype is None:
            continue  # don't cast without an explicitly passed type
        if not hasattr(model, attr):
            warnings.warn("Model {} has no attribute {} marked for casting".format(model, attr))
            continue
        attr_mat = getattr(model, attr)
        if isinstance(attr_mat, dict):
            for k, v in attr_mat.items():
                attr_mat[k] = dtype(v)
        elif isinstance(attr_mat, np.ndarray):
            setattr(model, attr, attr_mat.astype(dtype))
        elif isinstance(attr_mat, Projection):  # special case for LSI nested attributes (model.projection.u / .s)
            attr_mat.u = attr_mat.u.astype(dtype)
            attr_mat.s = attr_mat.s.astype(dtype)
            setattr(model, attr, attr_mat)

    for attr in attr2null:
        if not hasattr(model, attr):
            warnings.warn("Model {} has no attribute {} marked for nullification".format(model, attr))
            continue
        setattr(model, attr, None)
```
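The casting loop above handles three attribute shapes: dicts (like `TfidfModel.idfs`), plain ndarrays (like LDA's `expElogbeta`), and the LSI `Projection`. A minimal numpy-only sketch of the dict/ndarray branches (the helper name `cast_attr` is mine, not part of the proposal):

```python
import numpy as np


def cast_attr(value, dtype):
    """Mirror trim_model's cast branches: dicts value-by-value, ndarrays via astype."""
    if isinstance(value, dict):
        return {k: dtype(v) for k, v in value.items()}
    if isinstance(value, np.ndarray):
        return value.astype(dtype)
    return value  # unsupported shapes are left unchanged


idfs = {0: 1.5, 1: 2.25}  # TfidfModel-style dict of per-term idf floats
print(cast_attr(idfs, np.float32)[0].dtype)          # float32
print(cast_attr(np.zeros(3), np.float32).dtype)      # float32
```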
Benchmark:
These are the results I got on my production models (I measured model size on disk for simplicity; RAM measurements are comparable).
The original model dtype is np.float64; dtype=None means we don't cast data types (they stay as-is). The `x` factor is the size reduction relative to the original.
| Model      | Original size, MB | Trimmed (dtype=None), MB | Trimmed (dtype=np.float32), MB |
|------------|-------------------|--------------------------|--------------------------------|
| TfidfModel | 85.05             | 50.67, x1.678            | 46.59, x1.825                  |
| LdaModel   | 6518.85           | 3242.96, x2.010          | 1619.86, x4.024                |
| LsiModel   | 3258.23           | 3233.25, x1.007          | 1616.63, x2.015                |
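The roughly 2x reduction from casting is what you'd expect: float32 uses half the bytes of float64, and for arrays with values in a typical inference range the precision loss is negligible. A quick numpy-only illustration (the array here is just a stand-in for something like LDA's `expElogbeta`):

```python
import numpy as np

beta64 = np.random.rand(10000, 100)   # stand-in for a large float64 model matrix
beta32 = beta64.astype(np.float32)

print(beta64.nbytes / beta32.nbytes)                 # 2.0: float32 halves array memory
print(float(np.abs(beta64 - beta32).max()) < 1e-6)   # True: tiny rounding error for values in [0, 1)
```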
I like it!
The only thing I would change is the word "compression" to something like "trim", to be more accurate, but that's really a nitpick.
@mpenkov agree, `trim_model` sounds better than "compress", renamed. Anything else?
@piskvorky wdyt?
Beware: I've seen in experiments on the Word2Vec side of things indications that some of the native/optimized array routines up-convert float16 to float32 before bulk operations. So you may get RAM savings while the model is "at rest", but as soon as there's a big operation against the full arrays, (1) a temporary larger copy is made, momentarily turning the memory advantage into a net disadvantage; (2) operations may be slower due to the extra internal conversions.
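One place this silent up-conversion is easy to see is NumPy's type promotion: mixing a float16 array with a float32 one produces a float32 result (and a float32-sized temporary), so the "at rest" savings disappear during the operation. A minimal sketch:

```python
import numpy as np

a16 = np.ones(1000, dtype=np.float16)   # half the memory of float32 while "at rest"
b32 = np.ones(1000, dtype=np.float32)

out = a16 + b32   # promotion: the result (and intermediate work) is float32
print(out.dtype)  # float32
```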