Chatterbot: Found why best_match has low performance with Levenshtein distance comparison

Created on 10 Jan 2018 · 8 Comments · Source: gunthercox/ChatterBot

@gunthercox, @vkosuri, @mymusise

I saw that you are concerned about the performance of getting a response for a statement, so what I did to improve it may be helpful. Response time dropped from 1.9 s to 96.8 ms with the following changes to LevenshteinDistance:

Move

import sys
from difflib import SequenceMatcher

to the top, before the class definition.

Comment out the try ... except ... block for the library import.

        # import sys
        #
        # # Use python-Levenshtein if available
        # try:
        #     from Levenshtein.StringMatcher import StringMatcher as SequenceMatcher
        # except ImportError:
        #     from difflib import SequenceMatcher

        # PYTHON = sys.version_info[0]

        # Return 0 if either statement has a falsy text value
        # if not statement.text or not other_statement.text:
        #     return 0
        #
        # # Get the lowercase version of both strings
        # if PYTHON < 3:
        #     statement_text = unicode(statement.text.lower()) # NOQA
        #     other_statement_text = unicode(other_statement.text.lower()) # NOQA
        # else:
        #     statement_text = str(statement.text.lower())
        #     other_statement_text = str(other_statement.text.lower())

        statement_text = str(statement.text.lower())
        other_statement_text = str(other_statement.text.lower())
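A minimal sketch of the same idea that keeps the optional python-Levenshtein fallback instead of removing it: the try/except runs once when the module is loaded rather than on every comparison. The helper name `similarity` and its rounding are assumptions for illustration, not ChatterBot's actual code.

```python
# Resolve the optional dependency once, at module import time,
# instead of inside the comparison function.
try:
    from Levenshtein.StringMatcher import StringMatcher as SequenceMatcher
except ImportError:
    from difflib import SequenceMatcher


def similarity(text, other_text):
    """Hypothetical helper: return a 0..1 similarity ratio for two strings."""
    # Keep the guard for empty text values
    if not text or not other_text:
        return 0

    # Compare the lowercase versions of both strings
    matcher = SequenceMatcher(None, text.lower(), other_text.lower())
    return round(matcher.ratio(), 2)


print(similarity("Hello there", "hello there"))
print(similarity("", "hello there"))
```

With this layout the fallback still works on machines without python-Levenshtein, but the cost of the failed import is paid only once.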

Good luck!


zxsimple

All 8 comments

Thanks for the tips. Could you please open a PR for this?

vkosuri on 10 Jan 2018

@zxsimple It looks like avoiding the repeated import is what improves the performance.

But I ran a test, and the performance did not seem to improve noticeably. Here's the code:

import time

t1 = time.time()
for i in range(200000):
    from difflib import SequenceMatcher
    similarity = SequenceMatcher(
            None,
            'statement text',
            'other statement_text'
        )
print("Use: ",time.time() - t1)

Output Use: 1.5159368515014648

Then I moved the import out of the loop:

import time
from difflib import SequenceMatcher

t1 = time.time()
for i in range(200000):
    similarity = SequenceMatcher(
            None,
            'statement text',
            'other statement_text'
        )
print("Use: ",time.time() - t1)

Output Use: 1.2639625072479248

Is there any other change in your code?

mymusise on 11 Jan 2018

@vkosuri I'll create a PR after testing it fully.

zxsimple on 11 Jan 2018

Yeah, Levenshtein.StringMatcher.StringMatcher and difflib.SequenceMatcher come from different libraries. Removing the block may be faster, but I think this try/except is there because ChatterBot supports both Python 2.7 and 3.

pylobot on 12 Jan 2018

@pylobot Yes. I confirmed that I hadn't installed python-Levenshtein, so every call goes into the except block, which is pretty time-consuming.

zxsimple on 16 Jan 2018
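The effect described above can be sketched with a small timing comparison. This is an illustration under stated assumptions, not a measurement of ChatterBot itself: `module_that_is_not_installed_xyz` is a made-up name standing in for a missing python-Levenshtein. An import that succeeds is served from `sys.modules` on later calls, while an import that fails has to search for the module again each time, which would explain why the loop benchmark earlier in the thread showed little difference.

```python
import timeit


def failing_import_each_call():
    # The first import fails on every call, so Python searches for the
    # module and raises ImportError each time before falling back.
    try:
        import module_that_is_not_installed_xyz  # hypothetical missing module
    except ImportError:
        from difflib import SequenceMatcher
    return SequenceMatcher


def cached_import_each_call():
    # After the first call this import is a cheap lookup in sys.modules.
    from difflib import SequenceMatcher
    return SequenceMatcher


calls = 2000
slow = timeit.timeit(failing_import_each_call, number=calls)
fast = timeit.timeit(cached_import_each_call, number=calls)

print("failing import per call: %.6f s" % (slow / calls))
print("cached import per call:  %.6f s" % (fast / calls))
```

The absolute numbers depend on the machine and on the length of `sys.path`, so only the relative difference is meaningful.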

@vkosuri The pull request has been created: https://github.com/gunthercox/ChatterBot/pull/1158. Please help review it.

zxsimple on 17 Jan 2018

@zxsimple sure

vkosuri on 17 Jan 2018
