dlib face detection results on FDDB, and a suggestion that dlib get a more accurate face detector

Created on 13 Jul 2018 · 8 Comments · Source: davisking/dlib

This question on Stack Overflow compares dlib's face detector with MTCNN. MTCNN outperforms dlib in both accuracy and performance.

Is dlib's face detection method a little out of date?

So I suggest that dlib adopt a more accurate face detection method.

Jayhello

Most helpful comment

Yes, I know about their rectangle model. Reread what I said earlier. FDDB uses a different annotation style than dlib. Most importantly, the boxes are centered on the faces differently, even in the "rectangle" version of FDDB. This is why I keep telling you to look at the false alarms you are getting from FDDB. If you would just do that you would see immediately what the issue is, improper conversion to FDDB's annotation format.

I'm short with you because you have a history of posting things where you clearly haven't put effort into reading the documentation and you then expect me to dig through your code to find out what is the problem. You have to be respectful of other people's time. A central part of that is not asking questions to people that are easily solved by looking at the documentation or a little bit of debugging. If after trying to solve a problem yourself you can't figure it out then asking about it is fine. But it's the people who habitually ask me to solve their problems and want to treat dlib forums like a code writing or code debugging service that don't get helpful answers.

As for how I converted dlib's boxes to FDDB, I looked at my code history and found this file, which is probably what I did. However I don't recall. But it is a place to start:

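#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <dlib/dir_nav.h>
#include <dlib/geometry.h>
#include <dlib/string.h>
#include <dlib/image_processing.h>
#include <dlib/image_transforms.h>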

using namespace std;
using namespace dlib;

std::vector<std::string> get_image_list (
    const std::string& filename
)
{
    std::vector<std::string> res;
    ifstream fin(filename);
    string line;
    while(getline(fin,line))
        res.push_back(line);
    return res;
}

std::map<string, std::vector<mmod_rect>> get_dlib_dets(int argc, char** argv) 
{
    // The reason for this whole complex map business is because the FDDB evaluation tools
    // require us to output the detection image lists in the same order they appear in the
    // FDDB fold files.
    std::map<string, std::vector<mmod_rect>> res;

    std::vector<std::vector<mmod_rect> > dets;
    std::vector<string> files;

    for (int i = 2; i < argc; ++i)
    {
        deserialize(argv[i]) >> files >> dets;

        for (auto&& dd : dets)
        {
            for (auto&& d : dd)
            {
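                // Shift the box up by roughly 1/6 of its width (image y grows downward).
                d.rect = translate_rect(d.rect, point(0, -(d.rect.width()/6.0+0.5)));
                // Enlarge the box by 10% about its center.
                const double scale = 1.10;
                d.rect = centered_rect(d.rect, d.rect.width()*scale+0.5, d.rect.height()*scale+0.5);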
            }
        }

        for (unsigned long i = 0; i < files.size(); ++i)
        {
            string name = left_substr(files[i].substr(files[i].find("200")),".");
            res[name] = dets[i];
        }
    }
    return res;
}

int main(int argc, char** argv)
{

    auto dets = get_dlib_dets(argc, argv);

    for (auto&& f : get_image_list(argv[1]))
    {
        DLIB_CASSERT(dets.count(f) != 0,"");
        cout << f << "\n";
        cout << dets[f].size() << "\n";
        for (auto&& d : dets[f])
        {
            cout << d.rect.left() << " " << d.rect.top() << " " << d.rect.width() << " " << d.rect.height() << " " << d.detection_confidence << "\n";
        }
    }
}

davisking on 14 Jul 2018

👍11 🎉1
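
For readers following along in Python, the box adjustment in the C++ file above can be sketched in floating point as follows. The function name `dlib_box_to_fddb` and its parameters are hypothetical. The 1/6 upward shift and the 1.10 scale are taken from the code above, but the integer rounding that dlib's `translate_rect` and `centered_rect` apply is not reproduced, so results can differ by about a pixel.

```python
def dlib_box_to_fddb(left, top, width, height, shift_div=6.0, scale=1.10):
    """Shift a box up by width/shift_div, then grow it about its center.

    Hypothetical helper: works in floats and skips dlib's integer rounding.
    Returns (left, top, width, height).
    """
    # Move the box up by a fraction of its width (image y grows downward).
    top = top - width / shift_div

    # Grow the box by `scale` while keeping its center fixed.
    center_x = left + width / 2.0
    center_y = top + height / 2.0
    new_width = width * scale
    new_height = height * scale
    return (center_x - new_width / 2.0,
            center_y - new_height / 2.0,
            new_width,
            new_height)


# Example with made-up numbers: a 60x60 box at (100, 100).
print(dlib_box_to_fddb(100, 100, 60, 60))
```

Whether these particular constants suit a given detector is something to check by looking at the adjusted boxes drawn over the FDDB annotations.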

All 8 comments

I'm not going to implement MTCNN. Stop posting about it. If you want to see MTCNN in dlib then submit it as a PR.

I also want to point out that the ROC curves you posted on stackoverflow are obviously bogus. You can see a legitimate FDDB ROC curve of dlib's HOG detector here https://arxiv.org/pdf/1502.00046.pdf and FDDB evaluation of dlib's CNN detector here: http://blog.dlib.net/2016/10/easily-create-high-quality-object.html. Both of them are clearly wildly better than what you posted on stackoverflow. You obviously aren't running the FDDB evaluation correctly.

Moreover, look at the results discussed in http://blog.dlib.net/2016/10/easily-create-high-quality-object.html. You can see that FDDB is saturated. The CNN detector in dlib has essentially no false alarms on FDDB while achieving about 0.88 recall. All the really good state-of-the-art detectors have saturated at essentially that level on FDDB. Moreover, the curves in the MTCNN paper plot their FDDB ROC curves with the x axis extended way out to 2000 false alarms. That's crazy and makes it hard to tell if it's actually performing well at a meaningful false alarm rate. You have to zoom in on the curve they show to see that part of their plot and it's blurry and hard to read. But it doesn't even look like they got to 0.88 recall before the false alarm rate goes high. So I'm not super convinced MTCNN is even getting to state-of-the-art accuracy on FDDB. And don't tell me about how their area under the ROC curve is high. Anyone can make detectors tuned for super high recall when you allow for high false alarm rates.
So it's just not a useful metric. What matters is accuracy at a reasonable false alarm rate.

davisking on 13 Jul 2018
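
The point about reading accuracy at a reasonable false alarm rate, rather than area under the curve, can be made concrete with a small sketch. It assumes each detection has already been labelled as a true or false positive by whatever matching rule the evaluation uses, and it ignores tied scores. The function name `recall_at_false_alarms` and the toy data are hypothetical and are not FDDB results.

```python
def recall_at_false_alarms(detections, num_faces, max_false_alarms):
    """Best recall reachable while allowing at most `max_false_alarms`.

    detections: list of (score, is_true_positive) pairs over the whole dataset.
    num_faces:  total number of annotated faces.
    """
    true_pos = 0
    false_pos = 0
    # Sweep the score threshold from strict to loose.
    for score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
            if false_pos > max_false_alarms:
                # Lowering the threshold this far would exceed the budget.
                break
    return true_pos / num_faces


# Toy data: six detections over a dataset with five annotated faces.
toy = [(0.99, True), (0.95, True), (0.90, False),
       (0.80, True), (0.60, False), (0.50, True)]
for budget in (0, 1, 2):
    print(budget, recall_at_false_alarms(toy, num_faces=5, max_false_alarms=budget))
```

Comparing detectors at one small, fixed budget puts them on the part of the curve that matters in practice, which is the comparison argued for above.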

I don't think so. I detect faces using this method (face_recognition Python code):

import face_recognition


def detect_face_lst(img):
    """
    :param img: opencv image
    :return: face rectangles [[x, y, w, h], ..........]
    """
    # face_locations returns one (top, right, bottom, left) tuple per face.
    face_locations = face_recognition.face_locations(img, 1, 'cnn')

    lst_ret = []
    for top, right, bottom, left in face_locations:
        # Convert the corner coordinates to [x, y, w, h].
        lst_ret.append([left, top, right - left, bottom - top])

    return lst_ret

I don't want to say anything more.

Jayhello on 13 Jul 2018
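
The `[x, y, w, h]` lists above still have to be written out in the per-image layout that both C++ programs in this thread print: the image name, the number of detections, then one `left top width height score` line per face. A minimal sketch, assuming a hypothetical helper name `format_fddb_entry` and a made-up image name:

```python
def format_fddb_entry(image_name, boxes, scores):
    """Return the text lines for one image in the rectangle result layout.

    boxes:  list of [x, y, w, h]
    scores: one detection confidence per box
    """
    if len(boxes) != len(scores):
        raise ValueError("need exactly one score per box")
    lines = [image_name, str(len(boxes))]
    for (x, y, w, h), score in zip(boxes, scores):
        lines.append(f"{x} {y} {w} {h} {score}")
    return lines


# "example/img_1" is a placeholder, not a real FDDB image path.
entry = format_fddb_entry("example/img_1", [[49, 55, 193, 193]], [0.999784])
print("\n".join(entry))
```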

You are just running the FDDB eval wrong. The curves you posted are horrible. If the face_recognition library, which is just a thin wrapper around dlib, produced face detections that were as awful as what your plots show no one would use it.

davisking on 13 Jul 2018

Here is my C++ code:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <dlib/dnn.h>
#include <dlib/data_io.h>
#include <dlib/image_processing.h>
#include <dlib/gui_widgets.h>
using namespace std;
using namespace dlib;

// ----------------------------------------------------------------------------------------

template <long num_filters, typename SUBNET> using con5d = con<num_filters,5,5,2,2,SUBNET>;
template <long num_filters, typename SUBNET> using con5  = con<num_filters,5,5,1,1,SUBNET>;

template <typename SUBNET> using downsampler  = relu<affine<con5d<32, relu<affine<con5d<32, relu<affine<con5d<16,SUBNET>>>>>>>>>;
template <typename SUBNET> using rcon5  = relu<affine<con5<45,SUBNET>>>;

using net_type = loss_mmod<con<1,9,9,1,1,rcon5<rcon5<rcon5<downsampler<input_rgb_image_pyramid<pyramid_down<6>>>>>>>>;

// ----------------------------------------------------------------------------------------

void getAllImgPaths(const std::string& file, std::vector<std::string>& vecPaths){

    std::fstream fStream(file);
    std::string sLine;

    while (std::getline(fStream, sLine)){
        if (sLine.size() > 0){
            vecPaths.emplace_back(sLine);
        }
    }

    fStream.close();
}

void writeStrVecToFile(const std::string& file, const std::vector<std::string>& vecStr){
    std::ofstream fout(file);
    for (auto const& x:vecStr){
        fout<<x<<'\n';
    }

    fout.close();
}



int main(){

    std::string fPath = "/home/xy/face_sample/evaluation/compareROC/FDDB-folds/filePath.txt";
    std::vector<std::string> vecImgPaths;

    getAllImgPaths(fPath, vecImgPaths);

    std::string imgBaseDir = "/home/xy/face_sample/evaluation/compareROC/originalPics/";
    std::vector<std::string> vecDetRet;

    string model_path = "/home/xy/anaconda2/lib/python2.7/site-packages/face_recognition_models/models/mmod_human_face_detector.dat";
    net_type net;
    deserialize(model_path) >> net;

    for (auto const& img_name:vecImgPaths){
        std::string imgFullPath = imgBaseDir + img_name + ".jpg";

        matrix<rgb_pixel> img;
        load_image(img, imgFullPath);

        auto dets = net(img);
        vecDetRet.push_back(img_name);
        vecDetRet.push_back(std::to_string(dets.size()));

        for (auto det:dets){

            using std::to_string;

            // sFaceInfo like 49 55 193 193 0.999784
            // Write the detector's confidence as the score rather than a constant 1.
            std::string sFaceInfo = to_string(det.rect.left()) + " " + to_string(det.rect.top()) + " " +
                                    to_string(det.rect.width()) + " " + to_string(det.rect.height()) + " " + to_string(det.detection_confidence);

            std::cout<<sFaceInfo<<std::endl;
            vecDetRet.push_back(sFaceInfo);

        }

    }

    // write face detect result to txt file for fddb compare
    std::string fddbTxtPath = "fddb_ret.txt";
    writeStrVecToFile(fddbTxtPath, vecDetRet);
}

Running result:

The C++ ROC is worse than the one from Python face_recognition. Here is the result (I don't know whether anything is wrong):

Using the Python face_recognition library:

Jayhello on 13 Jul 2018

👎2

I don't mean any malice. I am just using the API that you and face_recognition provide.

Jayhello on 13 Jul 2018

You aren't listening to me or reading FDDB's instructions. The problem has nothing to do with how you are calling dlib. Look at the output from the detector. Find the places where you think it's false alarming. You will see that it isn't false alarming on those detections, it's just that you haven't properly converted the detections into FDDB's detection format. In particular, FDDB has a certain annotation style it expects you to output, these ellipses. But the dlib detector outputs square boxes. You need to do some sensible conversion so that the boxes are centered and shaped in the way FDDB expects or you won't get a sensible result from the FDDB evaluation software.

I seem irritated because you obviously have put no effort into trying to debug this. If you simply looked at the outputs with your eyes you would immediately see that there are not hundreds of false alarms and that something with how you are running the FDDB evaluation software is incorrect.

davisking on 13 Jul 2018

👍1
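
One way to see how box placement alone can turn a correct detection into an apparent false alarm is to compute the overlap between a detection and an annotation. The sketch below uses intersection over union on `(left, top, width, height)` boxes. The function name `iou` and the example boxes are hypothetical, and the exact matching rule should be taken from FDDB's own documentation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


annotation = (100, 80, 100, 140)    # hypothetical tall face annotation
tight_square = (110, 130, 80, 80)   # hypothetical square box low on the face
print(iou(annotation, annotation))
print(iou(annotation, tight_square))
```

Here the square box lies entirely inside the annotation yet overlaps it by less than half, which is the kind of mismatch that shifting and rescaling the boxes is meant to correct.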

I have read its paper; FDDB has a 'rectangle' model. There is no need to convert rectangles to ellipses.

I use the same code for OpenCV, dlib, and MTCNN. Why do the others not have this problem?

I don't want to speak with you anymore. Whenever I say your program has a problem, you are always domineering.

Take this question, https://github.com/ageitgey/face_recognition/issues/494, which I also raised in dlib: you just assumed I was wrong, but I was not.

If you think my test of dlib's face detection is not accurate, you can provide dlib's test code for FDDB, which would be very easy for you (do not just give the paper's result). Do not just say I am wrong.

Jayhello on 14 Jul 2018
