Incubator-mxnet: im2rec documentation is lacking / buggy

Created on 25 Jul 2018 · 4 Comments · Source: apache/incubator-mxnet

im2rec.py doesn't have docstrings. The online doc that describes the RecordIO format (here) doesn't describe how to use the script. Another page does, but it could really use a full-fledged real-world example, and should either be unified with or at least linked to from the other doc page.

In addition, the im2rec.py script encodes class labels as floating point numbers:

def write_list(path_out, image_list):
    with open(path_out, 'w') as fout:
        for i, item in enumerate(image_list):
            line = '%d\t' % item[0]
            for j in item[2:]:
                line += '%f\t' % j ### <---- THIS IS THE FLOATING POINT LABEL
            line += '%s\n' % item[1]
            fout.write(line)

But the doc page has integer labels in its example. When I wrote my own test .lst file by hand, I used integer labels, which in fact didn't work with mx.io.ImageRecordIter. It'd be great to get a clarification on the exact required .lst file format.
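For anyone writing a .lst file by hand, here is a minimal sketch of the line layout that the write_list snippet above produces: an integer index, one or more labels formatted with %f, and the image path, all separated by tabs. The helper name format_lst_line and the example paths are hypothetical and only serve as illustration; this is not part of im2rec.py.

```python
def format_lst_line(index, labels, rel_path):
    """Build one tab-separated .lst line: index, float label(s), image path.

    Hypothetical helper that mirrors the layout written by write_list above.
    """
    fields = ['%d' % index]
    # Labels are written with %f, so an integer class id 1 becomes "1.000000".
    fields += ['%f' % float(label) for label in labels]
    fields.append(rel_path)
    return '\t'.join(fields) + '\n'


# Example: a classification label and a regression label (made-up paths).
print(repr(format_lst_line(0, [1], 'cat/img0.jpg')))
print(repr(format_lst_line(1, [68.6], 'person/img1.jpg')))
```

Whether a given reader accepts other label spellings is a separate question; the sketch only reproduces the format the script itself emits.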


All 4 comments

Thanks for submitting the issue @kjchalup
The script uses floating point because the label may be a regression target, e.g. 68.6 kg for a human body weight.
There is a page that uses im2rec with real data.
I experimented with changing the format in the code from %f to %d, and the result still works with ImageRecordIter.
Could you provide some reproducible code for me?
I will update the documentation as well.

I traced the code in im2rec.py. The correct format is float. However, if you use integers for the label data, it should work fine, since im2rec.py calls float() when reading the label value back, and float(label value) returns a float either way.
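The point about float() can be illustrated with a small sketch. It assumes the tab-separated layout discussed in this thread (index, label(s), path); the function name parse_lst_line is hypothetical, and the code is a simplified stand-in rather than a copy of what im2rec.py does.

```python
def parse_lst_line(line):
    """Split one .lst line into (index, labels, path).

    Simplified illustration: labels are converted with float(), so the
    text "2" and the text "2.000000" both yield the value 2.0.
    """
    parts = [p.strip() for p in line.strip().split('\t')]
    index = int(parts[0])
    labels = [float(p) for p in parts[1:-1]]
    rel_path = parts[-1]
    return index, labels, rel_path


# An integer-looking label and a float label parse to the same value.
print(parse_lst_line('3\t2\tdog/a.jpg\n'))
print(parse_lst_line('3\t2.000000\tdog/a.jpg\n'))
```

This only covers the list-reading step described above; it says nothing about how other consumers of a hand-written .lst file treat the labels.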

Thank you for the response!

1) The page you linked has a nice example, thanks! It'd be great if it were discoverable from this page.

2) im2rec.py still has no docstrings, nor even a line of comments besides the license boilerplate :( It's really not trivial for someone who has no idea how im2rec works to figure out what is going on in there quickly.

3) There's a slight misunderstanding. As I said in the original post, I tried to make my own .lst file by hand, following the example on the doc page which at the time had integer labels. I wasn't using im2rec.py, as I gave up on understanding it, lacking an example and any docstrings/comments. I was just using Vim to write stuff :) This didn't work with mx.io.ImageRecordIter, because the labels must in fact be floats, not ints. However, the docs are now updated -- the labels are floats. So the problem is fixed :) Thanks!

Fix merged - thanks @stu1130 - this can be resolved.
@kjchalup - kindly resolve, or add more details if there is still an issue. Thanks!
