One: [record-minmax] Accept raw data as a representative dataset

Created on 2 Jul 2021  ·  8 Comments  ·  Source: Samsung/ONE

What

Let's enable record-minmax to accept raw data as a representative dataset.

Why

For users who are not familiar with HDF5.

CC @lemmaa

arequant

All 8 comments

record-minmax --input_data <input_data>

If <input_data> is an .h5 file (~~.h5), record-minmax will run as before.

Else, <input_data> is assumed to be a text file which specifies the representative data. The text file should contain one absolute file path per line. Each file it points to should be a binary file containing one representative input, ready to be consumed by the input circle model without any modification (i.e., pre-processed raw data).

Future work may be to support some pre-processing functionality.
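As an illustration of the proposed input layout, here is a minimal sketch that writes each sample to its own raw binary file and then writes a list file with one absolute path per line. The function name `write_raw_inputs`, the file names, and the choice of little-endian float32 are assumptions made for the example; the actual dtype and element count must match whatever the circle model's input expects.

```python
import os
import struct


def write_raw_inputs(samples, out_dir):
    """Write each sample as a raw float32 file and return the list-file path.

    Hypothetical helper: names and the float32 layout are assumptions,
    not part of record-minmax itself.
    """
    list_path = os.path.join(out_dir, "input_list.txt")
    with open(list_path, "w") as list_file:
        for index, sample in enumerate(samples):
            raw_path = os.path.abspath(os.path.join(out_dir, f"input_{index}.bin"))
            with open(raw_path, "wb") as raw_file:
                # Pack the flat list of values as little-endian float32.
                raw_file.write(struct.pack(f"<{len(sample)}f", *sample))
            # One absolute path per line, as described in the proposal.
            list_file.write(raw_path + "\n")
    return list_path
```

The returned list file is what would be passed as <input_data> under this proposal.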

Else, <input_data> is assumed to be a text file which specifies the representative data.

This makes the code complicated. I don't think this is a good approach. It would be better to introduce a new option.

I'm afraid adding a new option may confuse users. For example, if there are two options (e.g., --input_data (we should keep this option for backward compatibility), --raw_input_list), users can misuse the option (using with a wrong file type or using both options, etc).

using with a wrong file type

Every option has to solve this problem, even if you go with your approach of checking whether the file is h5 or not.

using both options

We already have such cases, and I think this is natural. We have this situation in _one-import_tf_, where the input model can be a saved model, a Keras model, or just a graph_def.

1) Support by option

  • --input_data or --raw_input_list or ...
  • --input_data for the input file, plus an additional option --data-type (the default is h5; there are others, such as list for a text-format list)
  • We only check for the given type; if the check fails, it is a failure. Code and testing will be simple.

2) Automatic type detection

  • We must first try the existing supported type. How? Relying on the file extension can be dangerous. I could pass a JPEG file with an h5 extension and watch the program exit with a segmentation fault...
  • We may try with error handling. h5 throws, so this is OK. For a text list... well...
  • When we try to support other types, we may get a text file with a list whose format is different...
  • The code itself has to handle all the cases with if, else if, else if, ...

Anyway, there can be lots of bad cases that we can think of, and lots of good things that we only want to think of.
(1) can also become complicated and error-prone, but here we want to keep it simple for both the user interface and the code.
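To make the "check only the given type, and fail if it does not match" idea from (1) concrete, here is a rough sketch of a strict loader for the list format. It is not the record-minmax implementation; `read_raw_input_list` and its `expected_bytes` parameter are hypothetical names, and the size check assumes the caller knows how many bytes one model input occupies.

```python
import os


def read_raw_input_list(list_path, expected_bytes):
    """Load every raw file named in list_path, failing fast on any bad entry.

    Hypothetical helper for illustration only.
    """
    buffers = []
    with open(list_path, "r") as list_file:
        for line_no, line in enumerate(list_file, start=1):
            path = line.strip()
            if not path:
                continue  # tolerate blank lines
            if not os.path.isabs(path):
                raise ValueError(f"line {line_no}: not an absolute path: {path}")
            with open(path, "rb") as raw_file:
                data = raw_file.read()
            # No guessing: a size mismatch is an error, not a cue to try another format.
            if len(data) != expected_bytes:
                raise ValueError(
                    f"line {line_no}: expected {expected_bytes} bytes, got {len(data)}"
                )
            buffers.append(data)
    if not buffers:
        raise ValueError("the list file names no input files")
    return buffers
```

Because the format is stated up front, there is a single code path per format and each failure maps to one clear error, which is the simplicity argued for above.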

@seanshpark I understand your concern.

Then, there are two options.

  1. --input_data, --raw_input_list, ..
  2. --input_data, --input_data_format (input_data_format can be h5/hdf5 (default) and list/filelist)

I'd like to go with the second option because it is simpler.
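For reference, the second option could look roughly like the following on the command-line side. This is only a sketch using Python's argparse, not the actual record-minmax code; the flag names and accepted values come from this thread, while `build_parser`, `normalize_format`, and the program name are made up for the example.

```python
import argparse


def build_parser():
    """Build a parser with the two flags discussed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="record-minmax-sketch")
    parser.add_argument("--input_data", required=True,
                        help="HDF5 file, or a text file listing raw input files")
    parser.add_argument("--input_data_format", default="h5",
                        choices=["h5", "hdf5", "list", "filelist"],
                        help="format of --input_data (default: h5)")
    return parser


def normalize_format(fmt):
    """Collapse the aliases into the two real formats."""
    return "h5" if fmt in ("h5", "hdf5") else "list"
```

Existing invocations that pass only --input_data keep their h5 behaviour through the default, and a value outside the listed choices is rejected by the parser before any file is opened.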

TODO

  • [x] Support raw data profiling in record-minmax #7160
  • [x] Update record-minmax interface #7170
  • [x] Update one-build, one-quantize interface #7174
  • [x] Add tests (one-cmds) #7174

All done
