BERT: I encountered a KeyError when using my own data set

Created on 4 Jan 2019 · 14 Comments · Source: google-research/bert

Do not apply the change described below (adding a tab before the feature line). It yields very weird results, because only a few examples actually get processed.

Instead, take a look at the data processing and make sure that text_a and label are passed in correctly.

Do this in _create_examples:

def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
        # Skip the header row.
        if i == 0:
            continue
        guid = "%s-%s" % (set_type, i)
        # Column 0 holds the label, column 1 holds the text.
        label = tokenization.convert_to_unicode(line[0])
        text_a = tokenization.convert_to_unicode(line[1])
        # text_b = tokenization.convert_to_unicode(line[2])
        examples.append(
            InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    # random must be imported at module level for this call to work.
    random.shuffle(examples)
    return examples
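
For reference, a minimal self-contained sketch of the same column mapping. The InputExample stand-in, the header row, and the sample rows are assumptions made for illustration, not taken from the actual data set. In run_classifier.py the method would live on a DataProcessor subclass and pass each field through tokenization.convert_to_unicode.

```python
import collections

# Stand-in for run_classifier.InputExample, so the sketch runs on its own.
InputExample = collections.namedtuple(
    "InputExample", ["guid", "text_a", "text_b", "label"])


def create_examples(lines, set_type):
    """Builds examples from rows laid out as [label, text]."""
    examples = []
    for i, line in enumerate(lines):
        if i == 0:
            continue  # skip the header row
        guid = "%s-%s" % (set_type, i)
        examples.append(
            InputExample(guid=guid, text_a=line[1], text_b=None, label=line[0]))
    return examples


# Hypothetical rows, as they would come out of a TSV reader.
rows = [["label", "text"],
        ["Quality", "The build feels solid."],
        ["Price", "Too expensive for what it offers."]]
examples = create_examples(rows, "train")
print(examples[0].guid, examples[0].label, examples[0].text_a)
```

If the columns were swapped in the file, the sentence would end up in label and the label in text_a, so printing the first example is a cheap way to confirm the layout.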

~~~~~~~~~~~~~~~~

I have encountered a KeyError, which I suppose is similar to a previous issue. I followed the steps suggested there, but I didn't manage to solve it.

The following is the output shown on screen:

/Paul $ python run_classifier.py --task_name=bosco --do_train=true --do_eval=true --dopredict=true --data_dir=$MY_DATASET --vocab_file=$BERT_BASE_DIR/vocab.txt --bert_config_file=$BERT_BASE_DIR/bert_config.json --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt --max_seq_length=128 --train_batch_size=32 --learning_rate=5e-5 --num_train_epochs=50.0 --output_dir=.data/output
WARNING:tensorflow:Estimator's model_fn (.model_fn at 0x7fb390f0ed90>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '.data/bosco_output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Writing example 0 of 29206
Traceback (most recent call last):
  File "run_classifier.py", line 1010, in <module>
    tf.app.run()
  File "/home/yuwei/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_classifier.py", line 899, in main
    train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
  File "run_classifier.py", line 518, in file_based_convert_examples_to_features
    max_seq_length, tokenizer)
  File "run_classifier.py", line 487, in convert_single_example
    label_id = label_map[example.label]
KeyError: 'Quality'
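
The last frame of the traceback is a plain dictionary lookup: convert_single_example builds label_map from the list returned by the processor's get_labels(), so a label that occurs in the data but is missing from that list raises KeyError. Below is a minimal sketch of a check that could be run before training. The function name find_unknown_labels and the sample values are hypothetical; the actual get_labels() output for this task is not shown in the thread.

```python
def build_label_map(label_list):
    """Maps each label to an integer id, as convert_single_example does."""
    return {label: i for i, label in enumerate(label_list)}


def find_unknown_labels(example_labels, label_list):
    """Returns the labels present in the data but absent from label_list."""
    label_map = build_label_map(label_list)
    return sorted({label for label in example_labels if label not in label_map})


# Hypothetical values: get_labels() returns "0"/"1" while the file holds names.
label_list = ["0", "1"]
data_labels = ["Quality", "Price", "Quality"]
unknown = find_unknown_labels(data_labels, label_list)
print(unknown)  # ['Price', 'Quality']
```

An empty result means every label in the data has an id. Anything else points at either get_labels() or the column being read as the label.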

Source

PaulZhangIsing

All 14 comments

This seems similar to issue #80.

PaulZhangIsing on 8 Jan 2019

There seems to be a bug on line 489 of run_classifier.py.
I added a tab before the feature line, and now everything runs fine.

PaulZhangIsing on 9 Jan 2019

How did you solve this problem? I am not sure which line is line 489. @PaulZhangIsing

mice4869 on 10 Jan 2019

Here is what I did: I moved the feature line into the if block, at line 489 of run_classifier.py:

for (ex_index, example) in enumerate(examples):
    if ex_index % 10000 == 0:
        tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))

        # Moved inside the if block: now runs only for every 10000th example.
        feature = convert_single_example(ex_index, example, label_list,
                                         max_seq_length, tokenizer)
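
A small sketch of why this change is risky, as the note at the top of the thread warns. With the conversion inside the if block, it runs only for every 10000th example, so a bad label in any other row is never looked up and the error is hidden, not fixed. The convert function and the data here are stand-ins, not the real BERT code.

```python
def convert(example, label_map):
    """Stand-in for convert_single_example: looks up the label id."""
    return label_map[example]


def count_converted(examples, label_map, inside_if):
    """Counts how many examples are actually converted."""
    converted = 0
    for ex_index, example in enumerate(examples):
        if ex_index % 10000 == 0:
            if inside_if:
                convert(example, label_map)
                converted += 1
        if not inside_if:
            convert(example, label_map)
            converted += 1
    return converted


label_map = {"0": 0, "1": 1}
examples = ["0", "1"] * 14603  # 29206 examples, the count shown in the log
examples[1] = "Quality"        # a label missing from label_map

hidden = count_converted(examples, label_map, inside_if=True)
print(hidden)  # 3: only indices 0, 10000 and 20000 are converted
try:
    count_converted(examples, label_map, inside_if=False)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'Quality'
```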

PaulZhangIsing on 10 Jan 2019

I'm sorry, but would you mind sharing your run_classifier.py code with me? I have been stuck on this problem for several days. If you don't mind, I'd also like to see your data set, as I have some concerns about mine. @PaulZhangIsing My email is [email protected]

mice4869 on 10 Jan 2019

Sorry, I am unable to do so. But you could send yours to me, and I can try to edit it and send it back to you?

PaulZhangIsing on 10 Jan 2019

I have sent my dataset and code to you, please check your email. @PaulZhangIsing

mice4869 on 10 Jan 2019

I will need this weekend to check, as my company is currently unable to connect to Outlook.

PaulZhangIsing on 11 Jan 2019

It's OK, I got this problem fixed. Thank you, man. @PaulZhangIsing

mice4869 on 11 Jan 2019

So basically, what did you do?

PaulZhangIsing on 11 Jan 2019

Hi, I'm having the same problem. I added a tab in line 489 but I am still getting the same error. How did you solve it?

Danysolism on 5 Feb 2019

You should take a look at the data processor instead.

PaulZhangIsing on 21 Feb 2019

I managed to solve it by adding:
[CLS]
[SEP]
[UNK]
[MASK]
to my vocab file.
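
A sketch of a quick check for this case. BERT's tokenizer looks up tokens such as [CLS] and [SEP] directly in the vocabulary, so a custom vocab file that lacks them can fail with a KeyError on the token itself. The helper name missing_special_tokens and the sample file are hypothetical; [PAD] is included in the list because the released vocab files contain it.

```python
import os
import tempfile

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def missing_special_tokens(vocab_path):
    """Returns the special tokens that do not appear as a line in the vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f}
    return [tok for tok in SPECIAL_TOKENS if tok not in vocab]


# Hypothetical vocab file with one token per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("[UNK]\n[CLS]\nhello\nworld\n")
missing = missing_special_tokens(tmp.name)
os.remove(tmp.name)
print(missing)  # ['[PAD]', '[SEP]', '[MASK]']
```

Each token has to sit alone on its own line, since the vocabulary is read one entry per line.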

AsafBanana on 21 Jul 2019

@AsafBanana I have done this for [CLS] and [SEP] but I still get: KeyError: '[CLS]'

rjurney on 11 Oct 2019
