I want to be able to load a file of data and output to another file as described here: https://github.com/tensorflow/models/tree/master/syntaxnet#annotating-a-corpus. I have added the following to the very end of /models/syntaxnet/syntaxnet/models/parsey_mcparseface/context.pbtxt:
input {
  name: 'test_data'
  Part {
    file_pattern: 'TestData/0_2.txt'
  }
}
input {
  name: 'test_data_out'
  Part {
    file_pattern: 'TestDataOut/test_data_out.txt'
  }
}
(I removed the record_format because my files are plain text, not in any particular record format.)
Then, at the command line, I do the following:
syntaxnet/demo.sh --input=test_data --output=test_data_out
This does not produce the output file. Parsey seems to start up but then nothing happens (see below for the output in Terminal):
word input(-2).token.word input(-3).token.word input(-4).token.word
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: other;prefix2;prefix3;suffix2;suffix3;words
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 8;16;16;16;16;64
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
INFO:tensorflow:Building training network with parameters: feature_sizes: [12 20 20] domain_sizes: [ 49 51 64038]
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 49 terms from syntaxnet/models/parsey_mcparseface/tag-map.
INFO:tensorflow:Building training network with parameters: feature_sizes: [2 8 8 8 8 8] domain_sizes: [ 5 10665 10665 8970 8970 64038]
I syntaxnet/term_frequency_map.cc:101] Loaded 46 terms from syntaxnet/models/parsey_mcparseface/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: stack.child(1).label stack.child(1).sibling(-1).label stack.child(-1).label stack.child(-1).sibling(1).label stack.child(2).label stack.child(-2).label stack(1).child(1).label stack(1).child(1).sibling(-1).label stack(1).child(-1).label stack(1).child(-1).sibling(1).label stack(1).child(2).label stack(1).child(-2).label; input.token.tag input(1).token.tag input(2).token.tag input(3).token.tag stack.token.tag stack.child(1).token.tag stack.child(1).sibling(-1).token.tag stack.child(-1).token.tag stack.child(-1).sibling(1).token.tag stack.child(2).token.tag stack.child(-2).token.tag stack(1).token.tag stack(1).child(1).token.tag stack(1).child(1).sibling(-1).token.tag stack(1).child(-1).token.tag stack(1).child(-1).sibling(1).token.tag stack(1).child(2).token.tag stack(1).child(-2).token.tag stack(2).token.tag stack(3).token.tag; input.token.word input(1).token.word input(2).token.word input(3).token.word stack.token.word stack.child(1).token.word stack.child(1).sibling(-1).token.word stack.child(-1).token.word stack.child(-1).sibling(1).token.word stack.child(2).token.word stack.child(-2).token.word stack(1).token.word stack(1).child(1).token.word stack(1).child(1).sibling(-1).token.word stack(1).child(-1).token.word stack(1).child(-1).sibling(1).token.word stack(1).child(2).token.word stack(1).child(-2).token.word stack(2).token.word stack(3).token.word
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: labels;tags;words
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 32;32;64
I syntaxnet/term_frequency_map.cc:101] Loaded 49 terms from syntaxnet/models/parsey_mcparseface/tag-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 49 terms from syntaxnet/models/parsey_mcparseface/tag-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 46 terms from syntaxnet/models/parsey_mcparseface/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.digit input.hyphen; input.prefix(length="2") input(1).prefix(length="2") input(2).prefix(length="2") input(3).prefix(length="2") input(-1).prefix(length="2") input(-2).prefix(length="2") input(-3).prefix(length="2") input(-4).prefix(length="2"); input.prefix(length="3") input(1).prefix(length="3") input(2).prefix(length="3") input(3).prefix(length="3") input(-1).prefix(length="3") input(-2).prefix(length="3") input(-3).prefix(length="3") input(-4).prefix(length="3"); input.suffix(length="2") input(1).suffix(length="2") input(2).suffix(length="2") input(3).suffix(length="2") input(-1).suffix(length="2") input(-2).suffix(length="2") input(-3).suffix(length="2") input(-4).suffix(length="2"); input.suffix(length="3") input(1).suffix(length="3") input(2).suffix(length="3") input(3).suffix(length="3") input(-1).suffix(length="3") input(-2).suffix(length="3") input(-3).suffix(length="3") input(-4).suffix(length="3"); input.token.word input(1).token.word input(2).token.word input(3).token.word input(-1).token.word input(-2).token.word input(-3).token.word input(-4).token.word
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: other;prefix2;prefix3;suffix2;suffix3;words
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 8;16;16;16;16;64
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
And then nothing ...
Hi Shane,
You'll need to modify or branch demo.sh. It is the parser_eval binary that takes --input and --output arguments.
A good place to see how this is done is the section on tagging a corpus with a trained tagger: https://github.com/tensorflow/models/tree/master/syntaxnet#preprocessing-with-the-tagger
Cheers,
Daniel
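To make Daniel's point concrete: the --input and --output flags belong to the parser_eval binary, not to demo.sh itself. Below is a minimal sketch of calling parser_eval directly against custom inputs, assuming the 'test_data' / 'test_data_out' names from the context.pbtxt snippet above; the binary path, model directory, and flag values are taken from the demo.sh scripts later in this thread and may differ for your build:

```
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface

# The names passed to --input/--output must match 'name:' fields
# declared in the task context (context.pbtxt).
$PARSER_EVAL \
  --input=test_data \
  --output=test_data_out \
  --hidden_layer_sizes=64 \
  --arg_prefix=brain_tagger \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/tagger-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr
```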
@andorardo,
Thank you. Got this part figured out. Now I am seeing issues with the actual parses. Will open a new ticket for that.
@shaxtell,
What do your demo.sh and context.pbtxt look like, and what command do you run to call syntaxnet?
I'm trying to run syntaxnet/demo.sh and getting the errors below:
F ./syntaxnet/proto_io.h:147] Check failed: input.record_format_size() == 1 (0 vs. 1)TextReader only supports inputs with one record format: name: "test_data_out"
F ./syntaxnet/proto_io.h:147] Check failed: input.record_format_size() == 1 (0 vs. 1)TextReader only supports inputs with one record format: name: "test_data"
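That check fails because each input read by TextReader must declare exactly one record_format, and the snippet at the top of this thread removed it. A sketch of the fix, assuming plain-text input and CoNLL output as elsewhere in this thread (file paths are placeholders):

```
input {
  name: 'test_data'
  record_format: 'english-text'
  Part {
    file_pattern: 'TestData/your_input.txt'
  }
}
input {
  name: 'test_data_out'
  record_format: 'conll-sentence'
  Part {
    file_pattern: 'TestDataOut/your_output.txt'
  }
}
```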
@uscskelly, you can see my demo.sh and context.pbtxt contents in #144. Cheers.
@uscskelly, @shaxtell I get the same issue. Were you able to resolve it? My demo.sh file contents are:
PARSER_EVAL==/home/melvyn/.cache/bazel/_bazel_melvyn/74dc22424f44a6abb8ec23cb05763d3b/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=input_file \
--output=output_file \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=output_file \
--output=parsed_file \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
/home/melvyn/.cache/bazel/_bazel_melvyn/74dc22424f44a6abb8ec23cb05763d3b/syntaxnet/bazel-out/local-opt/bin/syntaxnet/conll2tree \
--task_context=$MODEL_DIR/context.pbtxt \
--alsologtostderr
I appended the following at the end of my context.pbtxt file:
input {
  name: 'input_file'
  record_format: 'conll-sentence'
  Part {
    file_pattern: '/home/melvyn/text.txt'
  }
}
input {
  name: 'output_file'
  record_format: 'conll-sentence'
  Part {
    file_pattern: '/home/melvyn/text-tagged.txt'
  }
}
input {
  name: 'parsed_file'
  record_format: 'conll-sentence'
  Part {
    file_pattern: '/home/melvyn/text-parsed.txt'
  }
}
@kskp: it looks to me like your input_file needs the 'english-text' record_format instead of 'conll-sentence' if text.txt is just a plain text file with one sentence per line. Try that and let me know.
@shaxtell This is what I get:
./demo.sh: line 31: =/home/melvyn/.cache/bazel/_bazel_melvyn/74dc22424f44a6abb8ec23cb05763d3b/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_eval: No such file or directory
./demo.sh: line 43: =/home/melvyn/.cache/bazel/_bazel_melvyn/74dc22424f44a6abb8ec23cb05763d3b/syntaxnet/bazel-out/local-opt/bin/syntaxnet/parser_eval: No such file or directory
W external/tf/tensorflow/core/framework/op_kernel.cc:899] Not found: syntaxnet/models/parsey_mcparseface/context.pbtxt
F ./syntaxnet/proto_io.h:147] Check failed: input.record_format_size() == 1 (0 vs. 1)TextReader only supports inputs with one record format: name: "stdin-conll"
I don't know why it says parser_eval is not found; when I do sudo find / -iname 'parser_eval', I get the path mentioned above.
@kskp here are my appended context.pbtxt inputs:
input {
  name: 'test_data'
  record_format: 'english-text'
  Part {
    file_pattern: 'TestData/Mixed100.txt'
  }
}
input {
  name: 'test_data_out'
  record_format: 'conll-sentence'
  Part {
    file_pattern: 'TestDataOut/test_data_out_mixed.txt'
  }
}
input {
  name: 'test_data_parsed'
  record_format: 'conll-sentence'
  Part {
    file_pattern: 'TestDataOut/test_data_parsed_mixed.txt'
  }
}
And here is my demo.sh file:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=test_data \
--output=test_data_out \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=test_data_out \
--output=test_data_parsed \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
bazel-bin/syntaxnet/conll2tree \
--task_context=$MODEL_DIR/context.pbtxt \
--alsologtostderr
To my eye, your files look correct. I think I ran into a "stdin-conll" error at one point, but I don't recall under which circumstances. If you can't get this resolved, my next best suggestion would be to open another ticket and see if you can get the developers' attention. Sorry I couldn't be of more help.
@shaxtell I figured it out. I was issuing the command wrong. I was doing:
./demo.sh --input=input_file --output=output_file from models/syntaxnet/syntaxnet
whereas I was supposed to do:
syntaxnet/demo.sh --input=input_file --output=output_file from models/syntaxnet.
Whoa, I didn't know that makes a difference. I am new to Linux. Thanks for your time, really appreciate it.
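For anyone else tripping over this: paths like 'TestData/Mixed100.txt' in context.pbtxt are relative, so they resolve against whatever directory you launch the script from, which is why running demo.sh from models/syntaxnet works while running it from models/syntaxnet/syntaxnet does not. A tiny illustration with made-up paths:

```shell
# Create a throwaway layout mimicking a relative file_pattern.
mkdir -p /tmp/cwd_demo/TestData
echo "hello" > /tmp/cwd_demo/TestData/input.txt

# Launched from the directory the path is relative to, it resolves.
cd /tmp/cwd_demo
cat TestData/input.txt

# Launched one level deeper, the same relative path no longer exists.
cd /tmp/cwd_demo/TestData
cat TestData/input.txt 2>/dev/null || echo "not found"
```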
@kskp you shouldn't need the --input and --output flags from the command line since you've already specified them in the demo.sh file. Try just issuing the ./demo.sh from the right place on the command line and see if that works. My guess is that since you've specified the right stuff in context.pbtxt and demo.sh, syntaxnet is just ignoring your flags.
@kskp btw, congrats on getting it to work!
I used syntaxnet to train a segmenter on a Chinese corpus, and an error occurred:
"2017-04-21 14:01:47.441291: F syntaxnet/task_context.cc:140] Check failed: input.part_size() == 1 (0 vs. 1)lcword-map"
Can somebody tell me why?
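Not a developer, but the check it names is informative: it fires when the task context defines an input called 'lcword-map' with zero Part entries, since each resource map the trainer loads must have exactly one file part. A hedged sketch of the shape the context expects (the file_pattern is a placeholder; the map file itself is normally produced by the training pipeline):

```
input {
  name: 'lcword-map'
  Part {
    file_pattern: 'path/to/your/lcword-map'
  }
}
```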