I have a package foo that looks like this:
.
├── data
│   ├── a.proto
│   └── b.proto
├── generated
│   ├── a_pb2.py
│   └── b_pb2.py
└── __init__.py
# a.proto
package foo;
# b.proto
import "a.proto";
package foo;
Generate the code: protoc -I ./data --python_out=generated data/a.proto data/b.proto.
Here is the failure:
Python 3.5.1 (default, Mar 3 2016, 09:29:07)
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from generated import b_pb2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/corentih/repro/generated/b_pb2.py", line 16, in <module>
import a_pb2
ImportError: No module named 'a_pb2'
This is because the generated code looks like this:
import a_pb2
If the import was relative it would actually work:
from . import a_pb2
I have exactly the same problem; I hope it gets fixed.
@goldenbull I submitted a fix, let's see if it makes it through. I'm just not sure: are there cases where we _don't_ want relative imports?
@little-dude how about if a_pb2.py is generated into a different folder than b_pb2.py?
Could you provide a small example of what you're thinking about, so that I can try it with my change?
.
├── proto
│   ├── a.proto
│   └── b.proto
├── pkg_a
│   ├── a_pb2.py
│   └── __init__.py
└── pkg_b
    ├── b_pb2.py
    └── __init__.py
Maybe this is not a good case; I don't have enough knowledge about how protobuf and Python handle importing.
I don't think this is actually possible because the generated modules follow the hierarchy of the proto files.
However we could imagine that we have the following:
.
└── data
    ├── a.proto
    ├── b.proto
    └── sub
        ├── c.proto
        └── sub
            └── d.proto
with the following:
# a.proto
package foo;
import "b.proto";
import "sub/c.proto";
import "sub/sub/d.proto";
# b.proto
package foo;
import "sub/c.proto";
import "sub/sub/d.proto";
# sub/c.proto
package foo;
import "sub/d.proto";
# sub/sub/d.proto
package foo;
We generate the code with:
protoc -I data -I data/sub -I data/sub/sub --python_out=generated data/a.proto data/b.proto data/sub/c.proto data/sub/sub/d.proto
which generated the following:
.
└── generated
    ├── a_pb2.py
    ├── b_pb2.py
    └── sub
        ├── c_pb2.py
        └── sub
            └── d_pb2.py
But this is a more complex case than what I am trying to fix.
Edit: I'm not even sure this is a valid case but here is the error I'm getting with the master branch (4c6259bbe8417a25856f634f300a85949dbce4f1):
In [1]: from generated import a_pb2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-f28bccc761b6> in <module>()
----> 1 from generated import a_pb2
/home/corentih/repro/generated/a_pb2.py in <module>()
14
15
---> 16 import b_pb2 as b__pb2
17 from sub import c_pb2 as sub_dot_c__pb2
18 from sub.sub import d_pb2 as sub_dot_sub_dot_d__pb2
/home/corentih/repro/generated/b_pb2.py in <module>()
14
15
---> 16 from sub import c_pb2 as sub_dot_c__pb2
17 from sub.sub import d_pb2 as sub_dot_sub_dot_d__pb2
18
/home/corentih/repro/generated/sub/c_pb2.py in <module>()
14
15
---> 16 from sub import d_pb2 as sub_dot_d__pb2
17
18
/home/corentih/repro/generated/sub/sub/d_pb2.py in <module>()
20 package='foo',
21 syntax='proto2',
---> 22 serialized_pb=_b('\n\x0fsub/sub/d.proto\x12\x03\x66oo')
23 )
24 _sym_db.RegisterFileDescriptor(DESCRIPTOR)
TypeError: __init__() got an unexpected keyword argument 'syntax'
@haberman is there any chance for this to be fixed before the next release? It's quite limiting for Python 3.
I have exactly the same problem (using protobuf v3beta3): the generated imports do not conform to PEP 328 (finalized in 2004), restated in the Python docs: https://docs.python.org/3/tutorial/modules.html#intra-package-references. The 12-year-old specification is enforced in Python 3, so generated protobufs are unusable without further modification.
any updates on this?
Yikes, sorry for the slow reply on this.
Wouldn't relative imports break the case that you are importing protos from a different pip package?
For example, the well-known types come from the google-protobuf pip package. If we merge a change to use relative imports, imports of google/protobuf/timestamp.proto (for example) would be broken.
@haberman This bug has to do with protos importing protos in the same package, even in the same directory. The compiler converts this into relative imports with defective syntax under Python 3, so the generated code cannot execute at all. I don't see how you can get away without using relative imports in this case. I've had to manually edit the compiler generated pb2.py files to get them to work at all.
+1 for fixing this bug. It's stopping me migrating from python2 to python3.
+1 for fixing as well
+1, as far as I can tell this completely prevents proto imports in Python 3. Seems extremely worrying that this isn't fixed.
EDIT: this is not quite right, see my comment below.
+1 for fixing
I believe protobuf is working as intended in this case. The python package generated for a .proto file mirrors exactly the relative path of the .proto file itself. For example, if you have a .proto file "proto/a.proto", the generated python code must be "proto/a_pb2.py" and it must be in the "proto" package. In @little-dude's example, if you want the generated code in the "generated" package, the .proto files themselves must be put in the "generated" directory:
generated/
├── a.proto
└── b.proto
with protoc invoked as:
$ protoc --python_out=. generated/a.proto generated/b.proto
This way, the output will have the correct import statements (it will be "import generated.a_pb2" rather than "import a_pb2").
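To check (a sketch; the exact import form protoc emits varies by version):
```bash
protoc --python_out=. generated/a.proto generated/b.proto
grep -E '^(import|from)' generated/b_pb2.py
# expect the dependency referenced by its full path, e.g.:
#   import generated.a_pb2
# (newer protoc versions emit: from generated import a_pb2 as generated_dot_a__pb2)
```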
Using relative imports only solves the problem when all generated py code is put in the same directory. That's not the case when you import protos from other projects though (e.g., use protobuf's well-known types). It will likely break more than it fixes.
I am confused by the claims that this is totally broken in Python 3. We have Python 3 tests (and have for a while) that are passing AFAIK. Why would Python 3 require relative imports?
The issue that I'm having, and that I believe others are having, is that the proto statement import "foo.proto" compiles into the Python code import foo_pb2. However, implicit relative imports were removed in Python 3, so relative imports must be of the form from . import foo_pb2. Manually changing the generated code to this form after proto compilation fixes the issue.
There are already multiple existing issues concerning this problem, and it first seems to have been recognised in 2014 (!!!): #90, #762, #881, #957
I read a bit more about Python 3's import rules and I think I can give a better explanation.
In Python 3 the syntax import foo imports from the interpreter's current working directory, from $PYTHONPATH, or from an installation-dependent default. So if you compile proto/foo.proto to gen/foo_pb2.py, the syntax import foo_pb2 works only if the current working directory is gen/ or if you placed gen/ on your python path.
If you are compiling protos as part of a Python package (which is the case in most non-trivial Python projects), the interpreter's current working directory is the directory of your main module (suppose the directory is mypackage/), and modules in the package must either use fully-qualified absolute imports (e.g. import mypackage.gen.foo_pb2) or relative imports (e.g. from .gen import foo_pb2).
In Python 2, a module inside gen/ could do import foo_pb2 and this would import mypackage.gen.foo_pb2 into its namespace, regardless of the current working directory. This is an implicit relative import.
In Python 3, implicit relative imports don't exist and import foo_pb2 will not find foo_pb2.py, even if the module importing foo_pb2 is inside gen/. This is the issue that people are complaining about in the thread.
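A minimal illustration of the two forms (file and package names invented):
```python
# mypackage/gen/bar_pb2.py -- a generated module that depends on foo_pb2

# What protoc emits. Under Python 2 this resolved to mypackage/gen/foo_pb2.py
# via an implicit relative import; under Python 3 it raises ImportError unless
# mypackage/gen/ itself happens to be on sys.path:
import foo_pb2

# The PEP 328 explicit relative form, which works under Python 3 whenever
# bar_pb2 is imported as part of the mypackage.gen package:
from . import foo_pb2
```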
The root of this problem seems to be that import "foo.proto"; needs to compile into from <absolute or relative package path> import foo_pb2 when the proto is inside a package, and import foo_pb2 otherwise. Neither syntax will work in both scenarios. The proto compiler ignores the package name in the proto file and only observes the directory structure of the proto files, so if you want the from <path> import foo_pb2 output you need to place your protos in a directory structure mirroring the Python structure. For instance, if you have the following directory structure and you set the proto path to proto_files/ and python_out to mypackage/proto/, the correct import line is generated, but the compiled python is put in the wrong directory.
Pre-compilation:
proto_files/
    mypackage/
        proto/
            foo.proto    # import "mypackage/proto/bar.proto";
            bar.proto
mypackage/
    qux/
        mymodule.py      # import mypackage.proto.foo_pb2
    proto/
Post-compilation:
proto_files/
    mypackage/
        proto/
            foo.proto    # import "mypackage/proto/bar.proto";
            bar.proto
mypackage/
    qux/
        mymodule.py      # import mypackage.proto.foo_pb2
    proto/
        mypackage/
            proto/
                foo_pb2.py   # from mypackage.proto import bar_pb2 (the import we want! but the file should be in ../../)
                bar_pb2.py
This is close to the desired result, but not quite it, because now the absolute reference to the compiled file is mypackage.proto.my_package.proto.foo_pb2 rather than mypackage.proto.foo_pb2.
In this instance you can actually get it to produce the right output by specifying the python output path mypackage/. Here, the compiler detects that it doesn't need to create mypackage/proto because it already exists, and it just plops the generated files in that directory. However, this doesn't play nicely when the project directory structure makes use of symlinks. e.g. if mypackage/proto is a symlink to somewhere else and you actually want to dump the compiled protos there instead.
I think the 'correct' fix is to make use of the proto package rather than the location of the proto in the directory structure.
@DanGoldbach Thanks very much for all of the detail. I think a lot of the confusion here has been a result of not fully explaining all of the background and assumptions we are making. The more full description really helps clarify things.
Let me first respond to this:
I think the 'correct' fix is to make use of the proto package rather than the location of the proto in the directory structure.
Can you be more specific about exactly what fix you are proposing? An example would help.
One thing people seem to want, but that doesn't seem to work in practice, is that a message like this:
package foo.bar;
message M {}
...can be imported like this in Python:
from foo.bar import M
That is a very natural thing to want, but doesn't work out, as I described here: https://github.com/grpc/grpc/issues/2010#issuecomment-110495155
Overall, your directory structure appears to be more complicated than what we generally do at Google (which is the environment where all this behavior was designed/evolved). At Google we generally have a single directory structure for all source files, including protos. So we would anticipate something more like this:
Pre-compilation:
mypackage/
    foo.proto        # import "mypackage/bar.proto";
    bar.proto
    qux/
        mymodule.py  # import mypackage.foo_pb2
Post-compilation:
mypackage/
    foo.proto        # import "mypackage/bar.proto";
    foo_pb2.py       # import mypackage.bar_pb2
    bar.proto
    bar_pb2.py
    qux/
        mymodule.py  # import mypackage.foo_pb2
Because protobuf thinks in terms of this single, flat namespace, that's why we get a little confused when people talk about needing relative imports. I haven't wrapped my head around why this is necessary. Why doesn't the scheme I outlined above work for your use case?
Thanks, I understand much better now.
Can you be more specific about exactly what fix you are proposing?
I meant that it would be nice if the compiled proto module hierarchy mirrored the package hierarchy specified in the proto source file. As you pointed out in the grpc thread, this isn't feasible right now. Maybe in the future, the one-to-one restriction between proto sources and gens can be relaxed.
It sounds like protos work best when the generated files compile to the same directory as the source files, as per your example. Our directory structure has a separate build/ directory for generated code which isn't indexed by source control.
/build/                          # generated code directory
    proto/
        # compiled protos go here
/python/                         # parent directory for python projects
    my_python_pkg/               # root of this python package
        proto -> /build/proto/   # symlink to compiled proto dir
        main.py                  # import my_python_pkg.proto.compiled_proto_pb2
We explicitly keep generated and source files separate, so your scheme doesn't suit our current repo layout.
We would also like the option of using those protos in multiple distinct Python packages in the future, so generating compiled protos into one particular Python package isn't ideal. At Google this isn't an issue because IIRC the entire repo acts like one massive Python package and blaze provides you with the bits of the repo that you need.
I think we'll get around this by either adding the compiled proto directory to our Python path or by writing a build command to manually edit the imports in the generated protos to be package-relative imports.
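For the first option, a sketch under the layout above (paths illustrative; this assumes the protos were compiled with the proto source directory as the root, so the generated files import each other as plain top-level modules):
```bash
# /build/proto holds the compiled *_pb2.py files; putting it on PYTHONPATH
# makes both `import compiled_proto_pb2` and the generated files' sibling
# imports resolve regardless of the current working directory
export PYTHONPATH="/build/proto:${PYTHONPATH}"
python -c "import compiled_proto_pb2"
```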
Hopefully this helps other people reading the thread.
Cool, glad we're getting closer to understanding the problems.
Maybe in the future, the one-to-one restriction between proto sources and gens can be relaxed.
I think this would be difficult to do. Right now we guarantee that:
$ protoc --python_out=. foo.proto bar.proto
...is equivalent to:
$ protoc --python_out=. foo.proto
$ protoc --python_out=. bar.proto
This is important because it's what allows the build to be parallelized. At Google we have thousands of .proto files (maybe even tens or hundreds of thousands of files, haven't checked lately) that all need to be compiled for a given build. It's not practical to do one big protoc run for all of them.
It's also not practical to try and ensure that all .proto files with a given (protobuf) package get compiled together. Protobuf doesn't require the file/directory to match the package, so .proto files for package foo could exist literally anywhere in the whole repo. So we have to allow that two different protoc runs will both contain messages for the same package.
So with these constraints we're a bit stuck. It leads to the conclusion that we can't have more than one .proto file put symbols into the same Python module, because the two protoc runs would overwrite the same output file.
We would also like the option of using those protos in multiple distinct Python packages in the future, so generating compiled protos into one particular Python package isn't ideal.
Usually for this case we would put the generated code for those protos into a package that multiple other packages can use. Isn't that usually the solution when you want to share code?
If you have foo/bar.proto that you want to share across multiple packages, can't you put it in a package such that anyone from any package can import it as foo.bar_pb2?
I hadn't considered the constraints placed on the proto compiler by Google's scale and parallelism requirements, but that makes sense.
I guess I can compile the protos into their own proto-only package in build/ and then import that package from wherever I need it. I think you still need to add the parent of that package to the Python path.
@DanGoldbach Thanks for your example -- it helped me solve a problem.
I think your example works as desired if you run protoc like this:
protoc --python_out . --proto_path proto_files proto_files/mypackage/proto/*.proto
It generates correct import lines and places the _pb2.py files in the correct location.
@haberman do you agree with the suggestion of @DanGoldbach or is further explanation needed?
@Seanny123 Which suggestion do you mean? I thought we had achieved a consensus that the current behavior is workable.
His suggestion was "I think you still need to add that the parent of that package to the python path". I'm 50% sure I'm experiencing the same problem as him, which I've mentioned in this StackOverflow question.
@Seanny123 I may be misunderstanding, but is that a request of us? The protobuf library doesn't normally interfere with the Python path in any way. Are you requesting that we change the Python path from within our library?
My comment was not a request of the proto devs, it was a remark about what a developer using proto (i.e. us) has to do to get the desired behaviour. I have no further requests for the proto devs.
Whelp, in that case, thank you for your patience @haberman. You can close the issue now.
I had to read this whole thread twice to understand the resolution. For those who come after, here's a quick summary: your *.proto files need to be in a subdirectory and you need to compile from the parent directory. The name of that subdirectory becomes the name of a top-level Python package. E.g. if you run protoc --python_out=whatever foo/bar.proto, then you'll have a directory foo/, and if that is on your $PYTHONPATH, then you can import foo.bar_pb2.
Also, to add to @mehaase's excellent summary above: you have to put the parent directory path in the proto imports in your *.proto files. For example, if you put everything in directory foo/ and you're importing bar.proto from baz.proto, at the top you can't just do:
import "bar.proto"
you have to do:
import "foo/bar.proto"
Then compiling from the parent directory will work and the relative proto import won't hate you.
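Putting the two halves of this summary together, a minimal end-to-end sketch (names invented):
```bash
# Layout: foo/bar.proto and foo/baz.proto, where baz.proto contains:
#   import "foo/bar.proto";
mkdir -p out
# run from the parent of foo/ so the package prefix is preserved
protoc --python_out=out foo/bar.proto foo/baz.proto
# out/foo/baz_pb2.py now references its dependency by full path
# ("from foo import bar_pb2 ..."), so it resolves once out/ is on the path:
PYTHONPATH=out python -c "import foo.baz_pb2"
```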
But why are relative imports not supported as a command-line option? We have a git branch with a grpc interface that multiple projects have to implement, and each of us is adding it into its own Python tree using a git submodule. And because the paths are not relative, this does not work. :-(
If anyone has this issue and is up for doing some manipulation of the compiled Python scripts to get inter-module imports to work, here's a brief shell script I wrote to generate the Python scripts and make all the imports in them absolute. It presumes the directory structure in @little-dude's original issue with sibling data/ (containing .proto files) and generated/ (containing _pb2.py files) directories, but can easily be modified to work with other layouts. It basically just appends the package name to all relative imports in the generated Python scripts, in-place:
#!/bin/bash -e
protoc -I data --python_out=generated data/*.proto
sed -i '.old' 's/^import \([^ ]*\)_pb2 as \([^ ]*\)$/import generated.\1_pb2 as \2/' generated/*_pb2.py
rm generated/*.old
Then you can just do from generated.foo_pb2 import bar in another package, and any imports of other generated Python scripts within foo_pb2.py should work as expected.
I hadn't used sed previously so the substitution command can probably be improved/optimized and I haven't thoroughly tested this outside of my own project, but hope it helps someone!
Riffing off @orn688, to capture the other form of imports and make PyCharm behave nicer.
# ./compile.sh
# Compile to Python
echo "Compiling to Python ...."
PYTHON_OUT="./python/my_root/gen"
protoc -I=./proto/ --python_out=$PYTHON_OUT ./proto/*.proto ./proto/**/*.proto
# Fix generated imports
sed -i '.old' 's/^import \([^ ]*\)_pb2 as \([^ ]*\)$/import my_root.gen.\1_pb2 as \2/' $PYTHON_OUT/*_pb2.py $PYTHON_OUT/**/*_pb2.py
sed -i '.old' 's/^from \([^ ]*\) import \([^ ]*\)_pb2 as \([^ ]*\)$/from my_root.gen.\1 import \2_pb2 as \3/' $PYTHON_OUT/*_pb2.py $PYTHON_OUT/**/*_pb2.py
rm $PYTHON_OUT/*.old $PYTHON_OUT/**/*.old
# Generate Python package __init__.py files
PKG_PATH=$PYTHON_OUT
PKGS=$(find $PKG_PATH -type d)
for PKG in $PKGS; do
SUBPACKAGES=$(find $PKG -maxdepth 1 -type d | egrep -v "${PKG}$" | sort)
MODULES=$(find $PKG -maxdepth 1 -iname "*.py" | grep -v "__init__.py" | sort)
MODULE_COUNT=$(echo $MODULES | wc -w)
PKG_INIT="${PKG}/__init__.py"
echo "Writing Python package exports"
echo "------------------------------"
echo "PKG: ${PKG} (${PKG_INIT})"
echo "SUBPACKAGES: ${SUBPACKAGES}"
echo "FOUND MODULES: $MODULE_COUNT"
# echo "MODULES: ${MODULES}"
echo ""
echo "__all__ = [" > "$PKG_INIT"
for MODULE in $MODULES; do
FILENAME=$(basename "$MODULE" .py)
echo " \"${FILENAME}\"," >> "$PKG_INIT"
done
for SUBPKG in $SUBPACKAGES; do
SUBPKGNAME=$(basename "$SUBPKG")
echo " \"${SUBPKGNAME}\"," >> "$PKG_INIT"
done
echo "]" >> "$PKG_INIT"
done
echo "Compiling to Python [DONE]"
# ./my_root/__init__.py
# For internal imports (from my_root.gen.pkg import Module_pb2)
import gen
# from my_root import Module_pb2 (preferred), import my_root; my_root.Module_pb2
from gen import *
# from my_root.Module_pb2 import *, import my_root.Module_pb2 (works, but confuses IDE)
import os
__path__.append(os.path.join(os.path.dirname(__file__), "gen"))
Relative imports should be a command-line option to the tool. Their absence makes building apps from a centralized proto repo difficult, and using a sed script for replacement adds needless steps to production builds.
Still waiting for a proper fix...
Update: properly structuring the files as suggested in https://github.com/protocolbuffers/protobuf/issues/1491#issuecomment-289304959 resulted in python files that I could import without issues.
Check https://github.com/protocolbuffers/protobuf/issues/1491#issuecomment-415505938
@nourchawich The fix is described in https://github.com/protocolbuffers/protobuf/issues/1491#issuecomment-289304959. Does that not work for you?
No one has proposed any change to the existing behavior that avoids breaking parallel compilation (see https://github.com/protocolbuffers/protobuf/issues/1491#issuecomment-263924909). If you have a proposal that would make life easier for some users without breaking others, I am all ears.
@haberman it works indeed! Thanks. I just had to properly structure things.
I now have a similar directory structure:
my_project/              # included in PYTHONPATH
    main.py              # from protobufs.messages.foo_pb2 import Foo
    protobufs/
        __init__.py
        messages/
            __init__.py
            bar.proto
            bar_pb2.py   # auto generated
            foo.proto    # import "protobufs/messages/bar.proto";
            foo_pb2.py   # auto generated
To generate the python files, I executed the following inside my_project:
protoc --python_out=. protobufs/messages/*.proto
It resulted in protobufs/messages/bar_pb2.py and protobufs/messages/foo_pb2.py, which I can import from my main.py without issues.
This "fix" still requires changes to PYTHONPATH. This is not really good. Proper relative imports would just work.
The "fix" is absolutely unpythonic. I'm suprised the protobuf library is so outdated. Editing PYTHONPATH is an ugly hack.
I agree that the proposed "fixes" are good workarounds, but they are not actually fixes. Could this not be solved by a new python_package option (similar to java_package) where you can specify the name of the package?
If option python_package = "my_root.gen", then the option would result in the same output as produced by @caseyduquettesc 's script.
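For illustration, such an option (hypothetical; protoc does not support it) might mirror java_package:
```proto
syntax = "proto3";

package sensors;

// Hypothetical option, analogous to java_package: place the generated module
// under my_root.gen and emit "from my_root.gen import ..." style imports.
option python_package = "my_root.gen";
```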
In my case we have protos in their own project (ie. protos/src/main/...) and want to generate code for both Java and Python. Currently, I'd be forced to choose a directory name (ie. protos/src/main/my_python_namespaced_package) so that python imports work correctly. Weird that my protos directory structure should need to conform to python package namespacing when I'm trying to use it for language-agnostic purposes.
Had the same issue as everyone else here.
We manage our *.proto files in a separate project which is imported to other projects as a Git submodule.
Project1/                # Main project folder
    protos/              # Git submodule folder
        interface1/
            __init__.py  # Auto-generating code script (calls protoc)
            inter1.proto
    app/
        app.py
The protoc command created the _pb2 and _pb2_grpc files in the same folder as the inter1.proto file. This structure caused the same ModuleNotFoundError: No module named '*_pb2' issue. Our "fix" was to add:
```python
import os
import sys

sys.path.append(os.path.dirname(__file__))
```
at the top of the __init__.py file. I hate it, but it works.
I can't even use the suggested hack because I have multiple projects for which I have to generate python files using protoc, and some of the message names/hierarchies might be the same. If I import the python files for the 1st project, the hack adds their location to PYTHONPATH. If I then want to import the python files for the 2nd project, the hack adds its path before the previous one, making discoverability of the 1st project's python files problematic.
I postprocess generated files with:
sed -i -E 's/^import.*_pb2/from . \0/' *.py
This makes all imports relative. Works great!
@mitar I needed to explicitly create a capture group (e.g. \( and \)) and then refer to it as \1. This perhaps could be some difference in our environments. I'm running macOS.
sed -i -E 's/^\(import.*_pb2\)/from . \1/' $PY_OUT/*.py
I am using Linux and GNU sed.
> We manage our *.proto files in a separate project which is imported to other projects as a Git submodule. [...] Our "fix" was to add sys.path.append(os.path.dirname(__file__)) at the top of the __init__.py file. I hate it, but it works.
To solve the import error I added the line below (plus import os, sys and from os.path import abspath) to the __init__.py of the module in which I expect the generated code:
sys.path.append(os.sep.join(abspath(__file__).split(os.sep)[:-1]))
@haberman I'm having issues with the https://github.com/protocolbuffers/protobuf/issues/1491#issuecomment-289304959 solution. I am generating C++ and python code from the same .proto files.
If I use the mentioned comment's structure, the Python works but the C++ autogenerated code fails to compile. This is what my setup looks like:
cpp_source/
- MyClass.cpp #include "proto/bar.pb.h"
- MyClass.hpp
- Makefile
- proto/
- bar.proto # import "proto/foo.proto" [BAD]
- bar.pb.cc #include "proto/bar.pb.h" [BAD]
- bar.pb.h #include "proto/foo.pb.h" [BAD]
- foo.proto
- foo.pb.cc
- foo.pb.h
...
python_package/
- my_module.py # import python_package.proto.bar [OK]
- proto/
- bar_pb2.py # from proto import foo_pb2 as proto_dot_foo__pb2 [OK]
- foo_pb2.py
I am compiling the proto files on the makefile by doing:
protoc --python_out=/path/to/python_package/ --cpp_out=. proto/*.proto
As you can see, the includes in the autogenerated C++ code are bad.
Originally, when I was only generating C++ code, the imports inside the .proto files DID NOT have the proto/ prefix and I was compiling the proto files by:
protoc --proto_path=proto --cpp_out=proto proto/*.proto
which resulted in:
cpp_source/
- MyClass.cpp #include "proto/bar.pb.h"
- MyClass.hpp
- Makefile
- proto/
- bar.proto # import "foo.proto"
- bar.pb.cc #include "bar.pb.h"
- bar.pb.h #include "foo.pb.h"
- foo.proto
- foo.pb.cc
- foo.pb.h
Another frustration we have run into is that the Python protobuf generator treats filenames as part of the contract.
If you have a file named my-stuff.proto, then my-stuff is used in generation, instead of just using the types, the package, and the other information actually contained in the protobuf contract itself to generate the module.
> @haberman I'm having issues with the #1491 (comment) solution. I am generating C++ and python code from the same .proto files. [...]
You can separate the generation into two commands that use different root/out folders.
> I postprocess generated files with sed -i -E 's/^import.*_pb2/from . \0/' *.py. This makes all imports relative. Works great!
My group had to go a few steps beyond that, to handle the build of multiple proto files and then do the replacement in a series of subfolders. We haven't quite nailed it down, and I'm sure some bash master might gag, but it's working for now.
|__ definitions
    |__ proto_folder_1 -> proto_folder_1.proto
    |__ proto_folder_2 -> proto_folder_2.proto
|__ gen
    |__ proto_folder_1 -> proto_folder_1_pb2.py, proto_folder_1_pb2_grpc.py
    |__ proto_folder_2 -> proto_folder_2_pb2.py, proto_folder_2_pb2_grpc.py
# Re-build protocol
#
echo ""
echo "---------------------Generating protocols----------------------"
for D in definitions/*; do
if [ -d "${D}" ]; then
echo "Processing folder $D:"
serviceName="${D#definitions/}"
outDir="gen"
for file_name in $D/*; do
proto_file="${file_name}"
echo "------ converting $proto_file --> $outDir"
if [ ! -d $outDir ]; then
mkdir -p $outDir;
fi
python -m grpc_tools.protoc --python_out=$outDir --grpc_python_out=$outDir $proto_file --proto_path definitions
temp="${file_name#$D/}"
no_extension="${temp%.*}"
gsed -i -r "s/(from\s)(.*)/from ..\2/g" "gen/$serviceName/${no_extension}_pb2_grpc.py" || echo "Broke on replacement relative might be nothing to do"
done
else
serviceName="${D#definitions/}"
extension="${serviceName##*.}"
if [ "${extension}" = "proto" ]; then
file_name="${serviceName}"
outDir="gen"
moduleDir="gen"
proto_file="${file_name#$D/}"
echo "------ converting $proto_file --> $outDir"
if [ ! -d $outDir ]; then
mkdir -p $outDir;
fi
python -m grpc_tools.protoc -I definitions --python_out=$outDir --grpc_python_out=$outDir $proto_file
temp="${file_name#$D/}"
no_extension="${temp%.*}"
gsed -i -r "s/(from\s)(.*)/from ..\2/g" "gen/${no_extension}_pb2_grpc.py" || echo "Broke on replacement relative might be nothing to do"
fi
fi
done
echo "------------------------------Done------------------------------"
echo ""
With grpcio-tools 1.22.0 and Python 3.7.2 I still have this issue, i.e. the generated grpc import lines do not work because they are not relative.
Sorry this thread is so long, what was the final resolution?
So is this fixed or what are the correct workarounds not involving modifying the generated files?
import YsAppStatusService_pb2 as YsAppStatusService__pb2 causes ModuleNotFound error with Python 3.
This is not fixed. Not sure why it is closed.
Just to share: the following comment from another similar thread seems to be a good answer: https://github.com/grpc/grpc/issues/9575#issuecomment-293934506. It works for me for now. I still think protoc can be improved regarding this issue.
These are not relative imports. If you use the generated code as a git submodule, for example, you need relative imports. The best solution I know of is running a little postprocessing.
Reading the Packages section of this page: https://developers.google.com/protocol-buffers/docs/proto3 specifically the python section:
In Python, the package directive is ignored, since Python modules are organized according to their location in the file system.
That's all good but it ignores half the problem. Yes, python modules are organized in that manner, but their location affects the import statements. IMO, if you have a package specified in your proto file, it should create a package directory inside the directory provided to the python_out arg. For example, this command:
protoc --python_out=generated ./messages.proto
should result in this dir structure:
- package_name
  - messages_pb2.py
the imports would then use the package name:
import package_name.messages_pb2
edit: I did just stumble on the fact that the import statements will reflect the directory structure of the .proto files. This seems like the concept of packages is being handled in two ways - one from the proto dir structure, and one from the package directive in the proto language.
foo/
    a.proto        # package foo; contains import "bar/b.proto"
bar/
    b.proto        # package bar
The generated code only works from the root dir, for example:
foo/
    a_pb2.py
bar/
    b_pb2.py
main.py            # from foo import a_pb2
But I want to put the generated code in libgen, and that does not work:
libgen/
    foo/
        a_pb2.py
    bar/
        b_pb2.py
main.py            # from libgen.foo import a_pb2
The resulting error is "missing module bar". A way to fix it is to edit the generated code to use a relative import, e.g. from ..bar import b_pb2 instead of the generated from bar import b_pb2.
Is there any special parameter to specify when generating Python?
Just in case anyone is looking for solutions, here's a project that does the re-rooting of the generated bindings and also additionally provides a way to generate python class wrappers for protobuf: https://github.com/andreycizov/python-protobuf-gen
I just hit this as well in a project that currently supports python2/3. The import is broken for py3 because of the import statement, after being generated from a proto file in another part of the project.
Why does the solution have to involve mucking around with the path hierarchy? Can the python proto plugin not support an option like --python_out=import_from=myApp.sub:out ?
@justinfx Google has stated multiple times that if someone would like this functionality they are free to submit a pull request. Nobody has done that yet, I believe?
Still stuck on this, any clues on where to start to build a PR for this?
@andreycizov fair enough. I must have missed that point and only focused on the majority of replies that talked about the path hierarchy as the solution. I ended up symlinking my proto file from the other part of the project to the python location and then using the includes/proto path combination that produces the right output.
I'm using this solution. Add an __init__.py to the directory with your protobufs and then add the following code:
import sys
sys.path.append("./pb") # this should be the directory of the generated files relative to where you're running it from
The relative path worked for me until I called my code from a different directory. This variation using the full path worked for me.
# pb/__init__.py
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent))
> The relative path worked for me until I called my code from a different directory. This variation using the full path worked for me. [...]
So far this is the best fix, as it is independent of how I name my protobuf submodule or the name of the symbolic link to that submodule.
I used the sys.path modification in an __init__.py for a while as well, but it always seemed too hacky. If you're using a recent version of Python (3.7+), I've got a new protoc plugin I've been working on that solves this issue by using relative imports:
https://github.com/danielgtaylor/python-betterproto
It will also take care of generating __init__.py files as needed so imports will work.
Just as a note, here's a small sed command I use to alter the imports of the generated files
sed -i -r 's/import (.+_pb2.*)/from . import \1/g' *_pb2*.py
In my case we're using gradle as our build system to manage a multi-language project which uses both the java & python generated protos. This copy task fixes the imports when copying all generated pb2 files from a given dir:
task copyGeneratedProtos(type: Copy) {
from "${project(':my-project-api').projectDir}/generated_src/main/python"
into "${projectDir}/python/"
filter { ln ->
if (ln ==~ 'import (.+_pb2.*)') {
return ln.replace('import', 'from . import')
}
return ln
}
}
We implemented it this way since we didn't want to account for slightly different sed behavior on MacOS vs Linux
We solved it with these two steps after the files are generated in the build/ folder:
touch build/__init__.py
2to3 build/ -w -n
It does all the conversions (import fixing most of the time) to make it compatible with Python 2 & 3.
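For reference, the fixer doing the work here is 2to3's import fixer, which detects sibling modules and rewrites the ambiguous form into an explicit relative import, e.g.:
```python
# before (as generated by protoc):
import foo_pb2
# after `2to3 build/ -w -n` (foo_pb2.py is a sibling module in build/):
from . import foo_pb2
```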
We wanted to package the generated files to use in many python projects. I ended up with something like this.
├── MANIFEST.in
├── README.md
├── requirements.txt
├── setup.py
└── src
    └── main
        └── proto
            ├── common
            │   └── foobar.proto
            ├── service_a.proto
            ├── a
            │   ├── a_bar.proto
            │   ├── a_baz.proto
            │   └── a_box.proto
            ├── b
            │   └── b_box.proto
            ├── service_b_grpc.proto
            ├── service_b.proto
            └── utility.proto
And a setup.py as follows:
import os
from os.path import isdir, join
from pathlib import Path
from setuptools import find_packages, setup
from setuptools.command.develop import develop
cwd = os.path.dirname(__file__)
proto_src_dir = os.path.join(cwd, 'src/main/proto')
def build_protos():
import grpc_tools.protoc
proto_files = list(Path().rglob('*.proto'))
grpc_tools.protoc.main([
'grpc_tools.protoc',
'-I',
proto_src_dir,
'--python_out=.',
'--grpc_python_out=.'
] + [str(p) for p in proto_files])
proto_packages = [name for name in os.listdir(proto_src_dir) if isdir(join(proto_src_dir, name))]
for package in proto_packages:
with open(join(cwd, package, '__init__.py'), 'w') as file_:
file_.write('')
class develop_protos(develop):
def run(self):
build_protos()
develop.run(self)
with open(join(cwd, 'VERSION')) as version_file:
version = version_file.read().strip()
install_requires = [x for x in open('requirements.txt').readlines() if not x.startswith("#")]
setup(name='my_protos',
version=version,
license='None',
cmdclass={
'develop': develop_protos
},
packages=find_packages(),
install_requires=install_requires,
py_modules=['service_a', 'service_b_grpc', 'service_b'],
python_requires='>=3.6, <4',
include_package_data=True)
To generate the sources run python setup.py develop then to package run python setup.py sdist.
The cruft: from service_a import TempProbeRunnerServicer instead of from my_protos.service_a import TempProbeRunnerServicer... this isn't a problem for me. src/main/proto also requires you to add it to the py_modules in setup.py... Probably this can be improved, but at this point I just don't care anymore.
This issue seems to have been closed without a decent solution? (I don't think using sed or 2to3 is a good solution.) Is there a fix for this yet? I can't seem to find it, and this issue has been here for almost 4 years.
I think this issue should be kept open as a feature request to add an optional argument like python_import_root that would enforce import structure:
# relative:
protoc -I=proto --python_out=generated/proto --python_import_root="." proto/foo.proto
# absolute:
protoc -I=proto --python_out=generated/proto --python_import_root="generated.proto.foo" proto/foo.proto
So the import structures become:
# relative
from . import bar_pb2
# absolute:
from generated.proto.foo import bar_pb2
So is the official recommendation for using protobufs in a python project currently to use 2to3 or sed to fix the generated files? It does work, but it doesn't seem like a great solution. Would google accept a PR that added an option like --python_import_root?
_I wanted my .py files to be written to lib/generated, and be accessible with from lib.generated import ..., without modifying the generated python code or the proto file structure._
I worked around this issue by copying the *.proto files into a temporary folder with the correct hierarchy, then generating the target files from the base of the hierarchy, and outputting to `.`:
protos/
    a.proto
    b.proto
pythonclient/
    lib/
        generated/
            # files should be generated here
    main.py
# build.sh
SOURCE=$1 # source of the proto files
TARGET=$2 # where to generate the python files
TMPDIR=$(mktemp -d)
mkdir -p $TMPDIR/$TARGET
cp $SOURCE/* $TMPDIR/$TARGET
python -m grpc_tools.protoc --python_out=. --grpc_python_out=. -I $TMPDIR $TMPDIR/$TARGET/*.proto
rm -rf $TMPDIR
```bash
./proto.sh ../protos lib/generated
```
```python
# main.py
from lib.generated.a_pb2 import MessageA

MessageA(name='foo')
```
Since this thread has plenty of interest, I wanted to point out #7470 as a possible solution to some of the needs expressed here. It would be great to have some eyes on that issue to get some feedback.
Hi all, I followed the suggestion from @chdsbd:
├── apps
│   └── myproject_1
│       └── src
│           └── myproject_1
│               └── proto
│                   ├── myproject_1
│                   │   ├── v1
│                   │   │   ├── __init__.py
│                   │   │   ├── a_bar_pb2_grpc.py
│                   │   │   └── a_bar_pb2.py
│                   │   └── __init__.py
│                   ├── myproject_2
│                   │   ├── v1
│                   │   │   ├── __init__.py
│                   │   │   ├── b_bar_pb2_grpc.py
│                   │   │   └── b_bar_pb2.py
│                   │   └── __init__.py
│                   └── __init__.py
└── source_protos
    ├── myproject_1
    │   └── v1
    │       └── a_bar.proto
    └── myproject_2
        └── v1
            └── b_bar.proto
In a_bar_pb2_grpc.py I have from myproject_1.v1 import something as something_2.
In apps/myproject_1/src/myproject_1/proto/__init__.py:
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
However, when I run it, I get this error in a_bar_pb2_grpc.py and a_bar_pb2.py: ModuleNotFoundError: No module named myproject_1.v1.
This is sys.path: ['apps/myproject_1/src', ...]
When I append os.path.abspath(os.path.dirname(__file__)), sys.path becomes ['apps/myproject_1/src/myproject_1/proto', 'apps/myproject_1/src', ...]. However, I still get the error; and if I remove 'apps/myproject_1/src' from sys.path I get a ModuleNotFoundError again, but for myproject_1.proto. Any ideas how to fix it?
For those searching a workaround (until #7470 is merged) as build script in setup.py:
import glob
import re
from grpc_tools import protoc
# Generate stubs from `api/*.proto` files in `api/gen` directory
protoc.main([
'grpc_tools.protoc',
'--proto_path=api',
'--python_out=api/gen',
'--grpc_python_out=api/gen'
] + [proto for proto in glob.iglob('./api/*.proto')])
# Make pb2 imports in generated scripts relative
for script in glob.iglob('./api/gen/*.py'):
with open(script, 'r+') as file:
code = file.read()
file.seek(0)
file.write(re.sub(r'(import .+_pb2.*)', 'from . \\1', code))
file.truncate()
Since it's a Python script, it should be unequivocally cross-platform.
@vintprox I would add one small tweak to the code above, as it will otherwise break imports for Well-Known Types (WKT).
The following is generated when using a Timestamp in a message
from google.protobuf import timestamp_pb2 as google_dot_protobuf_dot_timestamp__pb2
By adding a \n to the regular expression, we ensure the import for the timestamp won't get matched.
import glob
import re
from grpc_tools import protoc
# Generate stubs from `api/*.proto` files in `api/gen` directory
protoc.main([
'grpc_tools.protoc',
'--proto_path=api',
'--python_out=api/gen',
'--grpc_python_out=api/gen'
] + [proto for proto in glob.iglob('./api/*.proto')])
# Make pb2 imports in generated scripts relative
for script in glob.iglob('./api/gen/*.py'):
with open(script, 'r+') as file:
code = file.read()
file.seek(0)
file.write(re.sub(r'\n(import .+_pb2.*)', '\nfrom . \\1', code))
file.truncate()
Thank you for the script!
> Just as a note, here's a small sed command I use to alter the imports of the generated files: sed -i -r 's/import (.+_pb2.*)/from . import \1/g' *_pb2*.py
This almost did the trick for me, except that it messed up some of my other from ... import ... imports. A minor modification fixes this:
sed -i -r 's/^import (.+_pb2.*)/from . import \1/g' *_pb2*.py
I have a simpler sed -i -E 's/^import.*_pb2/from . \0/' *.py.
As the PR (#7470) is still not merged, the best workaround for now is just adding from . in front of the import statements, right?
Is there any clean way to have the .proto files, and generated GRPC files in separate directories, under the main app yet?
Currently I am just dumping all the generated code in my root directory just to avoid this import nightmare
@ckcr4lyf I used to mess with getting the paths correct for the protoc command, but now I just found it easier to rewrite the generated source files as a step in my build system. Easier than trying to anticipate the output location vs the import paths.
Yeah, seems like it's just hacks for now. Hope there's a better solution soon.
@justinfx and @ckcr4lyf, happy to have your thoughts on @tpboudreau's #7470 if you get a chance.
Will check it out (literally checkout) and see if it works as expected, thanks