PyInstaller: multipackage (MERGE) broken in PyInstaller 3.0

Created on 26 Sep 2015  ·  39 Comments  ·  Source: pyinstaller/pyinstaller

In PyInstaller 3.0, not much attention was paid to the 'multipackage' feature, and it is broken:

  • MERGE should be fixed
  • MERGE tests should be migrated to pytest
@high feature pull-request wanted

All 39 comments

Maybe this is a chance to rework this. PyInstaller supports loading modules from an external file, see old_suite/basic/test_pyz_as_external_file.spec and bootloader/main. We could change MERGE to use this and dramatically simplify the code.

Edit: This test-case has now been moved to test_pyz_as_external_file in functional/test_basic.py: py.test -k test_pyz_as_external_file.

I'm interested in this feature, and I might be able to help with fixing it. I did a very simple test (just an import and a print), and it seems not to be duplicating the libs, so I don't see exactly where it could be broken.

samsara ~/tmp/pyinst-MERGE/dist
10712 ◯ : tree
.
├── bar
│   └── bar
├── baz
│   └── baz
└── foo
    ├── bar
    ├── baz
    ├── bz2.x86_64-linux-gnu.so
    ├── _codecs_cn.x86_64-linux-gnu.so
    ├── _codecs_hk.x86_64-linux-gnu.so
    ├── _codecs_iso2022.x86_64-linux-gnu.so
    ├── _codecs_jp.x86_64-linux-gnu.so
    ├── _codecs_kr.x86_64-linux-gnu.so
    ├── _codecs_tw.x86_64-linux-gnu.so
    ├── _ctypes.x86_64-linux-gnu.so
    ├── foo
    ├── _hashlib.x86_64-linux-gnu.so
    ├── _json.x86_64-linux-gnu.so
    ├── libbz2.so.1.0
    ├── libcrypto.so.1.0.2
    ├── libexpat.so.1
    ├── libffi.so.6
    ├── libpython2.7.so.1.0
    ├── libreadline.so.6
    ├── libssl.so.1.0.2
    ├── libtinfo.so.5
    ├── libz.so.1
    ├── _multibytecodec.x86_64-linux-gnu.so
    ├── pyexpat.x86_64-linux-gnu.so
    ├── readline.x86_64-linux-gnu.so
    ├── resource.x86_64-linux-gnu.so
    ├── _ssl.x86_64-linux-gnu.so
    ├── termios.x86_64-linux-gnu.so
    ├── twisted.python._sendmsg.x86_64-linux-gnu.so
    └── zope.interface._zope_interface_coptimizations.x86_64-linux-gnu.so

3 directories, 32 files
(env2) 

(using d9be7da due to #1863 affecting me, at least on my local environment)

@htgoebel could you please elaborate a bit more on the needed rewrite when you have some time?

@kalikaneko Well, the tests (which are still in the old test-suite only) all fail. Try:

cd tests/old_suite/
./runtests.py multipackage/test_*.py

My idea for rewriting it is: instead of appending the PKG to one of the executables, create an external pkg file and use that. The bootloader already contains code for this. But now that I'm checking the code, I'm no longer convinced that this will be a notable improvement.

Selected PyInstaller based on this feature. Disappointed to see it not working. What to do?

Still not working?
Now my app consists of two simple executables which, built with PyInstaller, are horribly large.

Please add this to your todo list.

I just wanted to say that this works for me, given a small hack.

For some time I was trying to fork+exec my own application, to launch some workers due to inherent issues with multiprocessing on OS X (not pyinstaller related). To prevent unpacking overhead, I would do something like this::

import os
import subprocess
import sys


def pyinstaller_env():
    """
    Returns None if not in pyinstaller. (None means: Keep current environment,
    at least to Popen)

    Otherwise returns a copy of the current environment, plus the _MEIPASS2
    environment variable, provided to ensure Pyinstaller doesn't unpack itself
    again when we exec() it.

    For multipackage bundles, this is required for the multipackage bundles to
    function at all.
    """
    if hasattr(sys, 'frozen'):
        # Copy, so that the parent's own environment is left untouched.
        env = os.environ.copy()
        # If Pyinstaller encounters _MEIPASS2 in the env, it will not unpack
        # and instead use that directory for the bootloader. By using this we
        # avoid unpacking again for every fork+exec.
        env['_MEIPASS2'] = sys._MEIPASS

        return env

    return None


def fork_self(args):
    """ Fork self. Possibly using a different entry point (args)
    """
    print('fork_self:', args)
    env = pyinstaller_env()

    main = subprocess.Popen([sys.argv[0]] + args, env=env)

    return main

However, the second (smaller) application does not start, with an error like this:
Error loading Python lib '/var/folders/sy/ky6wtvhs3yx_b1856h3g8w6r0000gn/T/_MEIk5dZOT/.Python': dlopen(/var/folders/sy/ky6wtvhs3yx_b1856h3g8w6r0000gn/T/_MEIk5dZOT/.Python, 10): image not found

But, using my pyinstaller_env() function when spawning the second application does work.

Maybe this is useful for others: if you fork+exec to start one of your other MERGE'd + BUNDLE'd applications, just pass _MEIPASS2 (with its value set to the current _MEIPASS).
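A minimal sketch of that idea, assuming the sibling executable sits next to the running one and that the bootloader honours _MEIPASS2 as described in this thread. `sibling_env` and `spawn_sibling` are hypothetical helper names, not PyInstaller APIs; the keyword arguments of `sibling_env` exist only so the logic can be tried outside a frozen application.

```python
import os
import subprocess
import sys


def sibling_env(base_env=None, frozen=None, meipass=None):
    """Return an environment for launching a sibling MERGE'd executable.

    Hypothetical helper: works on a copy, so the caller's environment
    is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    if frozen is None:
        frozen = hasattr(sys, 'frozen')
    if frozen:
        # Assumption from the thread: with _MEIPASS2 set, the bootloader
        # reuses this directory instead of unpacking again.
        env['_MEIPASS2'] = meipass if meipass is not None else sys._MEIPASS
    return env


def spawn_sibling(name, args=()):
    """Start another executable located next to the running one."""
    here = os.path.dirname(os.path.abspath(sys.argv[0]))
    return subprocess.Popen([os.path.join(here, name), *args],
                            env=sibling_env())
```

Outside a frozen application `sibling_env()` returns a plain copy of the environment, so the same call can stay in place during development.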

(Background: my fork+exec worker combo worked fine, except that it would spawn lots of dock icons, due to the main application being a windowed application, and pyinstaller setting up every time. That's why I created a second binary, which is otherwise identical, except that it has console=True, instead of console=False.)

Would this be suitable as a Wiki recipe?

I don't get any hits on a search in the repository for _MEIPASS2 so I wonder where in the bootloader this is tested?

_MEIPASS2 is mentioned here - https://github.com/pyinstaller/pyinstaller/wiki/Recipe-Multiprocessing

I didn't find it in the documentation either, but that recipe made me try the same on OS X/Linux, and it also seems to work there.

_MEIPASS2 is found here: bootloader/src/pyi_main.c

I don't know if this is the right 'solution', so I don't think I am qualified to comment on whether this counts as a recipe or not.

Yeah, it's confusing. I'm confused because github search still doesn't turn up that pyi_main.c hit in the 41 results for '_MEIPASS2'. But you have fingered the place it matters, here. The bootloader starts, fetches that envar, and at line 122 makes a critical decision.

If _MEIPASS2 was not defined, this is the original bootloader process and it starts creating the temp folder and unpacking into it. When that completes it will fork itself, set _MEIPASS2, and we come back to line 122, now in a second process. Because the envar is set, it knows this is the second instance, unpacking is complete, and it can proceed to kick off the unpacked user program.

So by pre-setting the envar and then launching the same executable in a subprocess, you are preventing the bootloader at the head of that executable from going through the unpacking-and-forking business. It just proceeds into the user code in its subprocess.

If you made a subprocess out of a different executable it would be a mistake to set that envar because that different executable would not unpack itself, would try to import modules from the parent program's temp folder and would likely fail with an import error.
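The two-pass behaviour described above can be sketched as a toy model. This is not the bootloader's C code: `bootloader_action`, `simulate_launch`, the action names, and the placeholder temp directory are all made up for illustration, and only the _MEIPASS2 check itself comes from the description.

```python
def bootloader_action(env):
    """Toy model of the decision described above (not the real C code)."""
    if env.get('_MEIPASS2'):
        # Variable present: treat unpacking as already done.
        return 'run_user_program'
    # Variable absent: first pass, so unpack and then relaunch.
    return 'unpack_then_relaunch'


def simulate_launch(env, temp_dir='/tmp/_MEIxxxxxx'):
    """Return the sequence of actions for one launch of an executable."""
    actions = []
    env = dict(env)  # work on a copy, as a child process would
    while True:
        action = bootloader_action(env)
        actions.append(action)
        if action == 'run_user_program':
            return actions
        # The first pass sets the variable for the relaunched copy.
        env['_MEIPASS2'] = temp_dir
```

In this model a normal launch goes through both steps, while a launch with the variable pre-set skips straight to the user program, which is the effect the workaround relies on.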

The multiprocessing recipe you point to seems to be doing this same thing. (The same code appears in this functional test and this one.)

However it looks odd to me for a couple of reasons, one is this:

    if sys.platform.startswith('win'):
        ...
        finally:
            ...
            # On some platforms (e.g. AIX) 'os.unsetenv()' is not
            # available.

Since the code is restricted to Windows, why is it nattering about AIX?

Yes, that is indeed what _MEIPASS2 does. However - your note on 'different executable' is not entirely correct. In this case (MERGE target), only one of the binaries (first one passed to MERGE) contains all of the dependencies. If that one unpacks first, and then the others are started with that _MEIPASS2, it just seems to work.

The multiprocessing recipe mentions Windows specifically because multiprocessing on Windows, using python2, can't do fork+exec (because it is Windows) and you need the hacks that they mention in the recipe. Why they mention AIX is beyond me. It may just be general coding practice or a habit regarding os.unsetenv. Multiprocessing on python3 will likely require similar hacks/recipes for all platforms.

@MerlijnWajer Some tests fail unexpectedly at the moment because the loader sometimes doesn't set up the environment correctly for the parent thread. You can see a test failure here, but they have also occurred on Linux. Can you guess what might be causing this?

Are you sure that this is related to the MERGE target? I haven't looked at pyinstaller tests before, so I don't know offhand what is going wrong there. It's also Windows, which I don't usually test pyinstaller on.

@htgoebel - any comments on my previous notes on this? Could this be related to the broken tests? I guess I should perhaps dive into the bootloader code and see if this can be fixed/changed, but I don't know what the intended behaviour is.

@MerlijnWajer See #2371.

@xoviat: Just to check, I don't see how this relates to the MERGE issue?

I don't think it does now. Originally I thought that it was related to the bootloader, but @htgoebel has come up with a more plausible explanation.

I may look at this after #2341 is merged.

I've made some progress at github.com/xoviat/mergemodule. The main roadblock now is that, using the single package approach, the bootloader is unable to determine the correct script to start for both executables.

Unfortunately, it appears that I will have to backtrack, because append_pkg cannot be used to create a package from which an executable can tell which specific script to load.

Could you please restore the feature? It is really useful for us.

We develop a Python package and have about 10 demo Python scripts that show its capabilities.
We want to distribute the demo scripts with PyInstaller, but we want to put all 10 generated exe files into the same folder.

This would be a really useful feature; it is the only thing that stops me moving from py2exe to pyinstaller.

I recently discovered PyInstaller; it works very well, thanks! It seems that my program also needs this feature. I'm not sure that I would understand the codebase well enough to contribute patches, but I would be happy to help test out any solutions on which you may be working.

(I noticed the mention of subzero in another issue, and will check it out, but it would be great to see this feature restored to solve the problem fully.)

I don't know pyinstaller well enough to know if this is a bad idea, but if I just copy the second executable into the directory of the first they both work with the same shared library. I'm using Python 3.5.1, Pyinstaller 3.2.1 and MacOS 10.12.5. It at least works for this environment.

For executables with non-conflicting dependencies this might be an option.

This requires larger changes to the bootloader and will not make it into release 3.3.

Just discovered this feature in the docs. I'm building a multi-program bundle using PyInstaller by having one spec file for each program and building them individually. When done I simply copy all the files into the same directory (seems to be what @Terrabits is doing as well). The project is ActivityWatch.

Haven't had any issues with it, so I'm wondering what the practical differences are.

The idea behind MERGE mostly applies to onefile executables.

With onedir exes, all of the Python bytecode is archived into the main executable, while the rest of the app dependencies (including data files and shared libraries) are stored in the executable's folder. This is why it often works to just merge two onedir exes file-wise.

With onefile exes, every single file is archived into the main exe (and extracted to a temp folder at runtime). Putting two of these exes in the same folder means that there will be many files that are archived twice, with one copy in each exe. What MERGE allows is to have the first exe contain all of the dependencies, and the second exe just have references to the first one instead of second copies of, for example, python.dll.
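The deduplication idea can be illustrated with a toy model. This is only a sketch of the concept as described here, not PyInstaller's actual MERGE implementation; `merge_dependencies`, its return shape, and the file names in the usage example are hypothetical.

```python
def merge_dependencies(apps):
    """Toy model: the first app that lists a file keeps it; later apps
    only record a reference to the app that holds it.

    `apps` is an ordered list of (name, iterable_of_files) pairs.
    Returns {name: {'bundled': [...], 'references': {file: owner}}}.
    """
    owner_of = {}
    result = {}
    for name, files in apps:
        bundled, references = [], {}
        for f in sorted(set(files)):
            if f in owner_of:
                # Already archived elsewhere: store a pointer, not a copy.
                references[f] = owner_of[f]
            else:
                owner_of[f] = name
                bundled.append(f)
        result[name] = {'bundled': bundled, 'references': references}
    return result


layout = merge_dependencies([
    ('foo', ['python.dll', 'foo_only.pyd']),
    ('bar', ['python.dll', 'bar_only.pyd']),
])
```

In this model `layout['bar']` bundles only `bar_only.pyd` and refers to `foo` for `python.dll`, which is the size saving the feature is meant to provide for onefile builds.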

Please fix it; it can be really important.

If one needs this, please provide a pull request or sponsor a project grant.

@htgoebel Could you comment on the larger changes required to the bootloader?

Are you a C programmer? Look at the PRs I submitted and fix them.

Yeah, C is no problem. Can you link me to the relevant PRs? You have rather a lot. :)

See gh-2416. Note that you'll need to extract the relevant changes manually (ugh!) and then clean them up so that just the bootloader is modified. In addition, you should try to avoid moving functions around (as I did to try to make up for my poor C knowledge at the time I submitted the PR).

Hi, just wondering if there's any progress that was made on this issue? It seems that #2416 was abandoned and the user was deleted.

#2416 is huge, and nobody has started a new approach.

@htgoebel I don't understand the specifics of the bootloader change, but I could implement some of your comments in #2416. Would you consider merging separate PRs for bootloader changes / the rest of the changes (tests, etc)? I can handle pytest changes well.

Anybody still interested please take a look at #4303, it is a much smaller change than #2416.

Wow, awesome work on such a long standing issue @coreydexter! That is a much simpler change compared to #2416. Looks like all tests are passing except for the Python: nightly env (only because there's no version of numpy available to install in python3.8-dev).
:tada: :tada:
Hope the core devs can take a look at the PR soon :smile:

Awesome work!
While we're waiting for the devs: is this problem specific to the COLLECT step? I seem to be having success if I simply specify two different folders to COLLECT the exes into, and then just merge the folders manually. I.e.

    MERGE((a1...), (a2...))
    pyz1 = PYZ...
    pyz2 = PYZ...
    e1 = EXE(pyz1...)
    e2 = EXE(pyz2...)
    COLLECT(e1, ..., name="dir1")  # main folder with all dependencies
    COLLECT(e2, ..., name="dir2")   # the second exe + some dependencies
    # move contents of dir2 into dir1
    # profit?

Unless I missed something...
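The manual "move contents of dir2 into dir1" step could be scripted roughly as follows. This is a sketch under assumptions: `merge_dist_dirs` is a hypothetical helper, it copies rather than moves, and it keeps the file already present in the target whenever both folders contain the same relative path, which may or may not be the right choice for conflicting dependencies.

```python
import os
import shutil


def merge_dist_dirs(src, dst):
    """Copy files from src into dst without overwriting existing files.

    Returns (copied, skipped) lists of paths relative to src.
    """
    copied, skipped = [], []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel_root, name))
            target = os.path.join(target_root, name)
            if os.path.exists(target):
                # Same name already present: keep dst's copy and report it.
                skipped.append(rel_path)
            else:
                shutil.copy2(os.path.join(root, name), target)
                copied.append(rel_path)
    return sorted(copied), sorted(skipped)
```

A call such as `merge_dist_dirs('dist/dir2', 'dist/dir1')` would then leave dir1 as the combined folder; the skipped list shows which files the two builds had in common and is worth checking for version mismatches.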

@jjuod We used to do this for ActivityWatch but as we updated our Windows CI to use PyInstaller 3.5 (instead of 3.3.1) all hell broke loose, so now we're not doing that anymore.
