Seeing as v0.1.1 is the current release, and its models and audio (+ more) are readily available on GitHub (the release page), the TaskCluster entry for that commit should stick around so people can easily download the prebuilt binaries (native_client.tar.gz, etc.) for that version.
It seems like taskcluster has removed them, and this is very annoying for people trying to get started with the models available at the moment. I can't seem to find these files anywhere else, and there is no Dockerfile for that version, so it's almost prohibitively difficult to get started due to the specific (and at this point, outdated) versions of the dependencies on the various platforms.
Could the prebuilt binaries for (at least) the current release be made available again? I think this would help others a lot as well, as I don't imagine that everyone who wants to use this library is building from source. I would use the alpha version, but there are no models available for it yet, and this problem would likely occur eventually for that future release as well.
UPDATE: I found the version I want, but the commit hash changed in TaskCluster, and there's still no indication that it won't be deleted.
You're right. TaskCluster should not be used for long-term storage, but we have not figured out a good solution yet.
I guess a better solution would be to rely on GitHub releases for that, but that requires some work.
Python and Node packages of v0.1.1 are still hosted and available through pypi and npm. It is also safe to use current alpha binaries with 0.1.1 model files.
Any idea @reuben or @kdavis-mozilla for proper long-term hosting?
Also, what was the specific pain point that led you to need an older, hard-to-find build?
As far as I can tell, until the "revert "Added quntized array language model and trie"" change went in (somewhat recently), the only prebuilt binaries that would work with the latest models (from v0.1.1) were the ones from v0.1.1. Therefore, because I want to use the models, I had to use v0.1.1.
And that's fine, I'll use an older release if the models are readily available. Until I started looking deeper into the "missing" prebuilt binaries (i.e. native_client etc.), I thought that v0.1.1 was my only option for leveraging the aforementioned models, so naturally I was surprised when the TaskCluster link I had previously relied on for v0.1.1 was "gone" (turns out that it changed, but my scripts/macros/aliases didn't know the difference).
I understand that I can (theoretically, as you said - I just haven't tried it yet) use newer versions of the prebuilt binaries on the older models. I assume that's why you reverted the commit I linked to above. However, that being said, it shouldn't be this involved for the end user to figure out which versions of the models and binaries to mix/match in order to get things working.
Thanks for your quick response and hopefully this process can be improved. I'm still curious though, is there any rule that keeps the current taskcluster artifacts? Seems like they may disappear, but maybe I'm missing something. If older entries are going to get deleted, I suppose I'll start archiving them for my own use...
Quick answer because it's late and I'm officially on holiday. The issue you mention is just about the language model; we knew about it and documented how to handle it in the meantime.
Regarding TaskCluster, we do control the expiration of artifacts: it's 7 days for a pending PR and 6 months for a merge. We thought this would be good enough for now, but it looks like we need to think about it a bit more, thanks for raising that.
Also, you're right that the end user should not have to search that much. This is just an unfortunate collision, so feedback is welcome: we get our heads too deep into the way the project works and might miss an unclear user path, especially when we have alpha builds that are incompatible with the released model.
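To make the retention windows above concrete, here is a minimal sketch of how one might check whether an artifact has probably expired. It is not TaskCluster code: the names `RETENTION`, `artifact_expiry` and `is_expired` are hypothetical, and "6 months" is approximated as 182 days purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# Retention windows as described in this thread.
# Assumption: "6 months" is approximated as 182 days.
RETENTION = {
    "pull_request": timedelta(days=7),
    "merge": timedelta(days=182),
}


def artifact_expiry(created, kind):
    """Return the moment an artifact of the given kind is expected to expire."""
    return created + RETENTION[kind]


def is_expired(created, kind, now=None):
    """True if the artifact is past its retention window at time `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= artifact_expiry(created, kind)
```

A script that depends on a TaskCluster link could run a check like this and warn before the link is expected to go away.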
I'll check if we can easily retrigger a rebuild of 0.1.1 binaries.
@lissyx Thanks for the clarification. I ended up finding a prebuilt tarball for something close to v0.1.1's release, but I'll give the newer stuff a shot - should work fine given the compatibility with 0.1.1's models.
For keeping older versions, I was thinking something along the lines of a simple archive of current/past built binaries for all platforms, as well as link(s) to (or copies of) their corresponding compatible models. Many package managers on various platforms do this in the form of package mirrors. An example of this would be MIT's Arch package mirror. One of the main reasons I think keeping copies of release resources around is important is that LTS packages and distros can reference those resources indefinitely. Users can also easily browse release history and grab whatever archives they need, and having them all in one place would be super convenient. I would propose (as I just alluded to) keeping those resources around for either a very long time, or preferably forever, as prebuilt binaries are small, and larger artifacts like models and audio samples don't change too much. Plus, all told, that data doesn't add up to much. Pinging @reuben & @kdavis-mozilla here for their thoughts on such a solution (and this problem in general).
I would be happy to assist such an effort either through new or existing donations, or in terms of my own time.
@lissyx - Enjoy your vacation.
Yes, the problem is not whether we should do it, it's finding the time to do it in a future-proof way and deciding how to do it.
@9define Back in the 0.1 era, we had no way to securely perform uploads, for example. We now have that, and we upload to NPM and PyPI. I guess we could just add an upload of everything (native_client.tar.xz, Python wheels and npm packages) to the GitHub tag?
Would that work for you @9define? Any opinion on that @reuben @kdavis-mozilla?
Yep, I think that is an ideal solution.
That would be great! Hopefully all artifacts could be uploaded for all platforms. I'm not sure if it would be possible, but doing so for existing releases (ie retroactively, redoing older builds) would be optimal, especially as newer releases of the native client break compatibility with the most recent models (ie v0.1.1's, at the moment). Thanks @lissyx & @reuben for your responsiveness.
In the meantime, I've started rebuilding some 0.1.1: first the tensorflow bits https://tools.taskcluster.net/groups/eV3PeSO9SUGFA3tOIqNRew and now I'm working on the deepspeech counterpart update to use those: https://github.com/mozilla/DeepSpeech/pull/1498
There are some errors, because of infra bits.
I've started working on that today. I've got our scriptworker instance fixed and refactored against a newer version of scriptworker and asyncio. Now, I'm able to augment the DeepSpeech Packages task with the list of C++ tarballs, and properly rename them to avoid name collisions. I've also got some PyGithub-backed code to create a release matching the tag (or use an existing one) and perform the actual upload.
It needs a bit more testing, and a proper GitHub token.
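For readers curious what such an upload step could look like, here is a rough sketch under stated assumptions. It is not the actual scriptworker code: `unique_asset_name`, `upload_release_assets` and the platform tags are hypothetical names chosen for illustration, and only the PyGithub calls (`get_repo`, `get_release`, `create_git_release`, `upload_asset`) are real library API.

```python
import os


def unique_asset_name(platform, filename):
    """Insert a platform tag after the base name so per-platform tarballs
    that share a filename do not collide on the same release page.
    Assumption: the base name contains no dots (true for native_client.tar.xz)."""
    base, sep, ext = filename.partition(".")
    if not sep:
        return f"{filename}.{platform}"
    return f"{base}.{platform}.{ext}"


def upload_release_assets(token, repo_name, tag, assets):
    """Upload (platform, path) pairs to the GitHub release matching `tag`,
    creating the release first if it does not exist yet."""
    # PyGithub is imported lazily so the helper above works without it.
    from github import Github, UnknownObjectException

    repo = Github(token).get_repo(repo_name)
    try:
        release = repo.get_release(tag)
    except UnknownObjectException:
        release = repo.create_git_release(tag, name=tag, message=f"Release {tag}")

    for platform, path in assets:
        name = unique_asset_name(platform, os.path.basename(path))
        release.upload_asset(path, name=name)
```

With this naming scheme, `native_client.tar.xz` built for a hypothetical `linux-amd64` target would be uploaded as `native_client.linux-amd64.tar.xz`.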
This is done, as you can see with that last test: https://github.com/mozilla/DeepSpeech/releases/tag/v0.2.0-alpha.10
We're working on the final 0.2.0, so we need to avoid stealing build CPU, but I'll take care of re-pushing the 0.1.1 assets to the 0.1.1 release once the dust settles, using the very same infra!
And this is done for v0.1.1 as well now: https://github.com/mozilla/DeepSpeech/releases/tag/v0.1.1