Handbrake: Enhancement : Multiple encodes at the same time via GUI

Created on 27 Jun 2018  ·  20 Comments  ·  Source: HandBrake/HandBrake

Hello,

With the trend toward many-core CPUs (Intel 79xx, Threadripper, Threadripper "2" with up to 32 cores, etc.), it's sometimes hard to fully utilize them, even with the "threads" argument.

Would it be possible to have an option to encode the items in the queue 2 at a time, or 4 at a time, or even more, via the GUI? I know we can work around that with multiple HandBrake instances, but it's not very "clean".

Thx :) (and sorry for my English)

Moderator Note Added:
Implemented on:

  • [x] Windows
  • [x] macOS
  • [ ] Linux
Enhancement

All 20 comments

This is something I would like to implement and the necessary infrastructure for it has been started. But there's quite a lot of work left to do. So it's not coming soon, but it is coming eventually.

It's nice to know it's coming eventually, thx a lot :)

@Rootax
I've been working on a GUI which does this. You can specify as many HandBrake instances as you like. You can also do folder watching and recursive folder conversion. Might be worth taking a look:
https://github.com/HaveAGitGat/HBBatchBeast

Dear all,

As we have discussed in the HandBrake forum (https://forum.handbrake.fr/viewtopic.php?f=11&t=39767), we have managed to bring up a stable version of our suggested parallel implementation of HandBrake in the CLI. The initial results of the parallel implementation look promising. We are currently testing our implementation on an AMD Threadripper 2990WX (64 threads), and our parallel version shows higher CPU and memory utilization than the sequential version.

[Charts: Average Memory Utilization on AMD 2990WX · Scaled Execution Time · Average CPU Utilization on AMD 2990WX]

We attach our average execution time, CPU utilization and memory utilization for serial, dynamic parallel and static parallel modes. In static parallel mode, the user selects the number of jobs to run in parallel. The charts show results for 3, 4 and 5 parallel jobs (marked as parallel 3, parallel 4 and parallel 5 respectively). In dynamic parallel mode (marked as just parallel in the charts), our online dynamic cost function framework decides how many jobs to run in parallel.

The queues were randomly selected to cover as many use cases as possible. We are in the process of extending this to the GUI.

We are still actively improving our parallel logic, but it would be great if you could try our parallel implementation and share your suggestions.
Please find attached our CLI version of parallel HandBrake here.
To run the sequential mode:

HandBrakeCLI.exe --queue-import-file file.json

[The execution time for this mode is similar to traditional HandBrake]
Dynamic parallel mode :

HandBrakeCLI.exe --queue-import-file file.json --parallel

Static parallel mode :

HandBrakeCLI.exe --queue-import-file file.json --parallel=n

(where n is the number of jobs to run in parallel)
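One common way to cap the number of simultaneous encodes, as static parallel mode does with n, is a fixed-size worker pool. The sketch below illustrates that general technique only; it is not the implementation in the attached build, and `run_job` and `run_queue` are hypothetical names. Each command is assumed to be an argument list that launches one encode.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one encode command to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def run_queue(commands, n):
    """Run the given commands with at most n of them active at the same time."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() yields results in the same order as the input commands.
        return list(pool.map(run_job, commands))
```

Under these assumptions, passing a list of HandBrakeCLI command lines and n=4 would keep at most four encoder processes alive at once, starting the next one as soon as any of them exits.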

Additional Query on GUI:
We observe that the way the GUI processes individual jobs, launches the encoders, and keeps them in sync with the GUI differs from the CLI. We have a fairly simple, safe synchronization framework working for the CLI version.

  • To implement a parallel version of the GUI, do we need to use a common synchronization framework for both the CLI and the GUI?
    The major issue with this approach is that a few GUI-only use cases, such as deleting a live or pending job, would introduce additional complexity into the CLI synchronization framework.
  • What are the generic implementation practices for the CLI and GUI versions?
  • When do you suggest using a common interface for both versions? Do you have any additional suggestions?

Please avoid touching the UI as I have a major change to implement process isolation under way. This means there is no possibility of using libhb to synchronise the queue.
I'm making this change for the following reasons:

  1. It prevents taking out the entire UI any time a job triggers a crash.
  2. The UI needs to be isolated from libhb to allow us to bring in native ARM support due to limitations in WPF currently.

After I've completed this work I will likely add the ability for the UI to run multiple encodes at once, but it's going to require some thought around the UX in a couple of places and a fair amount of re-architecting behind the scenes.

With regards to running multiple jobs in one process space, I don't think this is a good idea. If any one job triggers a crash, it'll kill and corrupt all running jobs. I'd reject any patch that doesn't properly isolate running jobs.

We are looking to reduce execution time on larger CPUs, and we are planning to create a parallel version of HandBrake. It would be great if we could be part of the new implementation, so that we can build the parallel version with a better understanding of the design.
We have started exploring how to implement it as you suggested, and we have some questions that would help us proceed.

  • Since isolation of jobs is the main focus, will it be a Process based execution?
  • If process based then how would you want the process to communicate with each other? IPC - (MPI)
  • How are you planning to handle exceptions created by errors?

> Since isolation of jobs is the main focus, will it be a Process based execution?

Yes, each worker will be a separate process.

> If process based then how would you want the process to communicate with each other? IPC - (MPI)

HTTP/JSON/REST API on worker processes. This is being implemented in .NET, not the core library.

> How are you planning to handle exceptions created by errors?

No different to usual. If the process exits with a non-zero exit code, we know there was a problem and the log will tell us more.
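A minimal sketch of that check, written in Python for illustration rather than the .NET code mentioned above. `run_worker` is a hypothetical helper: it runs one worker command, sends its output to a log file, and treats any non-zero exit code as a failure.

```python
import subprocess
import sys


def run_worker(cmd, log_path):
    """Run cmd with stdout and stderr captured in log_path; return True on exit code 0."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    # A non-zero exit code (including a crash) marks the job as failed;
    # the details are left in the log file for inspection.
    return result.returncode == 0
```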

We are really not looking to do anything particularly complex here. It is going to be a simple preference to define X number of worker units and we'll call it a day.

Thanks @sr55, we will look into the process-based approach as you suggested.

@Rootax, @jstebbins and @HaveAGitGat, did you get a chance to try our Parallel HandBrake CLI on your machines? It would be great to know your results, which will help our understanding.

> Thanks @sr55, we will look into the process-based approach as you suggested.

As I have already said, I already have this work in progress. There is little point in you trying to do the same thing.

Thanks @sr55, we will wait for the implementation. However, we are more focused on the CLI right now. If you do not have plans for the CLI version, can we continue with your design on the CLI?

Seems like the CLI would be relatively easy to parallelize. It already has an option to use a json queue file to process a queue of jobs. Not sure how you intend to initially create the queue file (externally somehow or with new CLI options)? The most difficult part is how to show progress in a clean manner.

But I would think the basics would be: for each queue item, extract the job, wrap that job in a single-item queue dictionary, write that to a temp file, launch a new instance of HandBrakeCLI with --queue-import-file, then erase the temp file.
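The splitting step can be sketched as follows. This assumes the queue file is a top-level JSON list with one entry per job, which is an assumption about the file layout and not something confirmed in this thread; `split_queue` is a hypothetical helper name.

```python
import json
import os
import tempfile


def split_queue(queue_path, out_dir):
    """Write each queue item to its own single-item queue file and return the paths."""
    with open(queue_path) as f:
        items = json.load(f)  # assumed layout: a JSON list, one entry per job
    paths = []
    for item in items:
        fd, path = tempfile.mkstemp(suffix=".json", dir=out_dir)
        with os.fdopen(fd, "w") as tmp:
            json.dump([item], tmp)  # wrap the job in a one-item queue
        paths.append(path)
    return paths
```

Each returned path would then be handed to a new HandBrakeCLI instance via --queue-import-file and removed with `os.remove` once that instance exits.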

How to represent the progress of multiple simultaneous jobs in a terminal app is challenging. But I would think the internal mechanism would require something like a pipe or socket connection back to the master that you send status and progress messages over (i.e. the master would be responsible for display of all progress and status). libhb already has a function for collecting status and progress in json format.
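A rough sketch of the master side of that idea: each worker sends one JSON message per line over its pipe or socket, and the master alone decides what to print. The message fields (`worker`, `progress`) and the function names are assumptions made for this illustration; they are not the format that libhb emits.

```python
import json


def update(states, line):
    """Apply one JSON status line received from a worker to the state table."""
    msg = json.loads(line)
    states[msg["worker"]] = msg["progress"]  # progress assumed to be in 0.0 .. 1.0
    return states


def render(states):
    """Format the progress of all workers as a single terminal line."""
    parts = ["job %s: %d%%" % (worker, round(progress * 100))
             for worker, progress in sorted(states.items())]
    return " | ".join(parts)
```

Printing the rendered line with a leading carriage return would redraw one status line in place, which avoids interleaved output from several workers.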

Using a rest API would be more work than IPC but you could add in functionality so that 3rd party apps/wrappers can ping it for a JSON response on queue/worker status rather than parsing the HandBrakeCLI output.

You’ve probably got most of it covered already but I’ll add that a problem I’ve faced with the 2 HandBrake parallelisation apps I’ve worked on (HBBatchBeast & Tdarr) is that sometimes (rarely) HandBrake will hang on an item indefinitely so it’s worth adding in a worker stall detector with a progress timeout of say 300 seconds to prevent the queue being held up.
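A progress timeout of this kind can be sketched as a small tracker per worker. `StallDetector` is a hypothetical class, not code from HBBatchBeast or Tdarr; it treats a worker as stalled when its reported progress has not changed for longer than the timeout.

```python
class StallDetector:
    """Flags a worker whose reported progress has not changed within `timeout` seconds."""

    def __init__(self, timeout=300.0):
        self.timeout = timeout
        self.last_progress = None
        self.last_change = None

    def report(self, progress, now):
        """Record a progress reading taken at time `now` (in seconds)."""
        if self.last_change is None or progress != self.last_progress:
            self.last_progress = progress
            self.last_change = now

    def is_stalled(self, now):
        """Return True if no progress change has been seen for more than `timeout`."""
        return self.last_change is not None and now - self.last_change > self.timeout
```

A queue runner could call `report` with `time.monotonic()` whenever it parses a progress value, and kill and requeue (or skip) the item once `is_stalled` returns True.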

And a --queue-gen path CLI command or such, with a recursive option, would be handy.

> Using a rest API would be more work than IPC but you could add in functionality so that 3rd party apps/wrappers can ping it for a JSON response on queue/worker status rather than parsing the HandBrakeCLI output.

Is it too much to ask for both? :wink: A REST API that responds with status JSON should be pretty straightforward and would be very useful to script writers, as you say.
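To give a sense of scale, a status endpoint of this kind fits in a few lines with a standard-library HTTP server. This is only a sketch: the `/status` route, the shape of the `STATUS` table and the function names are all hypothetical, and nothing here reflects an actual HandBrake API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical status table; a real master would update it as workers report in.
STATUS = {"workers": [{"id": 1, "state": "encoding", "progress": 0.42}]}


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":  # hypothetical route
            self.send_error(404)
            return
        body = json.dumps(STATUS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the terminal free for progress output


def start_status_server(port=0):
    """Serve STATUS on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A script could then poll the endpoint and read worker state from the JSON response instead of parsing terminal output.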

As far as stall detection goes, if you use IPC and have the worker instances send status at a fixed interval, then any worker that you haven't heard from in say 2 * interval is stalled.
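The 2 * interval rule reduces to a comparison against the time each worker was last heard from. `stalled_workers` is a hypothetical helper, and `last_heard` is assumed to map a worker id to the timestamp (in seconds) of its latest status message.

```python
def stalled_workers(last_heard, now, interval):
    """Return the ids of workers that have been silent for more than 2 * interval seconds."""
    return sorted(worker for worker, seen in last_heard.items()
                  if now - seen > 2 * interval)
```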

Saw similar scaling when running multiple instances myself (3950X). Is this an issue with device reads? Would having the source material on multiple devices help much here? Just curious, as I'm surprised not to see closer to 100% utilization.

CPU utilization is dependent on a lot of things, but generally I/O is not an issue. Mostly it boils down to some algorithms not being easily parallelizable, or the effort to parallelize simply hasn't been made. x265 uses more threads more efficiently than x264; libvpx and other encoders do much more poorly. Filters have varying degrees of parallelization (some are single-threaded).

@tracker1 Did you test the application (link) that we shared? If possible, please share the scaling data you observed with the shared application and with the multiple-instance test.

@umadevimcw I hadn't, sorry. It was a couple of months ago (Pop!_OS or Ubuntu, latest RC kernel at the time), and I thought it was just slow feeding. The night before last, on Windows 10 (2004 Insiders), I noticed that it is absolutely hitting 95-98% on a single instance. I was happy to see that. I didn't crunch any numbers, but I was doing x265 ultrafast for a ~2hr 1080p video in roughly 12-13 minutes.

Hi @sr55,
Thanks for the clarifications. It would be great if you could clarify a few more points:

  • Do you already have a design in place for the CLI version?
  • Is it going to be OS-generic, or will there be separate implementations for Windows, Linux, etc.?
  • We would like to contribute to the CLI version of HandBrake if you are currently focused on building the GUI version. We are looking into using Boost or the Windows native API for inter-process communication. Please share your thoughts on this.

Hi all,

I'm hopefully not being off-topic here:

If this topic is what led to the "Process Isolation" option in the current Nightlies (when built), then I have to report here that, at least with "Process Isolation" enabled by default, all my encodes fail on my R9 3900X. Disabling it in my configuration makes my HandBrake work normally again.

If I'm just a lone soul happening to have this problem, then just ignore me.
I just wanted to tell you guys, because after having trouble multiple times getting freshly built Nightlies to work normally, I did some experimenting today and finally found the option that makes my HandBrake build work properly again.

If I should open a new issue for that, just tell me and I will do so.

Kind regards, stay healthy, and nice work from all the guys and girls contributing here.
RokE
