I am not really sure if this is an issue or intended behaviour.
It took me forever to move to the end of a 6 MB file.
Steps to reproduce:
$ seq 1 1000000 > test
$ bat test
Press G to move to the end of the file.
bat was going through the file sequentially.
Apologies if this is not the place to ask this question or if this is a known bug/feature.
Thank you for reporting this. Yes, this is the right place to ask questions like this :+1:
Let's try to untangle this a little bit:
When you call bat test, bat runs a pager (presumably less) and pipes all its output (the whole file contents) to the pager. Moving to the end of the file is slow because bat takes some time to output the full file contents.
Next, we can do some benchmarks. I am going to use my hyperfine tool to perform the measurements.
First, let's compare cat and bat when both are printing to /dev/null. Note that this is not the benchmark we are looking for, because in the interactive use (with or without the pager), we need to print (part of) the output to the terminal. This can cost a significant amount of time, especially if ANSI escape sequences are involved. Also, bat will use a faster loop-through-mode if it detects a non-interactive terminal.
> hyperfine 'cat test' 'bat test'
Benchmark #1: cat test
Time (mean ± σ): 2.1 ms ± 0.9 ms [User: 1.2 ms, System: 1.7 ms]
Range (min … max): 1.2 ms … 6.6 ms
Warning: Command took less than 5 ms to complete. Results might be inaccurate.
Benchmark #2: bat test
Time (mean ± σ): 422.1 ms ± 6.8 ms [User: 207.3 ms, System: 214.4 ms]
Range (min … max): 415.7 ms … 439.5 ms
Summary
'cat test' ran
199.43x faster than 'bat test'
So even with bat's loop-through mode, it is two orders of magnitude slower than cat. This is unfortunate, but not too surprising. We have never optimized for speed, and bat reads and prints files line by line instead of using a larger buffer. Still, this is definitely something we could work on.
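To illustrate the difference between line-by-line and block-wise copying mentioned above, here is a minimal sketch. It is not bat's actual code: the function names `copy_by_line`, `copy_by_block` and `seq_input`, and the 64 KiB block size, are assumptions made for this example. Both functions return how many write calls they issued, which is a rough proxy for the per-line overhead.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Copy the input line by line: one `write_all` call per input line.
/// Returns the number of write calls issued.
fn copy_by_line<R: Read, W: Write>(input: R, out: &mut W) -> io::Result<usize> {
    let mut reader = BufReader::new(input);
    let mut line = Vec::new();
    let mut writes = 0;
    loop {
        line.clear();
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        out.write_all(&line)?;
        writes += 1;
    }
    Ok(writes)
}

/// Copy the input in fixed-size blocks: one `write_all` call per block.
/// Returns the number of write calls issued.
fn copy_by_block<R: Read, W: Write>(mut input: R, out: &mut W) -> io::Result<usize> {
    let mut buf = vec![0u8; 64 * 1024]; // block size chosen arbitrarily
    let mut writes = 0;
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        out.write_all(&buf[..n])?;
        writes += 1;
    }
    Ok(writes)
}

/// Build the same content that `seq 1 n` would print.
fn seq_input(n: u32) -> Vec<u8> {
    let mut data = Vec::new();
    for i in 1..=n {
        data.extend_from_slice(format!("{}\n", i).as_bytes());
    }
    data
}

fn main() -> io::Result<()> {
    let data = seq_input(1_000_000);
    let mut a = Vec::new();
    let mut b = Vec::new();
    let by_line = copy_by_line(&data[..], &mut a)?;
    let by_block = copy_by_block(&data[..], &mut b)?;
    assert_eq!(a, b); // both strategies produce identical output
    println!("writes by line: {}, writes by block: {}", by_line, by_block);
    Ok(())
}
```

The output is identical either way; only the number of calls differs. How much that matters in practice depends on what each write ends up costing, which is what the benchmarks in this thread try to measure.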
Next, we can make bat even slower by enabling all components that would be printed if we were printing to an interactive terminal:
> hyperfine --warmup 3 'bat --style=full --decorations=always --color=always test'
Benchmark #1: bat --style=full --decorations=always --color=always test
Time (mean ± σ): 2.298 s ± 0.035 s [User: 2.069 s, System: 0.226 s]
Range (min … max): 2.257 s … 2.378 s
With everything enabled (decorations such as the line numbers, the grid and ANSI colors), bat is roughly five times slower again and takes about 2.3 seconds to print the whole file.
However, this still doesn't quite explain why it takes around 8-10 seconds (on my machine) for the pager to scroll to the end of the output when just using bat test.
To simulate this behavior in the benchmarks, we can use the --show-output option of hyperfine which will print the whole output to the terminal (instead of piping to /dev/null). Using this option, the benchmark will now include the rendering time of the terminal emulator (which might or might not be comparable to what less needs to do). Let's compare both cat and bat with this option enabled:
> hyperfine --show-output 'cat test' 'bat --paging=never test'
[...]
Time (mean ± σ): 1.054 s ± 0.044 s [User: 0.9 ms, System: 376.8 ms]
Range (min … max): 0.984 s … 1.115 s
[...]
Time (mean ± σ): 9.254 s ± 0.105 s [User: 3.065 s, System: 1.594 s]
Range (min … max): 9.107 s … 9.423 s
We can see that both cat and bat are significantly slowed down when having to actually print the output in a terminal.
So if my interpretation is correct, most of the time (around 75%, i.e. roughly 7 of the 9.25 seconds, given that bat itself needs about 2.3 seconds to produce the output) is actually caused by the terminal emulator or pager that needs to interpret the output of bat (which includes the ANSI color sequences, for example). I don't think there is anything that we can do about this, except to disable decorations and colors (--decorations=never --color=never).
That being said, we also saw that performance is not bat's strength :smile:
I don't see this as a really big problem as I usually don't want to syntax-highlight files with 6 MB of contents, but it might still be fun to work on optimization here.
Thank you very much @sharkdp for the explanation :)
@sharkdp would it be possible to have a mode where syntax highlighting is only attempted for the visible region of the text, not for the whole file?
assuming that is doable, it would make the performance on large files a lot faster and that would let me view log files with bat and keep bat as an alias for less.
@gsar I don't think that this is possible.
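- bat only pipes its output to less. There is no two-way communication between bat and the pager which would be needed to access the current location in the file.
- For proper syntax highlighting, we need to parse the file from the very beginning. There is no way you can consistently highlight just a part of a file (think of a block comment like <!-- .. --> in an XML file that would start somewhere before that part of the file).
Is there a reason that there is no output buffering?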
The --help section states that "-u" for unbuffered is ignored since everything is always unbuffered.
To test the speed without testing the terminal, I use the following command:
time bat --color always --decorations always test | tail
Without buffering, this takes 4.4 seconds on my laptop.
With buffering, this takes 2 seconds.
diff --git a/src/controller.rs b/src/controller.rs
index ac39abb..51daa27 100644
--- a/src/controller.rs
+++ b/src/controller.rs
@@ -1,4 +1,4 @@
-use std::io::{self, Write};
+use std::io::{self, Write, BufWriter,};
use std::path::Path;
use crate::app::{Config, PagingMode};
@@ -36,7 +36,7 @@ impl<'b> Controller<'b> {
}
let mut output_type = OutputType::from_mode(paging_mode, self.config.pager)?;
- let writer = output_type.handle()?;
+ let mut writer = BufWriter::new(output_type.handle()?);
let mut no_errors: bool = true;
let stdin = io::stdin();
@@ -50,7 +50,7 @@ impl<'b> Controller<'b> {
Ok(mut reader) => {
let result = if self.config.loop_through {
let mut printer = SimplePrinter::new();
- self.print_file(reader, &mut printer, writer, *input_file)
+ self.print_file(reader, &mut printer, &mut writer, *input_file)
} else {
let mut printer = InteractivePrinter::new(
&self.config,
@@ -58,7 +58,7 @@ impl<'b> Controller<'b> {
*input_file,
&mut reader,
);
- self.print_file(reader, &mut printer, writer, *input_file)
+ self.print_file(reader, &mut printer, &mut writer, *input_file)
};
if let Err(error) = result {
Without the tail pipe, I can also test the pager speed by quickly pressing ">", then "q" to scroll to the end and then quit (since the scroll to the end takes so long here, the speed should be more or less accurate).
The current version takes ~9 seconds.
The patched version takes ~3 seconds.
My proposal would be to implement the "-u" command-line option and use buffered output unless it is specified.
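To make the effect of the patch easier to see in isolation, here is a small sketch that counts how often the underlying writer is actually called, with and without `std::io::BufWriter`. `CountingWriter`, `write_lines` and `count_writes` are hypothetical helpers written for this example; they are not part of bat. The sink discards all data, so this only counts calls and says nothing about timing.

```rust
use std::io::{self, BufWriter, Write};

/// Wraps a writer and counts how many times `write` is called on it.
struct CountingWriter<W: Write> {
    inner: W,
    calls: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Write the numbers 1..=n, one per line, then flush.
fn write_lines<W: Write>(out: &mut W, n: u32) -> io::Result<()> {
    for i in 1..=n {
        out.write_all(format!("{}\n", i).as_bytes())?;
    }
    out.flush()
}

/// Count the write calls that reach the underlying writer.
fn count_writes(n: u32, buffered: bool) -> io::Result<usize> {
    let mut counter = CountingWriter { inner: io::sink(), calls: 0 };
    if buffered {
        // BufWriter collects small writes and forwards them in larger chunks.
        let mut writer = BufWriter::new(&mut counter);
        write_lines(&mut writer, n)?;
    } else {
        write_lines(&mut counter, n)?;
    }
    Ok(counter.calls)
}

fn main() -> io::Result<()> {
    let n = 1_000_000;
    println!("unbuffered: {} write calls", count_writes(n, false)?);
    println!("buffered:   {} write calls", count_writes(n, true)?);
    Ok(())
}
```

When the underlying writer is a pipe to a pager, each of those calls is typically a system call, which is one plausible reason the buffered version is faster.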
@georgmu That sounds interesting, thank you!
Would the buffering cause any observable effects? If not, would you be interested in opening a PR?
I will prepare a PR, but I am currently fighting the borrow checker to have writer be either a &mut Write or a BufWriter.
Great, thank you. Let us know if you need help (perhaps just open a PR with the failing version).
Sorry for the delay. I have managed to build it and then compared bat to cat.
cat has a special implementation that reads in bigger blocks, with some special handling so that it only buffers if there is input available.
As an example: If I just open cat and start to type and press return, the line is printed (as expected). With my patch, this is not the case, so the behavior differs.
To make it better, the output should always be buffered, but only be flushed if there is no input pending.
I will create a PR so you can have a look at the changes.
Draft PR is #596.
Yes, that's definitely a behavior that should be preserved. Otherwise, we cannot do things like in this tail -f example: https://github.com/sharkdp/bat#tail--f
(I guess you did that already, but: make sure that you turn off paging if you perform experiments with this: bat --paging=never. Otherwise, the pager might buffer part of the output.)
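One way to approximate "always buffer, but flush when no input is pending" is sketched below. This is an assumption-laden illustration, not the code from the PR: it treats an empty `BufReader` buffer as "nothing pending" instead of asking the operating system, and `copy_flush_when_idle` and `Chunked` are hypothetical names. `Chunked` simulates an interactive source that delivers one chunk per read call.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

/// Copy lines from `input` to `output` through a BufWriter, flushing only
/// when the reader has no already-buffered input left.
/// Returns the number of flushes done inside the loop.
fn copy_flush_when_idle<R: Read, W: Write>(input: R, output: W) -> io::Result<usize> {
    let mut reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);
    let mut line = Vec::new();
    let mut flushes = 0;
    loop {
        line.clear();
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        writer.write_all(&line)?;
        // `buffer()` holds bytes that were read but not consumed yet. If it
        // is empty, the next read has to go to the source and may block, so
        // make the output visible first.
        if reader.buffer().is_empty() {
            writer.flush()?;
            flushes += 1;
        }
    }
    writer.flush()?;
    Ok(flushes)
}

/// Test helper: hands out one chunk per `read` call, like a terminal
/// delivering one typed line at a time. Assumes each chunk fits into the
/// caller's buffer.
struct Chunked {
    chunks: VecDeque<Vec<u8>>,
}

impl Chunked {
    fn new(parts: &[&str]) -> Self {
        Chunked {
            chunks: parts.iter().map(|p| p.as_bytes().to_vec()).collect(),
        }
    }
}

impl Read for Chunked {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.chunks.pop_front() {
            Some(chunk) => {
                buf[..chunk.len()].copy_from_slice(&chunk);
                Ok(chunk.len())
            }
            None => Ok(0),
        }
    }
}

fn main() -> io::Result<()> {
    // Input that is available all at once: lines are batched.
    let mut out = Vec::new();
    let batched = copy_flush_when_idle(&b"a\nb\nc\n"[..], &mut out)?;
    // Input that trickles in line by line: every line is flushed.
    let mut out2 = Vec::new();
    let trickled = copy_flush_when_idle(Chunked::new(&["a\n", "b\n", "c\n"]), &mut out2)?;
    println!("flushes (all at once): {}, flushes (line by line): {}", batched, trickled);
    Ok(())
}
```

With this heuristic, a file is written in large chunks while typed or tail -f style input still appears line by line. A real implementation might need to check for pending input differently.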
@gsar I don't think that this is possible.
- For proper syntax highlighting, we need to parse the file from the very beginning. There is no way you can consistently highlight just a part of a file (think of a block comment like <!-- .. --> in an XML file that would start somewhere before that part of the file).
You could do what vim does: it only looks back a certain number of lines (maybe 500-1000, I don't know) before the view.
But that might require having a pager built into bat for that to work.
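The trade-off between the two positions can be sketched with a toy example. The code below is not a real syntax highlighter: it only tracks whether a line starts inside a `<!-- ... -->` comment, and the function names and the lookback parameter are made up for illustration. It shows that a limited lookback is cheap but gives a wrong answer when the comment was opened before the lookback window.

```rust
/// Returns true if line `target` (0-based) starts inside a `<!-- ... -->`
/// comment, scanning from line `start` and assuming that `start` itself
/// begins outside of any comment.
fn starts_in_comment(lines: &[&str], start: usize, target: usize) -> bool {
    let mut in_comment = false;
    for line in &lines[start..target] {
        let mut rest = *line;
        loop {
            // Look for the marker that would toggle the current state.
            let marker = if in_comment { "-->" } else { "<!--" };
            match rest.find(marker) {
                Some(pos) => {
                    in_comment = !in_comment;
                    rest = &rest[pos + marker.len()..];
                }
                None => break,
            }
        }
    }
    in_comment
}

/// Same question, but only looking back `lookback` lines before `target`.
fn starts_in_comment_with_lookback(lines: &[&str], target: usize, lookback: usize) -> bool {
    starts_in_comment(lines, target.saturating_sub(lookback), target)
}

fn main() {
    let lines = [
        "<root>",
        "<!-- a long comment",
        "still comment",
        "still comment",
        "-->",
        "<item/>",
    ];
    // Parsing from the very beginning: line 3 is correctly seen as a comment.
    println!("full scan:  {}", starts_in_comment(&lines, 0, 3));
    // Looking back only one line misses the opening marker on line 1.
    println!("lookback 1: {}", starts_in_comment_with_lookback(&lines, 3, 1));
}
```

A larger lookback makes the wrong answer less likely but never rules it out, which is the consistency problem described above.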