I set setopt HIST_FIND_NO_DUPS and expected my Ctrl+R history search to be filtered so that it shows no duplicates. I don't want to use the HIST_IGNORE_ALL_DUPS option, because my history must stay complete, duplicates included; I only want duplicates hidden in the search results.
Any idea how we can implement it?
I have replaced __fzf_history__ in .fzf-key-bindings.bash with the following:
__fzf_history__() {
local line
countskip="$(history | tail -n 1 | grep -E '^[0-9]+' -o | wc -c)"
countskip="$(( countskip + 1 ))"
line=$(
HISTTIMEFORMAT= history |
grep '.\{1,79\}' |
sed 's/ *$//g' |
tac |
nauniq --skip-chars="$countskip" |
tac |
$(__fzfcmd) +s --tac +m -n2..,.. --tiebreak=index --toggle-sort=ctrl-r |
\grep '^ *[0-9]') && sed 's/ *\([0-9]*\)\** .*/!\1/' <<< "$line"
}
You will need nauniq (available at https://metacpan.org/release/App-nauniq).
But I wouldn't put this in the main repository.
@vaxXxa @junegunn @edi9999 Is this helpful http://unix.stackexchange.com/a/84838/44493?
The idea is to add numbers as the first column, sort & uniq by the second column (actual history lines), sort again by the first column (original indices) and remove indices. Only standard tools are used.
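The steps described above can be sketched roughly as follows. This is a minimal illustration on plain lines rather than on real `history` output, and the function name `dedup_keep_first` is made up for the example. It assumes a `sort` whose `-s -u` combination keeps the first line of each run of equal keys, which is the behavior the later comments in this thread also rely on.

```shell
# Sketch: drop duplicate lines while preserving the original order,
# using only nl, sort and cut. Keeps the FIRST occurrence of each line.
# dedup_keep_first is a hypothetical helper name, not part of fzf.
dedup_keep_first() {
  tab="$(printf '\t')"
  nl -ba -w1 -s"$tab" |                  # 1. prefix each line with its index
    LC_ALL=C sort -t"$tab" -k2 -s -u |   # 2. unique by the text after the index
    sort -t"$tab" -k1,1n |               # 3. restore the original order by index
    cut -f2-                             # 4. strip the index again
}

printf '%s\n' 'ls' 'cd /tmp' 'ls' 'make' 'cd /tmp' | dedup_keep_first
```

Each command is printed once, in order of first appearance. With real `history` output the index column already exists, so the `nl` and `cut` steps are not needed.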
@balta2ar Thanks. But since the command history can be huge depending on the configuration, I'm a bit concerned about the performance. Since the fzf command for the ctrl-r binding uses the --tac option, the input stream should finish very quickly. Any delay will be noticeable and hurt usability.
I also thought that using sort could make the command much slower, since sort is O(n * log(n)).
However, with 25k lines of history, the command still takes less than 100ms (about 70ms here):
history > h ; wc -l < h; time bash -c 'cat h | sort --key=2.1 -b -u | sort -n | cut -c8- > foobar '
25117
real 0m0.070s
user 0m0.067s
sys 0m0.008s
I have around 12K and it's rather fast:
history 0 > h ; wc -l < h; time bash -c 'cat h | sort --key=2.1 -b -u | sort -n | cut -c8- > foobar '
12368
bash -c 'cat h | sort --key=2.1 -b -u | sort -n | cut -c8- > foobar ' 0.02s user 0.01s system 97% cpu 0.027 total
@edi9999 @balta2ar Interesting results, thanks! With my 37K history, it takes 26ms to deduplicate using this approach. Interestingly, alternative solutions using awk or perl turned out to be slower, although they can start printing lines immediately, before processing the entire list. However, that advantage does not help in this case, as users will most likely want the tail of the list.
export LC_ALL=C
HISTTIMEFORMAT= history > /tmp/h
time for _ in $(seq 10); do cat /tmp/h > /dev/null; done
time for _ in $(seq 10); do sort --key=2.1 -b -u /tmp/h | sort -nr > /dev/null; done
time for _ in $(seq 10); do awk '!_[ substr($0, index($0, $2)) ]++' /tmp/h > /dev/null; done
time for _ in $(seq 10); do perl -ne '$seen{s/^ *[0-9]* *//r}++ or print' /tmp/h > /dev/null; done
26ms for 37k means that it will take many more lines to reach 100ms which I believe is still reasonable. If we are concerned about the user seeing the incomplete input stream, we can consider applying --sync option to block the UI until the stream is complete.
I'm more inclined to make this behavior the default. You might have noticed that I'm not a big fan of adding more configuration knobs. I think it's our responsibility to find and present good defaults rather than to lazily pass the buck to the users.
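A side note on the awk variant in the benchmark above: `index($0, $2)` returns the first place where the text of `$2` occurs, which can fall inside the history number (for example, the command `1` on line 12), so some duplicates would be missed. A hedged sketch of a streaming filter that strips the number with a pattern instead, mirroring the perl line, might look like this. The name `dedup_stream` is hypothetical, and like the benchmarked versions it keeps the first occurrence.

```shell
# Sketch: streaming dedup of `history`-style lines ("  12  command").
# The key is the line with the leading number (and optional '*' marker)
# removed; the full original line is printed the first time a key is seen.
# dedup_stream is a hypothetical helper name, not part of fzf.
dedup_stream() {
  awk '{
    key = $0
    sub(/^ *[0-9]+\*? +/, "", key)   # drop the index column from the key only
    if (!seen[key]++) print          # print the first occurrence, skip repeats
  }'
}

printf '%5d  %s\n' 1 ls 12 1 13 1 14 ls | dedup_stream
```

Here only the lines numbered 1 and 12 are printed; the repeated `1` and `ls` commands are dropped.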
> I'm more inclined to make this behavior the default.
Do you mean for all shells? I think this is a good idea.
In fact, I just chimed in because I saw a solution for the sorting issue. My use case is not similar to the one of the issue starter. I use hist_ignore_all_dups and I'm ok with that, I don't need dups in my history, so I don't want fzf to waste time removing dups when there aren't any. Considering this, I'd vote against adding an extra delay, even such a small one.
Maybe it's better to sleep on it and see whether more users come with the same request? And for now, suggest a script in the Wiki section so that users can still customize the behavior?
@balta2ar Good point, we don't want to pay the cost if the list is already free of duplicates.
Anyway, I noticed an issue with the suggested approach. It prints the first occurrence of a duplicate command but what we want to see is the last occurrence of it. Using tac and enabling stable sort seems to solve the issue, but it adds extra overhead to the processing. On OS X, tac is not available by default. We can use tail -r instead but it's more expensive.
# 58ms / iteration
time for _ in $(seq 10); do tail -r /tmp/h | sort --key=2.1 -bus | sort -nr > /dev/null; done
# brew install coreutils
# 29ms / iteration
time for _ in $(seq 10); do gtac /tmp/h | sort --key=2.1 -bus | sort -nr > /dev/null; done
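To make the first-versus-last distinction concrete, here is a small sketch on sample input shaped like `history` output. The helper names `reverse_lines` and `dedup_keep_last` are invented for this example, and the fallback from `tac` to `tail -r` is an assumption based on the note above about OS X.

```shell
# Sketch: keep the LAST occurrence of each command in `history`-style input.
# reverse_lines and dedup_keep_last are hypothetical helper names.
reverse_lines() {
  # Prefer tac (GNU coreutils); fall back to tail -r where tac is missing.
  if command -v tac >/dev/null 2>&1; then tac; else tail -r; fi
}

dedup_keep_last() {
  reverse_lines |            # newest entries first
    sort --key=2.1 -bus |    # stable + unique: the newest copy of each command wins
    sort -n                  # back to ascending history numbers
}

printf '%5d  %s\n' 1 ls 2 make 3 ls | dedup_keep_last
```

With this input the surviving entries are number 2 (`make`) and number 3 (`ls`); without the reversal, entry 1 would be kept instead of entry 3.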
@balta2ar Having the full history is really helpful if you want to see your command history step by step. For example, you compiled and configured something and want to see again how you did it.
@vaxXxa Good point, actually, that might be useful. However, there has to be some substring common to all of the commands that you want to find in the history. Otherwise, they will be filtered out by fzf. But yes, I see your point.
This seems to be begging for a configuration option, given the opposing performance concerns.
This shouldn't even be an issue most users need to re-configure, as you can toggle the default setting upon setup based on the user's HIST_FIND_NO_DUPS, at least in zsh?
Thanks for the ideas here. I ended up doing this in my ~/.bashrc
This is on Fedora 26, using the 'fzf' package. Supposedly 'tac' is not available on Macs, so this won't work for them.
The only change is this line:
HISTTIMEFORMAT= history | tac | sort --key=2.1 -bus | sort -n |
I like having the line numbers in the history, so I didn't use the "cut" command.
FZF_KEYBINDINGS_FILE=/usr/share/fzf/shell/key-bindings.bash
if [[ -f ${FZF_KEYBINDINGS_FILE} ]]; then
source ${FZF_KEYBINDINGS_FILE}
__fzf_history__() (
local line
shopt -u nocaseglob nocasematch
line=$(
HISTTIMEFORMAT= history | tac | sort --key=2.1 -bus | sort -n |
FZF_DEFAULT_OPTS="--height ${FZF_TMUX_HEIGHT:-40%} $FZF_DEFAULT_OPTS --tac -n2..,.. --tiebreak=index --bind=ctrl-r:toggle-sort $FZF_CTRL_R_OPTS +m" $(__fzfcmd) |
command grep '^ *[0-9]') &&
if [[ $- =~ H ]]; then
sed 's/^ *\([0-9]*\)\** .*/!\1/' <<< "$line"
else
sed 's/^ *\([0-9]*\)\** *//' <<< "$line"
fi
)
fi
Perhaps as a compromise, fzf could have a $FZF_HISTORY_FILTER setting, defaulting to empty string, but advanced users could set it to | sort --key=2.1 -bus | sort -nr to get this behavior? As it is, we have to edit the fzf-provided files to remove duplicates.
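As a rough illustration of that proposal, a wrapper along these lines could apply an optional filter. Note that FZF_HISTORY_FILTER is purely hypothetical here (fzf does not read such a variable), the helper name `apply_history_filter` is made up, and this sketch expects the value without the leading pipe.

```shell
# Sketch of the proposed knob. FZF_HISTORY_FILTER is a HYPOTHETICAL variable;
# fzf itself does not read it. apply_history_filter is an invented helper name.
apply_history_filter() {
  if [ -n "${FZF_HISTORY_FILTER:-}" ]; then
    # The filter is a user-supplied pipeline that reads stdin and writes stdout.
    eval "$FZF_HISTORY_FILTER"
  else
    cat   # default: pass the history through untouched, at no sorting cost
  fi
}

# Opt in to deduplication:
FZF_HISTORY_FILTER='sort --key=2.1 -bus | sort -nr'
printf '%5d  %s\n' 1 ls 2 make 3 ls | apply_history_filter
```

In a customized key binding, the function would sit between `HISTTIMEFORMAT= history` and the fzf invocation. Because the value is passed to `eval`, it should only ever come from the user's own shell configuration.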
Any news? If I'd like HIST_FIND_NO_DUPS respected, what's the current recommended way to achieve this?
@romkatv Your z4h-fzf-history function says it has duplicate removal. Can that feature be ported to fzf-history-widget?
Cool! Didn't know about this commit.
FWIW, z4h-fzf-history works for multi-line commands, too. E.g., if you run the following command and then hit Ctrl-R, you'll see that exact command in preview.
print -r 'hi
\n
bye'
The stock fzf-history-widget cannot distinguish between the command above and print -r 'hi\n\n\nbye'.
In addition, z4h-fzf-history doesn't depend on perl or any other external tool.
This is just FYI. Feel free to do as you please, I'm fine either way.