Closed
Description
What version of ripgrep are you using?
$ rg --version
ripgrep 11.0.1 (rev 1f1cd9b467)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)
How did you install ripgrep?
What operating system are you using ripgrep on?
Ubuntu 16.04.1
Describe your question, feature request, or bug.
The behavior of the \b word anchor when applied to non-word characters differs from what is observed in other regex engines. I found this by accident; it wasn't a use case I needed. I'm filing the issue anyway, in case this is a bug.
If this is a bug, what are the steps to reproduce the behavior?
See example below.
If this is a bug, what is the actual behavior?
$ echo 'I have 12, he has 2!' | rg -o '\b..\b'
I
12
If this is a bug, what is the expected behavior?
I feel ripgrep should also behave as seen in other regex engines.
$ # GNU grep 3.3
$ # grep -oP '\b..\b' and rg -oP '\b..\b' also produce same result
$ echo 'I have 12, he has 2!' | grep -o '\b..\b'
I
12
,
he
2
$ echo 'I have 12, he has 2!' | perl -lne 'print join "|", /\b..\b/g'
I |12|, |he| 2
$ echo 'I have 12, he has 2!' | ruby -ne 'puts $_.scan(/\b..\b/).join("|")'
I |12|, |he| 2
$ # python3.7
>>> import re
>>> re.findall(r'\b..\b', 'I have 12, he has 2!')
['I ', '12', ', ', 'he', ' 2']
The image below might help in understanding what is happening in all these regex engines. Each vertical bar represents a word boundary. Note that even though ! is at the end of the line, there is no boundary after it, because it is not a word character.
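The boundary positions described above can also be printed directly. This is a minimal sketch using Python's `re` module (one of the engines compared above, not ripgrep's own regex engine); the variable names are only illustrative.

```python
import re

text = 'I have 12, he has 2!'

# \b is zero-width, so each match is just an offset where a word
# character meets a non-word character or an edge of the string.
boundaries = [m.start() for m in re.finditer(r'\b', text)]

# Insert a vertical bar at every word boundary.
marked = re.sub(r'\b', '|', text)

print(boundaries)  # [0, 1, 2, 6, 7, 9, 11, 13, 14, 17, 18, 19]
print(marked)      # |I| |have| |12|, |he| |has| |2|!

# Any two characters that sit between two boundaries match \b..\b,
# including pairs made of non-word characters such as ', '.
print(re.findall(r'\b..\b', text))  # ['I ', '12', ', ', 'he', ' 2']
```

There is no bar after the final `!` (offset 20 is missing from the list), which is why nothing ending in `!` can match.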
Note:
grep -o '\<..\>' (also in vim) will be different compared to \b, as \b cannot differentiate between start and end word boundary.
grep -ow '..' is same as rg -ow '..' (perhaps because both do (?<!\w)pattern(?!\w) for the -w option).
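To make the two notes above concrete, here is a rough Python sketch. Python's `re` has no `\<` and `\>`, so the start-of-word and end-of-word anchors are emulated with lookarounds, and the `-w` pattern is the guess from the note above, not a confirmed description of how grep or ripgrep implement `-w`. The names `start_end` and `dash_w` are hypothetical.

```python
import re

text = 'I have 12, he has 2!'

# Emulation of \<..\> (assumption): a boundary followed by a word
# character marks a word start; one preceded by a word character
# marks a word end.
start_end = re.findall(r'\b(?=\w)..\b(?<=\w)', text)

# The lookaround form guessed above for the -w option.
dash_w = re.findall(r'(?<!\w)..(?!\w)', text)

print(start_end)  # ['12', 'he']
print(dash_w)     # ['12', 'he', '2!']
```

In this Python emulation, neither form matches `'I '` or `', '`, because both require something stricter than a plain `\b` on at least one side.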