[PATCH] doc: detail inconsistencies in sed word boundary handling

[PATCH] doc: detail inconsistencies in sed word boundary handling

Pádraig Brady
* doc/autoconf.texi (Limitations of usual tools): Display a
table showing where the various syntaxes for word boundaries
are supported.
---
 doc/autoconf.texi | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index 4be1f70..2e4b7ba 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -19666,6 +19666,18 @@ $ @kbd{echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'}
 b
 @end example
 
+Portable scripts should be aware of the inconsistencies and
+options for handling word boundaries.
+
+@example
+                \<      \b      [[:<:]]
+Solaris 10      yes     no      no
+Solaris XPG4    yes     no      error
+NetBSD 5.1      no      no      yes
+FreeBSD 9.1     no      no      yes
+GNU             yes     yes     error
+busybox         yes     yes     error
+@end example
 
 @item @command{sed} (@samp{t})
 @c ---------------------------
--
2.5.5



Re: [PATCH] doc: detail inconsistencies in sed word boundary handling

Jim Meyering
On Sun, Oct 30, 2016 at 12:01 PM, Pádraig Brady <[hidden email]> wrote:
> * doc/autoconf.texi (Limitations of usual tools): Display a
> table showing where the various syntaxes for word boundaries
> are supported.
...

> +Portable scripts should be aware of the inconsistencies and
> +options for handling word boundaries.
> +
> +@example
> +                \<      \b      [[:<:]]
> +Solaris 10      yes     no      no
> +Solaris XPG4    yes     no      error
> +NetBSD 5.1      no      no      yes
> +FreeBSD 9.1     no      no      yes
> +GNU             yes     yes     error
> +busybox         yes     yes     error
> +@end example

Nice. Good to know.


Re: [PATCH] doc: detail inconsistencies in sed word boundary handling

Eric Blake-3
In reply to this post by Pádraig Brady
On 10/30/2016 12:01 PM, Pádraig Brady wrote:

> * doc/autoconf.texi (Limitations of usual tools): Display a
> table showing where the various syntaxes for word boundaries
> are supported.
> ---
>  doc/autoconf.texi | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/doc/autoconf.texi b/doc/autoconf.texi
> index 4be1f70..2e4b7ba 100644
> --- a/doc/autoconf.texi
> +++ b/doc/autoconf.texi
> @@ -19666,6 +19666,18 @@ $ @kbd{echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'}
>  b
>  @end example
>  
> +Portable scripts should be aware of the inconsistencies and
> +options for handling word boundaries.
> +
> +@example
> +                \<      \b      [[:<:]]
> +Solaris 10      yes     no      no
> +Solaris XPG4    yes     no      error
> +NetBSD 5.1      no      no      yes
> +FreeBSD 9.1     no      no      yes
> +GNU             yes     yes     error
> +busybox         yes     yes     error
> +@end example

It might be nice to add Cygwin to the list, although I don't know if one
row is sufficient.  Its regex engine is based on BSD code, with an added
extension for \< and \>.  But depending on whether a program uses the
libc regex or its own, you can get GNU behavior instead: Cygwin grep
supports \< and \b but not [[:<:]], because it uses gnulib and bypasses
the native regex, while a native application supports [[:<:]] and \< but
not \b, because of the BSD heritage plus the Cygwin extension.

It may be worth pointing out that POSIX does not require ANY support for
word boundaries in regex.
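
Since none of these syntaxes is guaranteed, a script can probe the local
sed before relying on one.  The following is only a minimal sketch, not
part of the patch: probe_boundary is a hypothetical helper name, and the
sketch assumes that a sed which rejects the regex exits with a non-zero
status, which is how it tells "error" apart from "no".

```shell
#!/bin/sh
# probe_boundary REGEX  (hypothetical helper, not from the patch)
# Prints "yes" if sed treats REGEX as a start-of-word anchor,
# "no" if sed accepts it but does not match at a word start,
# and "error" if sed exits non-zero (assumed to mean it rejected the regex).
probe_boundary() {
  # In "ab b" only the second "b" starts a word, so a working anchor
  # followed by "b" must turn the input into "ab X".
  out=$(printf 'ab b\n' | sed "s/${1}b/X/" 2>/dev/null) || { echo error; return; }
  if test "$out" = 'ab X'; then echo yes; else echo no; fi
}

# Report on the three syntaxes listed in the table.
for re in '\<' '\b' '[[:<:]]'; do
  printf '%-8s %s\n' "$re" "$(probe_boundary "$re")"
done
```

The result describes only the sed found first in PATH; as noted above
for Cygwin, other tools on the same system may use a different regex
engine and behave differently.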

ACK.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



Re: [PATCH] doc: detail inconsistencies in sed word boundary handling

Pádraig Brady
On 01/11/16 13:46, Eric Blake wrote:

> On 10/30/2016 12:01 PM, Pádraig Brady wrote:
>> * doc/autoconf.texi (Limitations of usual tools): Display a
>> table showing where the various syntaxes for word boundaries
>> are supported.
>> ---
>>  doc/autoconf.texi | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/doc/autoconf.texi b/doc/autoconf.texi
>> index 4be1f70..2e4b7ba 100644
>> --- a/doc/autoconf.texi
>> +++ b/doc/autoconf.texi
>> @@ -19666,6 +19666,18 @@ $ @kbd{echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'}
>>  b
>>  @end example
>>  
>> +Portable scripts should be aware of the inconsistencies and
>> +options for handling word boundaries.
>> +
>> +@example
>> +                \<      \b      [[:<:]]
>> +Solaris 10      yes     no      no
>> +Solaris XPG4    yes     no      error
>> +NetBSD 5.1      no      no      yes
>> +FreeBSD 9.1     no      no      yes
>> +GNU             yes     yes     error
>> +busybox         yes     yes     error
>> +@end example
>
> It might be nice to add Cygwin to the list, although I don't know if one
> row is sufficient.  It bases its regex engine on BSD code but adds an
> extension for \< and \>; but depending on whether a program uses the
> libc regex or its own, you can get GNU behavior (that is, Cygwin grep
> supports \< and \b but not [[:<:]] because it uses gnulib and bypasses
> native regex; while a native application supports [[:<:]] and \< but not
> \b because of the BSD heritage plus cygwin extension).

Interesting re Cygwin, though here we're considering sed,
which I think would fall under GNU?
I've updated the attached patch with the note about POSIX.

(I don't have commit access to this repo)

thanks,
Pádraig

Attachment: sed-word-boundary.patch (1K)