change the meaning of prec for regsub slightly

2016-06-15 14:52:45 +02:00
parent 06e212c66e
commit bc67317b0b
3 changed files with 119 additions and 100 deletions


@ -367,7 +367,7 @@ endian</em>, i.e. least significant byte first.
With the <code>0</code> flag, the value is unsigned, otherwise signed. With the <code>0</code> flag, the value is unsigned, otherwise signed.
</p> </p>
<p> <p>
In output, the <em>prec</em> (or sizeof(long) whichever is less) least In output, the <em>precision</em> (or sizeof(long) whichever is less) least
significant bytes of the value are sign extended or zero extended significant bytes of the value are sign extended or zero extended
(depending on the <code>0</code> flag) to <em>width</em> bytes. (depending on the <code>0</code> flag) to <em>width</em> bytes.
</p> </p>
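The extension rule described above can be sketched in a few lines of C++. This is only an illustration, not StreamDevice code: the helper name `rawBytes` is hypothetical, the output is assumed to be most significant byte first, and a <em>width</em> smaller than <em>precision</em> is assumed to be raised to <em>precision</em>.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of StreamDevice): take the `prec` least
// significant bytes of `value` and extend them to `width` bytes.
// Assumption: bytes are emitted most significant byte first.
std::vector<unsigned char> rawBytes(long value, unsigned prec, unsigned width, bool zeroFlag)
{
    prec = std::min<unsigned>(prec, sizeof(long)); // precision or sizeof(long), whichever is less
    width = std::max(width, prec);                 // assumption: never truncate below prec
    const unsigned long v = static_cast<unsigned long>(value);

    // The sign is taken from the top bit of the most significant retained byte.
    const bool negative = prec > 0 && ((v >> (8 * prec - 8)) & 0x80);
    const unsigned char fill = (!zeroFlag && negative) ? 0xFF : 0x00;

    std::vector<unsigned char> out(width - prec, fill); // extension bytes
    for (unsigned i = prec; i-- > 0; )                  // retained bytes, high to low
        out.push_back(static_cast<unsigned char>(v >> (8 * i)));
    return out;
}
```

Under these assumptions, `rawBytes(-2, 1, 2, false)` yields the bytes `FF FE` (sign extended), while `rawBytes(254, 1, 2, true)` yields `00 FE` (zero extended).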
@ -434,7 +434,7 @@ The <em>width</em> field is the byte number from which to start
calculating the checksum. calculating the checksum.
Default is 0, i.e. the first byte of the input or output of the current Default is 0, i.e. the first byte of the input or output of the current
command. command.
The last byte is <em>prec</em> bytes before the checksum (default 0). The last byte is <em>precision</em> bytes before the checksum (default 0).
For example in <code>"abcdefg%&lt;xor&gt;"</code> the checksum is calculated For example in <code>"abcdefg%&lt;xor&gt;"</code> the checksum is calculated
from <code>abcdefg</code>, from <code>abcdefg</code>,
but in <code>"abcdefg%2.1&lt;xor&gt;"</code> only from <code>cdef</code>. but in <code>"abcdefg%2.1&lt;xor&gt;"</code> only from <code>cdef</code>.
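As a rough illustration of how <em>width</em> and <em>precision</em> select the checksummed bytes, the following C++ sketch reproduces the two examples above. The helper names `checksumRange` and `xorSum` are hypothetical, and the XOR is assumed to start from 0; the actual checksum converter may differ in details.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the bytes that take part in the checksum.
// `width` bytes are skipped at the front, `prec` bytes at the end.
std::string checksumRange(const std::string& data, std::size_t width, std::size_t prec)
{
    if (width + prec >= data.size())
        return std::string(); // nothing left to sum
    return data.substr(width, data.size() - width - prec);
}

// Plain XOR over all bytes, assuming an initial value of 0.
unsigned char xorSum(const std::string& bytes)
{
    unsigned char sum = 0;
    for (unsigned char b : bytes)
        sum ^= b;
    return sum;
}
```

With this sketch, `checksumRange("abcdefg", 0, 0)` is `abcdefg` and `checksumRange("abcdefg", 2, 1)` is `cdef`, whose XOR is `0x04`.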
@ -534,35 +534,38 @@ This input-only format matches <a target="ex"
href="http://www.pcre.org/" >Perl compatible regular expressions (PCRE)</a>. href="http://www.pcre.org/" >Perl compatible regular expressions (PCRE)</a>.
It is only available if a PCRE library is installed. It is only available if a PCRE library is installed.
</p> </p>
<div class="box">
<p> <p>
If PCRE is not available for your host or cross architecture, download If PCRE is not available for your host or cross architecture, download
the source code from <a target="ex" href="http://www.pcre.org/">www.pcre.org</a> the source code from <a target="ex" href="http://www.pcre.org/">www.pcre.org</a>
and try my EPICS compatible <a target="ex" and try my EPICS compatible <a target="ex"
href="http://epics.web.psi.ch/software/streamdevice/pcre/Makefile">Makefile</a> href="http://epics.web.psi.ch/software/streamdevice/pcre/Makefile">Makefile</a>
to compile it like a normal EPICS application. to compile it like a normal EPICS support module.
The Makefile is known to work with EPICS 3.14.8 and PCRE 7.2. The Makefile is known to work with EPICS 3.14.8 and PCRE 7.2.
In your RELEASE file define the variable <code>PCRE</code> so that In your RELEASE file define the variable <code>PCRE</code> so that
it points to the install location of PCRE. it points to the install location of PCRE.
</p> </p>
<p> <p>
If PCRE is already installed on your system, use the variables If PCRE is already installed on (some of) your systems, you may add
<code>PCRE_INCLUDE</code> and <code>PCRE_LIB</code> instead to provide architectures where PCRE can be found in standard include and library
the install directories of <code>pcre.h</code> and the library. locations to the variable <code>WITH_SYSTEM_PCRE</code>.
</p> If either the header file or the library is in a non-standard place,
<p> set in your RELEASE file the variables <code>PCRE_INCLUDE_<em>arch</em></code>
If you have PCRE installed in different locations for different (cross) and/or <code>PCRE_LIB_<em>arch</em></code> for the respective architectures
architectures, define the variables in RELEASE.Common.&lt;architecture&gt; to the correct directories or set
instead of the global RELEASE file. <code>PCRE_INCLUDE</code> and/or <code>PCRE_LIB</code>
in architecture specific RELEASE.Common.<em>arch</em> files.
</p> </p>
</div>
<p> <p>
If the regular expression is not anchored, i.e. does not start with If the regular expression is not anchored, i.e. does not start with
<code>^</code>, leading non-matching input is skipped. <code>^</code>, leading non-matching input is skipped.
A maximum of <em>width</em> bytes is matched, if specified. A maximum of <em>width</em> bytes is matched, if specified.
If <em>prec</em> is given, it specifies the sub-expression whose match If <em>precision</em> is given, it specifies the sub-expression whose match
is returned. is returned.
Otherwise the complete match is returned. Otherwise the complete match is returned.
In any case, the complete match is consumed from the input buffer. In any case, the complete match is consumed from the input buffer.
If the expression contains a <code>/</code> it must be escaped. If the expression contains a <code>/</code> it must be escaped like <code>\/</code>.
</p> </p>
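The matching rules above (skip leading input unless anchored, return one sub-expression, consume the complete match) can be sketched as follows. This is an approximation that uses `std::regex` (ECMAScript syntax) instead of PCRE; the names `ScanResult` and `scanRegex` are hypothetical, and the <em>width</em> limit is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: number of consumed input bytes (-1 on failure)
// and the text of the selected sub-expression.
struct ScanResult {
    long consumed;
    std::string value;
};

// Sketch of the %/regex/ input conversion using std::regex, not PCRE.
// `subexpr` plays the role of the precision field; 0 selects the whole match.
ScanResult scanRegex(const std::string& input, const std::string& pattern, std::size_t subexpr)
{
    std::smatch m;
    if (!std::regex_search(input, m, std::regex(pattern)))
        return {-1, ""};                 // no match
    if (subexpr >= m.size())
        return {-1, ""};                 // not enough sub-expressions
    // Everything up to the end of the complete match is consumed,
    // including skipped leading input.
    return {static_cast<long>(m.position(0) + m.length(0)), m[subexpr].str()};
}
```

For the input `junk<title>Hello</title>rest` and the pattern `<title>(.*)</title>`, sub-expression 1 gives `Hello` and 24 bytes are consumed; with the anchored pattern `^<title>(.*)</title>` the scan fails because the leading `junk` may not be skipped.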
<p> <p>
Example: <code>%.1/&lt;title&gt;(.*)&lt;\/title&gt;/</code> returns Example: <code>%.1/&lt;title&gt;(.*)&lt;\/title&gt;/</code> returns
@ -579,48 +582,63 @@ it can be used as a pre-processor for input or
as a post-processor for output. as a post-processor for output.
</p> </p>
<p> <p>
Any match of the <em>regex</em> is replaced by the string <em>subst</em> with any Matches of the <em>regex</em> are replaced by the string <em>subst</em> with all
<code>&</code> or <code>\0</code> in <em>subst</em> replaced with the match itself and any <code>&</code> or <code>\0</code> in <em>subst</em> replaced with the match itself and all
<code>\1</code> through <code>\9</code> with the corresponding sub-expressions. <code>\1</code> through <code>\9</code> replaced with the match of the corresponding sub-expression.
To get a literal <code>&</code> or <code>\</code> in the substitution write To get a literal <code>&</code> or <code>\</code> or <code>/</code> in the substitution write
<code>\&</code> or <code>\\</code>. <code>\&</code> or <code>\\</code> or <code>\/</code>.
There is no way to specify literal bytes with values less than or equal to 9 in the
substitution!
</p> </p>
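The expansion of <em>subst</em> can be illustrated with a small C++ sketch. It works on the text as written in the format string (a backslash followed by a digit character), uses `std::regex` instead of PCRE, and only replaces the first match; `expandSubst` and `substituteFirst` are hypothetical names, not StreamDevice functions.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Expand `subst` for one match: & and \0 give the whole match,
// \1 .. \9 give sub-expressions, any other escaped character is literal.
std::string expandSubst(const std::string& subst, const std::smatch& m)
{
    std::string out;
    for (std::size_t i = 0; i < subst.size(); ++i) {
        const char ch = subst[i];
        if (ch == '\\' && i + 1 < subst.size()) {
            const char next = subst[++i];
            if (next >= '0' && next <= '9') {
                const std::size_t group = static_cast<std::size_t>(next - '0');
                if (group < m.size())
                    out += m[group].str();   // sub-expression (empty if unmatched)
            } else {
                out += next;                 // \& \\ \/ become & \ /
            }
        } else if (ch == '&') {
            out += m[0].str();               // unescaped & is the whole match
        } else {
            out += ch;
        }
    }
    return out;
}

// Replace only the first match of `pattern` in `input` (sketch).
std::string substituteFirst(const std::string& input, const std::string& pattern,
                            const std::string& subst)
{
    std::smatch m;
    if (!std::regex_search(input, m, std::regex(pattern)))
        return input;
    return m.prefix().str() + expandSubst(subst, m) + m.suffix().str();
}
```

For example, `substituteFirst("1.23-", "([^+-]*)([+-])", "\\2\\1")` gives `-1.23`, and `substituteFirst("0b19", "..", "&:")` gives `0b:19`.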
<p> <p>
If <em>width</em> is specified, it limits the number of characters processed. If <em>width</em> is specified, it limits the number of characters processed.
If the <code>-</code> flag is used (i.e. <em>width</em> looks like a negative number) If the <code>-</code> flag is used (i.e. <em>width</em> looks like a negative number)
only the last <em>width</em> caracters are processed, else the first. only the last <em>width</em> characters are processed, else the first.
Without <em>width</em> all available characters are processed. Without <em>width</em> (or 0) all available characters are processed.
</p> </p>
<p> <p>
If <em>prec</em> is specified, it limits the number of times the substitution is applied. If <em>precision</em> is specified, it indicates which matches to replace.
Without <em>prec</em>, the substitution is applied as often as possible. With the <code>+</code> flag given, <em>precision</em> is the maximum
number of matches to replace.
Otherwise <em>precision</em> is the index (counting from 1) of the match to replace.
Without <em>precision</em> (or 0), all matches are replaced.
</p> </p>
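How <em>width</em>, <em>precision</em> and the <code>-</code> and <code>+</code> flags select the matches to replace can be sketched like this. The sketch uses `std::regex` rather than PCRE, inserts <em>subst</em> literally (no <code>&</code> or <code>\1</code> expansion), and the function name `regsub` with its parameters is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Sketch of match selection for %#/regex/subst/.
// width: 0 means all characters; lastChars models the - flag.
// prec: 0 means all matches; countIsMax models the + flag
//       (prec is a maximum count), otherwise prec is a 1-based match index.
std::string regsub(std::string buffer, const std::string& pattern, const std::string& subst,
                   std::size_t width, int prec, bool lastChars, bool countIsMax)
{
    const std::regex re(pattern);
    std::size_t length = buffer.size();
    if (width && width < length)
        length = width;
    const std::size_t start = lastChars ? buffer.size() - length : 0;

    std::size_t c = 0;                       // offset inside the processed window
    for (int n = 1; c < length; ++n) {       // n counts matches from 1
        std::smatch m;
        if (!std::regex_search(buffer.cbegin() + start + c,
                               buffer.cbegin() + start + length, m, re))
            break;                           // no further match
        const std::size_t pos = m.position(0);
        const std::size_t len = m.length(0);
        if (!countIsMax && n < prec) {       // index mode: skip earlier matches
            c += pos + len + (len == 0);     // step over empty matches
            continue;
        }
        buffer.replace(start + c + pos, len, subst);
        length = length - len + subst.size();
        c += pos + subst.size() + (len == 0);
        if (n == prec)                       // index reached or maximum count used up
            break;
    }
    return buffer;
}
```

With these assumptions, `regsub("abcabcabcabc", "ab", "X", 10, 2, true, true)` corresponds to `%#+-10.2/ab/X/` and gives `abcXcXcabc`, while `regsub("abcabcabc", "ab", "X", 0, 2, false, false)` replaces only the second match and gives `abcXcabc`.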
<p> <p>
In input this converter pre-processes data received from the device before In input this converter pre-processes data received from the device before
other converters after this one read it. following converters read it.
Converters before this one will see unmodified input. Converters preceding this one will read unmodified input.
Thus place this converter before those whose input should be pre-processed. Thus place this converter before those whose input should be pre-processed.
</p> </p>
<p> <p>
In output it post-processes data already formatted by other converters before this one In output it post-processes data already formatted by preceding converters
before sending it to the device. before sending it to the device.
Converters after this one will send their output unmodified. Converters following this one will send their output unmodified.
Thus place this converter after those whose output should be post-processed. Thus place this converter after those whose output should be post-processed.
</p> </p>
<p> <p>
Examples:<br> Examples:
<code>%#-10.2/ab/X/</code> replaces the string <code>ab</code> with <code>X</code> <div class="indent">
<code>%#+-10.2/ab/X/</code> replaces the string <code>ab</code> with <code>X</code>
at most 2 times in the last 10 characters. at most 2 times in the last 10 characters.
(<code>abcabcabcabc</code> becomes <code>abcXcXcabc</code>)<br> (<code>abcabcabcabc</code> becomes <code>abcXcXcabc</code>)
<code>%#/..\B/&:/</code> writes <code>:</code> after every second character </div>
<div class="indent">
<code>%#/\\/\//</code> replaces all <code>\</code> with <code>/</code>
(<code>\dir\file</code> becomes <code>/dir/file</code>)
</div>
<div class="indent">
<code>%#/..\B/&:/</code> inserts <code>:</code> after every second character
which is not at the end of a word. which is not at the end of a word.
(<code>0b19353134</code> becomes <code>0b:19:35:31:34</code>)<br> (<code>0b19353134</code> becomes <code>0b:19:35:31:34</code>)
<code>%#/://</code> removes all <code>:</code>. </div>
(<code>0b:19:35:31:34</code> becomes <code>0b19353134</code>)<br> <div class="indent">
<code>%#/://</code> removes all <code>:</code> characters.
(<code>0b:19:35:31:34</code> becomes <code>0b19353134</code>)
</div>
<div class="indent">
<code>%#/([^+-]*)([+-])/\2\1/</code> moves a postfix sign to the front. <code>%#/([^+-]*)([+-])/\2\1/</code> moves a postfix sign to the front.
(<code>1.23-</code> becomes <code>-1.23</code>)<br> (<code>1.23-</code> becomes <code>-1.23</code>)<br>
</div>
</p>
<a name="mantexp"></a> <a name="mantexp"></a>
<h2>15. MantissaExponent DOUBLE converter (<code>%m</code>)</h2> <h2>15. MantissaExponent DOUBLE converter (<code>%m</code>)</h2>
<p> <p>
@ -679,7 +697,7 @@ In output, the system function <em>strftime()</em> is used to format the time.
There may be differences in the implementation between operating systems. There may be differences in the implementation between operating systems.
</p> </p>
<p> <p>
In input, <em>StreamDevice</em> used its own implementation because many In input, <em>StreamDevice</em> uses its own implementation because many
systems are missing the <em>strptime()</em> function and additional formats systems are missing the <em>strptime()</em> function and additional formats
are supported. are supported.
</p> </p>


@ -88,6 +88,16 @@ code {
text-align:left; text-align:left;
} }
.box {
margin-left:1ex;
margin-right:1ex;
margin-top:0.5ex;
padding: 0 1ex;
border: 1px solid black;
text-align:left;
background-color:#f0f0f0;
}
#navleft { #navleft {
position:fixed; position:fixed;
left:0; left:0;


@ -23,7 +23,7 @@
#include "string.h" #include "string.h"
#include "pcre.h" #include "pcre.h"
// Perl regular expressions (PCRE) %/regexp/ // Perl regular expressions (PCRE) %/regexp/ and %#/regexp/subst/
/* Notes: /* Notes:
- Memory for compiled regexp is allocated in parse but never freed. - Memory for compiled regexp is allocated in parse but never freed.
@ -65,15 +65,22 @@ parse(const StreamFormat& fmt, StreamBuffer& info,
error("Missing closing '/' after %%/%s format conversion\n", pattern()); error("Missing closing '/' after %%/%s format conversion\n", pattern());
return false; return false;
} }
if (*source == esc) { if (*source == esc) { // handle escaped chars
source++; if (*++source != '/') // just un-escape /
pattern.append('\\'); {
continue; pattern.append('\\');
if ((*source & 0x7f) < 0x30) // handle control chars
{
pattern.print("x%02x", *source++);
continue;
}
// fall through for PCRE codes like \B
}
} }
pattern.append(*source++); pattern.append(*source++);
} }
source++; source++;
debug("regexp = \"%s\"\n", pattern()); debug("regexp = \"%s\"\n", pattern.expand()());
const char* errormsg; const char* errormsg;
int eoffset; int eoffset;
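The new escape handling in the hunk above can be read as the following stand-alone sketch. It is not the StreamDevice code itself: `unescapePattern` is a hypothetical name, the escape byte is passed in as a parameter instead of using the internal `esc` constant, and `std::string` stands in for `StreamBuffer`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Sketch: translate the internal escaped form of a pattern into PCRE text.
//   esc '/'         -> plain '/'            (just un-escape the delimiter)
//   esc + low byte  -> "\xNN"               (bytes whose low 7 bits are < 0x30)
//   esc + other     -> '\' + char           (PCRE codes like \B)
std::string unescapePattern(const std::string& source, char esc)
{
    std::string pattern;
    for (std::size_t i = 0; i < source.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(source[i]);
        if (source[i] == esc && i + 1 < source.size()) {
            ch = static_cast<unsigned char>(source[++i]);
            if (ch != '/') {
                pattern += '\\';
                if ((ch & 0x7f) < 0x30) {
                    char hex[8];
                    std::snprintf(hex, sizeof hex, "x%02x", static_cast<unsigned>(ch));
                    pattern += hex;
                    continue;
                }
                // fall through for PCRE codes like \B
            }
        }
        pattern += static_cast<char>(ch);
    }
    return pattern;
}
```

Using `~` as a stand-in escape byte, `unescapePattern("a~/b", '~')` gives `a/b`, `unescapePattern("..~B", '~')` gives `..\B`, and an escaped tab becomes `\x09`.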
@ -89,22 +96,19 @@ parse(const StreamFormat& fmt, StreamBuffer& info,
if (fmt.flags & alt_flag) if (fmt.flags & alt_flag)
{ {
StreamBuffer subst; StreamBuffer subst;
debug("check for subst in \"%s\"\n", StreamBuffer(source).expand()());
while (*source != '/') while (*source != '/')
{ {
if (!*source) { if (!*source) {
error("Missing closing '/' after %%#/%s/%s format conversion\n", pattern(), subst()); error("Missing closing '/' after %%#/%s/%s format conversion\n", pattern(), subst());
return false; return false;
} }
if (*source == esc) { if (*source == esc)
source++; subst.append(*source++);
subst.append('\\');
if (*source <= 9) subst.append('0'+*source++);
continue;
}
subst.append(*source++); subst.append(*source++);
} }
source++; source++;
debug("subst = \"%s\"\n", subst()); debug("subst = \"%s\"\n", subst.expand()());
info.append(subst).append('\0'); info.append(subst).append('\0');
return pseudo_format; return pseudo_format;
} }
@ -131,7 +135,7 @@ scanString(const StreamFormat& fmt, const char* input,
debug("pcre_exec match \"%.*s\" result = %d\n", length, input, rc); debug("pcre_exec match \"%.*s\" result = %d\n", length, input, rc);
if ((subexpr && rc <= subexpr) || rc < 0) if ((subexpr && rc <= subexpr) || rc < 0)
{ {
/* error or no match or not enough sub-expressions */ // error or no match or not enough sub-expressions
return -1; return -1;
} }
if (fmt.flags & skip_flag) return ovector[subexpr*2+1]; if (fmt.flags & skip_flag) return ovector[subexpr*2+1];
@ -148,40 +152,41 @@ scanString(const StreamFormat& fmt, const char* input,
} }
memcpy(value, input + ovector[subexpr*2], l); memcpy(value, input + ovector[subexpr*2], l);
value[l] = '\0'; value[l] = '\0';
return ovector[1]; /* consume input until end of match */; return ovector[1]; // consume input until end of match
} }
static void regsubst(pcre* code, StreamBuffer& buffer, long start, long length, const char* subst, int max) static void regsubst(const StreamFormat& fmt, StreamBuffer& buffer, long start)
{ {
int rc, l, c, r, rl, n=0; const char* subst = fmt.info;
pcre* code = extract<pcre*>(subst);
long length;
int rc, l, c, r, rl, n;
int ovector[30]; int ovector[30];
StreamBuffer s; StreamBuffer s;
if (length == 0)
{ length = buffer.length() - start;
length = buffer.length() - start; if (fmt.width && fmt.width < length)
} length = fmt.width;
else if (length < 0) if (fmt.flags & left_flag)
{
length = -length;
if (length > buffer.length() - start)
length = buffer.length() - start;
start = buffer.length() - length; start = buffer.length() - length;
}
else debug("regsubst buffer=\"%s\", start=%ld, length=%ld, subst = \"%s\"\n",
{ buffer.expand()(), start, length, subst);
if (length > buffer.length() - start)
length = buffer.length() - start; for (c = 0, n = 1; c < length; n++)
}
debug("regsubst buffer=\"%s\", start=%ld, length=%ld, subst = \"%s\", max = %d\n",
buffer.expand()(), start, length, subst, max);
for (c = 0; c < length; )
{ {
rc = pcre_exec(code, NULL, buffer(start+c), length-c, 0, 0, ovector, 30); rc = pcre_exec(code, NULL, buffer(start+c), length-c, 0, 0, ovector, 30);
debug("pcre_exec match \"%.*s\" result = %d\n", (int)length-c, buffer(start+c), rc); debug("pcre_exec match \"%.*s\" result = %d\n", (int)length-c, buffer(start+c), rc);
if (rc < 0) // no match
if (rc < 0 || (max && n++ == max)) return;
return; /* no match or maximum substitutions reached */
/* replace & by match in subst */ if (!(fmt.flags & sign_flag) && n < fmt.prec) // without + flag
{
// do not yet replace this match
c += ovector[1];
continue;
}
// replace & by match in subst
l = ovector[1] - ovector[0]; l = ovector[1] - ovector[0];
debug("start = \"%s\"\n", buffer(start+c)); debug("start = \"%s\"\n", buffer(start+c));
debug("match = \"%.*s\"\n", l, buffer(start+c+ovector[0])); debug("match = \"%.*s\"\n", l, buffer(start+c+ovector[0]));
@ -192,22 +197,22 @@ static void regsubst(pcre* code, StreamBuffer& buffer, long start, long length,
debug("subs = \"%s\"\n", s.expand()()); debug("subs = \"%s\"\n", s.expand()());
for (r = 0; r < s.length(); r++) for (r = 0; r < s.length(); r++)
{ {
debug("check \"%s\"\n", s(r)); debug("check \"%s\"\n", s.expand(r)());
if (s[r] == '\\') if (s[r] == esc)
{ {
unsigned char ch = s[r+1]; unsigned char ch = s[r+1];
if (ch >= '0' && ch <= '9') if (ch <= 9) // escaped 0 - 9 : replace with subexpr
{ {
ch = (ch - '0')*2; ch *= 2;
rl = ovector[ch+1] - ovector[ch]; rl = ovector[ch+1] - ovector[ch];
debug("replace \\%d: \"%.*s\"\n", ch/2, rl, buffer(start+c+ovector[ch])); debug("replace \\%d: \"%.*s\"\n", ch/2, rl, buffer(start+c+ovector[ch]));
s.replace(r, 2, buffer(start+c+ovector[ch]), rl); s.replace(r, 2, buffer(start+c+ovector[ch]), rl);
r += rl - 1; r += rl - 1;
} }
else if (ch == '\\' || ch == '&') else
s.remove(r, 1); s.remove(r, 1); // just remove escape
} }
else if (s[r] == '&') else if (s[r] == '&') // unescaped & : replace with match
{ {
debug("replace &: \"%.*s\"\n", l, buffer(start+c+ovector[0])); debug("replace &: \"%.*s\"\n", l, buffer(start+c+ovector[0]));
s.replace(r, 1, buffer(start+c+ovector[0]), l); s.replace(r, 1, buffer(start+c+ovector[0]), l);
@ -219,6 +224,8 @@ static void regsubst(pcre* code, StreamBuffer& buffer, long start, long length,
buffer.replace(start+c+ovector[0], l, s); buffer.replace(start+c+ovector[0], l, s);
length += s.length() - l; length += s.length() - l;
c += s.length(); c += s.length();
if (n == fmt.prec) // max match reached
return;
} }
} }
@ -226,15 +233,7 @@ int RegexpConverter::
scanPseudo(const StreamFormat& fmt, StreamBuffer& input, long& cursor) scanPseudo(const StreamFormat& fmt, StreamBuffer& input, long& cursor)
{ {
/* re-write input buffer */ /* re-write input buffer */
const char* info = fmt.info; regsubst(fmt, input, cursor);
pcre* code;
long length;
StreamBuffer subst;
code = extract<pcre*>(info);
if (fmt.flags & left_flag) length = -fmt.width;
else length = fmt.width;
regsubst(code, input, cursor, length, info, fmt.prec);
return 0; return 0;
} }
@ -242,15 +241,7 @@ bool RegexpConverter::
printPseudo(const StreamFormat& fmt, StreamBuffer& output) printPseudo(const StreamFormat& fmt, StreamBuffer& output)
{ {
/* re-write output buffer */ /* re-write output buffer */
const char* info = fmt.info; regsubst(fmt, output, 0);
pcre* code;
long length;
StreamBuffer subst;
code = extract<pcre*>(info);
if (fmt.flags & left_flag) length = -fmt.width;
else length = fmt.width;
regsubst(code, output, 0, length, info, fmt.prec);
return true; return true;
} }