fixed width flag ! added

2015-10-08 15:19:46 +02:00
parent 38c4f5bcb6
commit 84fc6aabc8
4 changed files with 189 additions and 79 deletions
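
Note: the effect of the new flag on an integer conversion can be sketched as follows. This is a minimal illustration, not the StreamCore code: the function scanDigits and its parameters are hypothetical, and only the name fix_width_flag and its value 0x100 are taken from this commit.

```cpp
#include <cctype>

// Flag value as added to StreamFormatFlag in this commit.
const unsigned short fix_width_flag = 0x100;

// Hypothetical helper: reads at most `width` decimal digits from `input`.
// Returns the number of bytes consumed, or -1 if the conversion fails.
long scanDigits(const char* input, unsigned short width,
                unsigned short flags, long& value)
{
    long consumed = 0;
    value = 0;
    while (consumed < width &&
           std::isdigit(static_cast<unsigned char>(input[consumed])))
    {
        value = value * 10 + (input[consumed] - '0');
        ++consumed;
    }
    if (consumed == 0) return -1;  // no digits at all
    // With '!', reading fewer than `width` bytes counts as a failure.
    if ((flags & fix_width_flag) && consumed != width) return -1;
    return consumed;
}
```

With width 5 and the flag set, the input 12345 is accepted and 123 is rejected. Without the flag, 123 is accepted with 3 bytes consumed.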
--- a/doc/formats.html
+++ b/doc/formats.html
@ -28,8 +28,8 @@ A format converter consists of
 </p>
 <ul>
 <li>The <code>%</code> character</li>
- <li>Optionally a field <span class="new">or record</span> name in <code>()</code></li>
- <li>Optionally flags out of the characters <code>*# +0-<span class="new">?=</span></code></li>
+ <li>Optionally a field or record name in <code>()</code></li>
+ <li>Optionally flags out of the characters <code>*# +0-?=!</code></li>
 <li>Optionally an integer <em>width</em> field</li>
 <li>Optionally a period character (<code>.</code>) followed
     by an integer <em>precision</em> field (input ony for most formats)</li>
@ -40,7 +40,7 @@ A format converter consists of
 <p>
 The flags <code>*# +0-</code> work like in the C functions
 <em>printf()</em> and <em>scanf()</em>.
-The flags <code>?</code> and <code>=</code> are extensions.
+The flags <code>?</code>, <code>=</code> and <code>!</code> are extensions.
 </p>
 <p>
 The <code>*</code> flag skips data in input formats.
@ -67,16 +67,24 @@ The <code>0</code> flag says that numbers should be left padded with
 The <code>-</code> flag specifies that output is left justified if
 <em>width</em> is larger than required.
 </p>
-<p class="new">
+<p>
 The <code>?</code> flag makes failing input conversions succeed with
 a default zero value (0, 0.0, or "", depending on the format type).
 </p>
-<p class="new">
+<p>
 The <code>=</code> flag allows to compare input with current values.
 It is only allowed in input formats.
 Instead of reading a new value from input, the current value is
 formatted (like for output) and then compared to the input.
 </p>
+<p>
+The <code>!</code> flag demands that input is exactly <em>width</em>
+bytes long (in many formats <em>width</em> normally only limits the
+maximum number of bytes read). It is only allowed in input formats.
+For example, <code>in "%!5d";</code> expects exactly 5 digits.
+Fewer digits are considered a loss of data and make the format fail.
+This feature has been added by Klemen Vodopivec, SNS.
+</p>

 <h3>Examples:</h3>
 <table>
@ -143,11 +151,8 @@ field formatted as a string.
 Use <code>in&nbsp;"%(<i>otherrecord</i>.RVAL)f";</code> to write the floating
 point input value into the <code>RVAL</code> field of
 <code><i>otherrecord</i></code>. 
-<span class="new">
 If no field is given for an other record .VAL is assumed.
 When a record name conflicts with a field name use .VAL explicitly.
-</span>
-
 </p>
 <p>
 This feature is very useful when one line of input contains many values that should
@ -158,7 +163,7 @@ attribute (see
 target="ex">Record Reference Manual</a>), the record will be processed.
 It is your responsibility that the data type of the record field is
 compatible to the the data type of the converter.
-<span class="new">STRING formats are compatible with arrays of CHAR or UCHAR.<span>
+STRING formats are compatible with arrays of CHAR or UCHAR.
 </p>
 <p>
 Note that using this syntax is by far not as efficient as using the
@ -192,11 +197,11 @@ With the <code>#</code> flag, output always contains a period character.
 <p>
 <b>Input:</b> All these formats are equivalent. Leading whitespaces are skipped.
 </p>
-<p class="new">
+<p>
 With the <code>#</code> flag additional whitespace between sign and number
 is accepted.
 </p>
-<p class="new">
+<p>
 When a maximum field width is given, leading whitespace only counts to the
 field witdth when the space flag is used.
 </p>
@ -213,7 +218,7 @@ field witdth when the space flag is used.
 <p>
 With the <code>#</code> flag, octal values are prefixed with <code>0</code>
 and hexadecimal values with <code>0x</code> or <code>0X</code>.
-<p class="new">
+<p>
 Unlike printf, <code>%x</code> and <code>%X</code> truncate the
 output to the the given width (number of least significant half bytes).
 </p>
@ -228,14 +233,14 @@ Octal and hexadecimal values can optionally be prefixed.
 hexadecimal notation.
 Leading whitespaces are skipped.
 </p>
-<p class="new">
+<p>
 With the <code>-</code> negative octal and hexadecimal values are accepted. 
 </p>
-<p class="new">
+<p>
 With the <code>#</code> flag additional whitespace between sign and number
 is accepted.
 </p>
-<p class="new">
+<p>
 When a maximum field width is given, leading whitespace only counts to the
 field witdth when the space flag is used.
 </p>
@ -253,13 +258,11 @@ and <code>%c</code> matches a sequence of not-null characters.
 The maximum string length is given by <em>width</em>.
 The default <em>width</em> is infinite for <code>%s</code> and
 1 for <code>%c</code>.
-Leading whitespaces are skipped with <code>%s</code> 
-<span class="new">
-except when the space flag is used</span>
-but not with <code>%c</code>.
+Leading whitespace is skipped with <code>%s</code> (except when
+the space flag is used) but not with <code>%c</code>.
 The empty string matches.
 </p>
-<p class="new">
+<p>
 With the <code>#</code> flag <code>%s</code> matches a sequence of not-null
 characters instead of non-whitespace characters.
 </p>
@ -290,18 +293,18 @@ The strings are separated by <code>|</code>.
 Example: <code>%{OFF|STANDBY|ON}</code> mapps the string <code>OFF</code>
 to the value 0, <code>STANDBY</code> to 1 and <code>ON</code> to 2.
 </p>
-<p class="new">
+<p>
 When using the <code>#</code> flag it is allowed to assign integer values
 to the strings using <code>=</code>.
 Unassigned strings increment their values by 1 as usual.
 </p>
-<p class="new">
+<p>
 If one string is the initial substing of another, the substing must come
 later to ensure correct matching.
 In particular if one string is the emptry string, it must be the last one.
 Use <code>#</code> and <code>=</code> to renumber if necessary.
 </p>
-<p class="new">
+<p>
 Use the assignment <code>=?</code> for the last string to make it the
 default value for output formats.
 </p>
@ -310,12 +313,12 @@ Example: <code>%#{neg=-1|stop|pos|fast=10|rewind=-10}</code>.
 </p>
 <p>
 If one of the strings contains <code>|</code> or <code>}</code>
-<span class="new">(or <code>=</code>  if the <code>#</code> flag is used)</span>
+(or <code>=</code>  if the <code>#</code> flag is used)
 a <code>\</code> must be used to escape the character.
 </p>
 <p>
 <b>Output:</b> Depending on the value, one of the strings is printed,
-<span class="new">or the default if no value matches</span>.
+or the default if no value matches.
 </p>
 <p>
 <b>Input:</b> If any of the strings matches, the value is set accordingly.
@ -463,62 +466,158 @@ In input, the next byte or bytes must match the checksum.
 <h3>Implemented checksum functions</h3>
 <dl>
 <dt><code>%&lt;sum&gt;</code> or <code>%&lt;sum8&gt;</code></dt>
-  <dd>One byte. The sum of all characters modulo 2<sup>8</sup>.</dd>
+  <dd>
+   The sum of all characters modulo 2<sup>8</sup>.
+   <br>
+   One byte. <code>123456789%&lt;sum&gt;</code> = 0xdd
+  </dd>
 <dt><code>%&lt;sum16&gt;</code></dt>
-  <dd>Two bytes. The sum of all characters modulo 2<sup>16</sup>.</dd>
+  <dd>
+   The sum of all characters modulo 2<sup>16</sup>.
+   <br>
+   Two bytes. <code>123456789%&lt;sum16&gt;</code> = 0x01dd
+  </dd>
 <dt><code>%&lt;sum32&gt;</code></dt>
-  <dd>Four bytes. The sum of all characters modulo 2<sup>32</sup>.</dd>
+  <dd>
+   The sum of all characters modulo 2<sup>32</sup>.
+   <br>
+   Four bytes. <code>123456789%&lt;sum32&gt;</code> = 0x000001dd
+  </dd>
 <dt><code>%&lt;negsum&gt;</code>, <code>%&lt;nsum&gt;</code>, <code>%&lt;-sum&gt;</code>, <code>%&lt;negsum8&gt;</code>, <code>%&lt;nsum8&gt;</code>, or <code>%&lt;-sum8&gt;</code></dt>
-  <dd>One byte. The negative of the sum of all characters modulo 2<sup>8</sup>.</dd>
+  <dd>
+   The negative of the sum of all characters modulo 2<sup>8</sup>.
+   <br>
+   One byte. <code>123456789%&lt;-sum8&gt;</code> = 0x23
+  </dd>
 <dt><code>%&lt;negsum16&gt;</code>, <code>%&lt;nsum16&gt;</code>, or <code>%&lt;-sum16&gt;</code></dt>
-  <dd>Two bytes. The negative of the sum of all characters modulo 2<sup>16</sup>.</dd>
+  <dd>
+   The negative of the sum of all characters modulo 2<sup>16</sup>.
+   <br>
+   Two bytes. <code>123456789%&lt;-sum16&gt;</code> = 0xfe23
+  </dd>
 <dt><code>%&lt;negsum32&gt;</code>, <code>%&lt;nsum32&gt;</code>, or <code>%&lt;-sum32&gt;</code></dt>
-  <dd>Four bytes. The negative of the sum of all characters modulo 2<sup>32</sup>.</dd>
+  <dd>
+   The negative of the sum of all characters modulo 2<sup>32</sup>.
+   <br>
+   Four bytes. <code>123456789%&lt;-sum32&gt;</code> = 0xfffffe23
+  </dd>
 <dt><code>%&lt;notsum&gt;</code> or <code>%&lt;~sum&gt;</code></dt>
-  <dd>One byte. The bitwise inverse of the sum of all characters modulo 2<sup>8</sup>.</dd>
- <dt><code>%&lt;xor&gt;</code></dt>
-  <dd>One byte. All characters xor'ed.</dd>
- <dt><code>%&lt;xor7&gt;</code></dt>
-  <dd>One byte. All characters xor'ed &amp; 0x7F.</dd>
- <dt><code>%&lt;crc8&gt;</code></dt>
-  <dd>One byte. An often used 8 bit crc checksum
-  (poly=0x07, init=0x00, xorout=0x00).</dd>
- <dt><code>%&lt;ccitt8&gt;</code></dt>
-  <dd>One byte. The CCITT standard 8 bit crc checksum
-  (poly=0x31, init=0x00, xorout=0x00).</dd>
- <dt><code>%&lt;crc16&gt;</code></dt>
-  <dd>Two bytes. An often used 16 bit crc checksum
-  (poly=0x8005, init=0x0000, xorout=0x0000).</dd>
- <dt><code>%&lt;crc16r&gt;</code></dt>
-  <dd>Two bytes. An often used reflected 16 bit crc checksum
-  (poly=0x8005, init=0x0000, xorout=0x0000).</dd>
- <dt><code>%&lt;ccitt16&gt;</code></dt>
-  <dd>Two bytes. The usual (but <a target="ex"
-   href="http://www.joegeluso.com/software/articles/ccitt.htm">wrong?</a>)
-   implementation of the CCITT standard 16 bit crc checksum
-   (poly=0x1021, init=0xFFFF, xorout=0x0000).</dd>
- <dt><code>%&lt;ccitt16a&gt;</code></dt>
-  <dd>Two bytes. The unusual (but <a target="ex"
-   href="http://www.joegeluso.com/software/articles/ccitt.htm">correct?</a>)
-   implementation of the CCITT standard 16 bit crc checksum with augment.
-   (poly=0x1021, init=0x1D0F, xorout=0x0000).</dd>
- <dt><code>%&lt;ccitt16x&gt;</code> or <code>%&lt;crc16c&gt;</code> or <code>%&lt;xmodem&gt;</code></dt>
-  <dd>Two bytes. The XMODEM checksum.
-   (poly=0x1021, init=0x0000, xorout=0x0000).</dd>
- <dt><code>%&lt;crc32&gt;</code></dt>
-  <dd>Four bytes. The standard 32 bit crc checksum.
-   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0xFFFFFFFF).</dd>
- <dt><code>%&lt;crc32r&gt;</code></dt>
-  <dd>Four bytes. The standard reflected 32 bit crc checksum.
-   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0xFFFFFFFF).</dd>
- <dt><code>%&lt;jamcrc&gt;</code></dt>
-  <dd>Four bytes. Another reflected 32 bit crc checksum.
-   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0x00000000).</dd>
- <dt><code>%&lt;adler32&gt;</code></dt>
-  <dd>Four bytes. The Adler32 checksum according to <a target="ex"
-   href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a>.</dd>
+  <dd>
+   The bitwise inverse of the sum of all characters modulo 2<sup>8</sup>.
+   <br>
+   One byte. <code>123456789%&lt;~sum8&gt;</code> = 0x22
+  </dd>
+ <dt><code>%&lt;notsum16&gt;</code> or <code>%&lt;~sum16&gt;</code></dt>
+  <dd>
+   The bitwise inverse of the sum of all characters modulo 2<sup>16</sup>.
+   <br>
+   Two bytes. <code>123456789%&lt;~sum16&gt;</code> = 0xfe22
+  </dd>
+ <dt><code>%&lt;notsum32&gt;</code> or <code>%&lt;~sum32&gt;</code></dt>
+  <dd>
+   The bitwise inverse of the sum of all characters modulo 2<sup>32</sup>.
+   <br>
+   Four bytes. <code>123456789%&lt;~sum32&gt;</code> = 0xfffffe22
+  </dd>
 <dt><code>%&lt;hexsum8&gt;</code></dt>
-  <dd>One byte. The sum of all hex digits. (Other characters are ignored.)</dd>
+  <dd>
+   The sum of all hexadecimal digits. (Other characters are ignored.)
+   <br>
+   One byte. <code>123456789%&lt;hexsum8&gt;</code> = 0x2d
+  </dd>
+ <dt><code>%&lt;xor&gt;</code></dt>
+  <dd>
+   All characters xor'ed.
+   <br>
+   One byte. <code>123456789%&lt;xor&gt;</code> = 0x31
+  </dd>
+ <dt><code>%&lt;xor7&gt;</code></dt>
+  <dd>
+   All characters xor'ed modulo 2<sup>7</sup>.
+   <br>
+   One byte. <code>123456789%&lt;xor7&gt;</code> = 0x31
+  </dd>
+ <dt><code>%&lt;crc8&gt;</code></dt>
+  <dd>
+   An often used 8 bit crc checksum
+   (poly=0x07, init=0x00, xorout=0x00).
+   <br>
+   One byte. <code>123456789%&lt;crc8&gt;</code> = 0xf4
+  </dd>
+ <dt><code>%&lt;ccitt8&gt;</code></dt>
+  <dd>
+   The CCITT standard 8 bit crc checksum
+   (poly=0x31, init=0x00, xorout=0x00).
+   <br>
+   One byte. <code>123456789%&lt;ccitt8&gt;</code> = 0xf4
+  </dd>
+ <dt><code>%&lt;crc16&gt;</code></dt>
+  <dd>
+   An often used 16 bit crc checksum
+   (poly=0x8005, init=0x0000, xorout=0x0000).
+   <br>
+   Two bytes. <code>123456789%&lt;crc16&gt;</code> = 0xfee8
+  </dd>
+ <dt><code>%&lt;crc16r&gt;</code></dt>
+  <dd>
+   An often used reflected 16 bit crc checksum
+   (poly=0x8005, init=0x0000, xorout=0x0000).
+   <br>
+   Two bytes. <code>123456789%&lt;crc16r&gt;</code> = 0xbb3d
+  </dd>
+ <dt><code>%&lt;ccitt16&gt;</code></dt>
+  <dd>
+   The usual (but <a target="ex"
+   href="http://srecord.sourceforge.net/crc16-ccitt.html">wrong?</a>)
+   implementation of the CCITT standard 16 bit crc checksum
+   (poly=0x1021, init=0xFFFF, xorout=0x0000).
+   <br>
+   Two bytes. <code>123456789%&lt;ccitt16&gt;</code> = 0x29b1
+  </dd>
+ <dt><code>%&lt;ccitt16a&gt;</code></dt>
+  <dd>
+   The unusual (but <a target="ex"
+   href="http://srecord.sourceforge.net/crc16-ccitt.html">correct?</a>)
+   implementation of the CCITT standard 16 bit crc checksum with augment.
+   (poly=0x1021, init=0x1D0F, xorout=0x0000).
+   <br>
+   Two bytes. <code>123456789%&lt;ccitt16a&gt;</code> = 0xe5cc
+  </dd>
+ <dt><code>%&lt;ccitt16x&gt;</code> or <code>%&lt;crc16c&gt;</code> or <code>%&lt;xmodem&gt;</code></dt>
+  <dd>
+   The XMODEM checksum.
+   (poly=0x1021, init=0x0000, xorout=0x0000).
+   <br>
+   Two bytes. <code>123456789%&lt;xmodem&gt;</code> = 0x31c3
+  </dd>
+ <dt><code>%&lt;crc32&gt;</code></dt>
+  <dd>
+   The standard 32 bit crc checksum.
+   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0xFFFFFFFF).
+   <br>
+   Four bytes. <code>123456789%&lt;crc32&gt;</code> = 0xfc891918
+  </dd>
+ <dt><code>%&lt;crc32r&gt;</code></dt>
+  <dd>
+   The standard reflected 32 bit crc checksum.
+   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0xFFFFFFFF).
+   <br>
+   Four bytes. <code>123456789%&lt;crc32r&gt;</code> = 0xcbf43926
+  </dd>
+ <dt><code>%&lt;jamcrc&gt;</code></dt>
+  <dd>
+   Another reflected 32 bit crc checksum.
+   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0x00000000).
+   <br>
+   Four bytes. <code>123456789%&lt;jamcrc&gt;</code> = 0x340bc6d9
+  </dd>
+ <dt><code>%&lt;adler32&gt;</code></dt>
+  <dd>
+   The Adler32 checksum according to <a target="ex"
+   href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a>.
+   <br>
+   Four bytes. <code>123456789%&lt;adler32&gt;</code> = 0x091e01de
+  </dd>
 </dl>

 <a name="regex"></a>
@ -584,7 +683,6 @@ Format flags <code>+</code>, <code>-</code>, and space are supported in
 the usual way (always sign, left justified, space instead of + sign).
 Flags <code>#</code> and <code>0</code> are unsupported.
 </p>
-<div class="new">
 <a name="timestamp"></a>
 <h2>14. Timestamp DOUBLE converter (<code>%T(<em>timeformat</em>)</code>)</h2>
 <p>
@ -637,7 +735,6 @@ Because of the complexity of the problem, locales are not supported.
 Thus, only the English month names can be used (week day names are
 ignored anyway).
 </p>
-</div>
 <hr>
 <p align="right"><a href="processing.html">Next: Record Processing</a></p>
 <p><small>Dirk Zimoch, 2011</small></p>
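
Note: the example values listed in doc/formats.html for the simple checksums can be cross-checked with a few lines of C++. This is a minimal sketch assuming plain 8-bit characters. The helper names byteSum, byteXor, and crc8 are hypothetical and are not the converter's own functions.

```cpp
#include <cstdint>

// Sum of all bytes. Truncate the result to 8, 16, or 32 bits as needed.
std::uint32_t byteSum(const char* s)
{
    std::uint32_t sum = 0;
    for (; *s; ++s) sum += static_cast<unsigned char>(*s);
    return sum;
}

// All bytes xor'ed together.
std::uint8_t byteXor(const char* s)
{
    std::uint8_t x = 0;
    for (; *s; ++s) x ^= static_cast<unsigned char>(*s);
    return x;
}

// Bitwise CRC-8, most significant bit first
// (poly=0x07, init=0x00, xorout=0x00).
std::uint8_t crc8(const char* s)
{
    std::uint8_t crc = 0;
    for (; *s; ++s)
    {
        crc ^= static_cast<unsigned char>(*s);
        for (int bit = 0; bit < 8; ++bit)
        {
            if (crc & 0x80)
                crc = static_cast<std::uint8_t>((crc << 1) ^ 0x07);
            else
                crc = static_cast<std::uint8_t>(crc << 1);
        }
    }
    return crc;
}
```

For the input 123456789, byteSum returns 0x1dd. The sum8, sum16, and sum32 values are that sum truncated to 1, 2, and 4 bytes, and the negsum and notsum variants are the negated and bitwise inverted sum truncated the same way.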
--- a/src/StreamCore.cc
+++ b/src/StreamCore.cc
@ -1479,6 +1479,7 @@ scanValue(const StreamFormat& fmt, long& value)
        }
        else return -1;
    }
+    if ((fmt.flags & fix_width_flag) && consumed != fmt.width) return -1;
    if (consumed > inputLine.length()-consumedInput) return -1;
    debug("StreamCore::scanValue(%s) scanned %li\n",
        name(), value);
@ -1510,6 +1511,7 @@ scanValue(const StreamFormat& fmt, double& value)
        }
        else return -1;
    }
+    if ((fmt.flags & fix_width_flag) && consumed != fmt.width + fmt.prec + 1) return -1;
    if (consumed > inputLine.length()-consumedInput) return -1;
    debug("StreamCore::scanValue(%s) scanned %#g\n",
        name(), value);
@ -1542,6 +1544,7 @@ scanValue(const StreamFormat& fmt, char* value, long maxlen)
        }
        else return -1;
    }
+    if ((fmt.flags & fix_width_flag) && consumed != fmt.width) return -1;
    if (consumed > inputLine.length()-consumedInput) return -1;
 #ifndef NO_TEMPORARY
    debug("StreamCore::scanValue(%s) scanned \"%s\"\n",
--- a/src/StreamFormat.h
+++ b/src/StreamFormat.h
@ -30,7 +30,8 @@ typedef enum {
    zero_flag    = 0x10,
    skip_flag    = 0x20,
    default_flag = 0x40,
-    compare_flag = 0x80
+    compare_flag = 0x80,
+    fix_width_flag = 0x100
 } StreamFormatFlag;

 typedef enum {
@ -48,7 +49,7 @@ typedef struct StreamFormat
 {
    char conv;
    StreamFormatType type;
-    unsigned char flags;
+    unsigned short flags;
    short prec;
    unsigned short width;
    unsigned short infolen;
--- a/src/StreamFormatConverter.cc
+++ b/src/StreamFormatConverter.cc
@ -38,7 +38,7 @@ parseFormat(const char*& source, FormatType formatType, StreamFormat& streamForm
 {
 /*
    source := [flags] [width] ['.' prec] conv [extra]
-    flags := '-' | '+' | ' ' | '#' | '0' | '*' | '?' | '='
+    flags := '-' | '+' | ' ' | '#' | '0' | '*' | '?' | '=' | '!'
    width := integer
    prec :=  integer
    conv := character
@ -85,6 +85,15 @@ parseFormat(const char*& source, FormatType formatType, StreamFormat& streamForm
                }
                streamFormat.flags |= default_flag;
                break;
+            case '!':
+                if (formatType != ScanFormat)
+                {
+                    error("Use of fixed width modifier '!' "
+                          "only allowed in input formats\n");
+                    return false;
+                }
+                streamFormat.flags |= fix_width_flag;
+                break;
            case '=':
                if (formatType != ScanFormat)
                {