Space dilemmas for writing metric symbols and thousands separators

The BIPM’s SI brochure states “The numerical value always precedes the unit and a space is always used to separate the unit from the number.” It also says that a space, not a comma or a dot, is to be used when digits are grouped in threes as a thousands separator. There are several space characters in computing, but the brochure does not say which one to use when typing. Ideally, you would use a non-breaking space to ensure that no line break separates the numerical value from the unit. The same issue arises when a space is used as the thousands separator.

One member of the UKMA Google Group commented that Microsoft Word has five different options for the space character:

  • ‘Normal’ space (spacebar)
  • Em space which is a double-width space
  • En space which is approximately 1.5 times wider than a normal space
  • ¼ Em space which appears to be about the same size as a normal space Ctrl+Shift+\  
  • The non-breaking space, which can be inserted using the shortcut key combination Ctrl+Shift+Space

The first four options are unsuitable for use with metric symbols as they are either too wide or are not non-breaking. The only practical solution is the last one. This is all specific to Microsoft Word. What is supported in Windows and other software?

The easiest, most accessible and quickest option is the standard space character, which can be found on all computers, tablets and mobiles. This is Unicode character U+0020 or 32. You can type it by pressing the space bar. However, it is not non-breaking and if your numerical value and symbol appear near the end of a line, there is a risk that a line break will split them, which would look odd.
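The risk can be illustrated with a minimal Python sketch. It assumes Python's standard textwrap module as a stand-in for whatever line breaking a word processor or browser performs; the sample sentence and the width of 21 characters are made up for illustration.

```python
import textwrap

NBSP = "\u00a0"  # NO-BREAK SPACE (decimal 160)

# Hypothetical sample text; the width of 21 is chosen so that the line
# fills up just before the unit symbol.
with_standard_space = "The parcel weighs 25 kg"
with_no_break_space = f"The parcel weighs 25{NBSP}kg"

# textwrap only breaks lines at ordinary ASCII whitespace, so the
# standard space is a permitted break point and the no-break space is not.
split_lines = textwrap.wrap(with_standard_space, width=21)
kept_lines = textwrap.wrap(with_no_break_space, width=21)

print(split_lines)  # the value and the unit end up on different lines
print(kept_lines)   # the value and the unit stay together
```

Other software applies its own line-breaking rules, so this only shows the principle, not how any particular program behaves.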

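The no-break space (Unicode character U+00A0 or 160) creates a non-breaking space of variable width: it is normally as wide as a standard space and, like one, can be stretched or compressed when text is justified.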

The figure space (Unicode U+2007 or 8199) creates a non-breaking space equal to one numerical digit in width. This is designed for use when writing columns of figures and is useful for separating the number and the units when quoting an SI value.

The narrow no-break space (Unicode U+202F or 8239) creates a non-breaking space that is about one third of a normal space. It is useful as a thousands separator when writing numbers.
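For readers who want to check these code points, here is a short Python sketch using the standard unicodedata module. The helper name describe is made up for this example; the "<noBreak>" tag it looks for comes from the Unicode Character Database, where the no-break variants are recorded as compatibility equivalents of U+0020.

```python
import unicodedata

# The four space characters described above, by Unicode code point.
SPACES = ["\u0020", "\u00a0", "\u2007", "\u202f"]


def describe(char: str) -> str:
    """Return the code point (hex and decimal), name and break behaviour."""
    code = ord(char)
    # The no-break variants carry a "<noBreak>" compatibility
    # decomposition to U+0020; the standard space has none.
    no_break = unicodedata.decomposition(char).startswith("<noBreak>")
    tag = "no-break" if no_break else "breaking"
    return f"U+{code:04X} ({code}) {unicodedata.name(char)} [{tag}]"


for space in SPACES:
    print(describe(space))
```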

While the non-breaking space characters guarantee that the numerical value and unit are not split by a line break, they are awkward to type. On Windows, it is possible to produce them using the numeric keypad by typing the decimal four-digit Unicode value while holding down the Alt key (for example, Alt+0160 for the no-break space), but many laptop keyboards do not have a numeric keypad. Nor do tablets and mobile phones. The other option in Windows is to open the Character Map, select the character you need, then copy and paste it.

Users face a constant dilemma when using spaces before metric symbols and as thousands separators. Do they just use a standard space, which is much quicker and easier than all the others, and risk line breaks in the wrong place, or do they type one of the non-breaking space characters, which are a lot more cumbersome and take a lot longer? The computer industry needs to find solutions to improve support for the characters needed for metric symbols and thousands separators. It added support for the euro sign on standard keyboards and could do the same for non-breaking space characters.

On the thousands separator, the SI brochure says:

“Following the 9th CGPM (1948, Resolution 7) and the 22nd CGPM (2003, Resolution 10), for numbers with many digits the digits may be divided into groups of three by a space, in order to facilitate reading. Neither dots nor commas are inserted in the spaces between groups of three. However, when there are only four digits before or after the decimal marker, it is customary not to use a space to isolate a single digit. The practice of grouping digits in this way is a matter of choice; it is not always followed in certain specialized applications such as engineering drawings, financial statements and scripts to be read by a computer.”
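The grouping rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name group_digits is made up, the decimal marker is assumed to be a dot, the narrow no-break space is used as the default separator because this article suggests it (the brochure does not name a character), and each side of the decimal marker is treated independently when applying the four-digit custom.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE (decimal 8239)


def group_digits(number_text: str, sep: str = NNBSP) -> str:
    """Group the digits of a plain decimal string in threes.

    Assumes input such as "43279.16829": optional sign, digits, and an
    optional dot as the decimal marker (hypothetical input format).
    """
    sign = ""
    if number_text.startswith(("+", "-")):
        sign, number_text = number_text[0], number_text[1:]

    integer, marker, fraction = number_text.partition(".")

    # Integer part: group from the right, but leave four digits or fewer
    # alone so that a single digit is never isolated.
    if len(integer) > 4:
        head = len(integer) % 3 or 3
        groups = [integer[:head]]
        groups += [integer[i:i + 3] for i in range(head, len(integer), 3)]
        integer = sep.join(groups)

    # Fractional part: group from the left, with the same four-digit custom.
    if len(fraction) > 4:
        fraction = sep.join(
            fraction[i:i + 3] for i in range(0, len(fraction), 3)
        )

    return sign + integer + marker + fraction


print(group_digits("43279.16829", sep=" "))  # plain space used for display
print(group_digits("3000"))
```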

In the section about formatting the value of a quantity, the SI brochure says:

“The numerical value always precedes the unit and a space is always used to separate the unit from the number. Thus the value of the quantity is the product of the number and the unit. The space between the number and the unit is regarded as a multiplication sign (just as a space between units implies multiplication). The only exceptions to this rule are for the unit symbols for degree, minute and second for plane angle, °, ′ and ″, respectively, for which no space is left between the numerical value and the unit symbol.

This rule means that the symbol °C for the degree Celsius is preceded by a space when one expresses values of Celsius temperature t.”
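As a rough illustration of this rule, the following Python sketch joins a value and a unit symbol with a non-breaking space and applies the exception for the plane-angle symbols. The function name format_quantity and the choice of U+00A0 as the default separator are assumptions made for this example; the brochure itself does not say which space character to use.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE (decimal 160)

# Unit symbols that the brochure says take no space: degree, minute and
# second of plane angle. Note that "°C" is not in this set.
NO_SPACE_SYMBOLS = {"°", "′", "″"}


def format_quantity(value, unit: str, space: str = NBSP) -> str:
    """Join a numerical value and a unit symbol as the brochure describes."""
    if unit in NO_SPACE_SYMBOLS:
        return f"{value}{unit}"
    return f"{value}{space}{unit}"


print(format_quantity(25, "kg"))
print(format_quantity(20, "°C"))  # degree Celsius keeps the space
print(format_quantity(90, "°"))   # plane angle: no space
```

Passing space="\u2007" would use the figure space instead, which one of the comments below argues for.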



5 thoughts on “Space dilemmas for writing metric symbols and thousands separators”

  1. It is not only the device or the programme being used to type; it is also how other media interpret non-standard characters.
    We went through a lot of this a few years back, when I sent loads of characters through various media (Linux, Windows, various office apps in each, Metric Views and Facebook, amongst others) and settled for whatever ended up intact through all of them. A non-breaking space was not one of them; it gets converted to a standard space. Standard spaces often convert to %20 (percent twenty) in web addresses, as spaces screw up a lot of processes, as do a number of other standard keys.
    LibreOffice spreadsheet does not recognise a non-breaking space, in the very place one would like tabulation to work correctly. At that point it was a dead issue for me. I don’t care what Windows screws up.

    In short, we have what we have and we are going to have to live with that. Standardisation of use of what we know works would help.

  2. There is no perfect solution. The 8th edition of the SI Brochure recommended a thin space in section 5 for the thousands separator; the word “thin” has been dropped in the current 9th edition. Even on my desktop PC with a numeric keypad, the Unicode characters with higher code points are not always interpreted correctly by a browser when posting on the web, so I now use the simple non-breaking space, Alt+0160. On my laptop, tablet and phone, I don’t have a good solution.

  3. I checked Annex 14 of the Unicode standard (https://www.unicode.org/reports/tr14/). The introduction section (Section 3) is of particular interest to us. It lists three no-break spaces that are relevant here:

    * NO-BREAK SPACE (U+00A0)
    * FIGURE SPACE (U+2007)
    * NARROW NO-BREAK SPACE (U+202F)

    The FIGURE SPACE is the only fixed-width space. It is designed for use when one is writing out columns of numbers that need to be aligned with each other and where each character has the same width. By contrast, according to the standard, the NO-BREAK SPACE may be expanded or compressed for formatting purposes. The NARROW NO-BREAK SPACE is not subject to expansion or to compression. This suggests to me that one should use the NARROW NO-BREAK SPACE as a delimiter between groups of three digits and the FIGURE SPACE as the separator between a value and its units.

    References:
    Top-level document: https://www.unicode.org/versions/Unicode15.0.0/
    Page 267: https://www.unicode.org/versions/Unicode15.0.0/ch06.pdf
    Page 915: https://www.unicode.org/versions/Unicode15.0.0/ch23.pdf
    Annex 14, Introduction: https://www.unicode.org/reports/tr14/tr14-49.html#Properties

  4. I agree generally with Ronny’s points but would like to add a few observations that suggest how we have arrived at this generally unsatisfactory situation. There seem to be two main problems:

    • Some points from the CGPM are not well-explained.
    • The technology for printing has not developed in a manner that CGPM might have hoped for.

    Quoting CGPM: “The space between the number and the unit is regarded as a multiplication sign (just as a space between units implies multiplication).” I don’t follow the logic behind this and I suspect others don’t either. In algebraic notation, concatenated numerals and symbols are regarded as multiplied (e.g. 4ac). CGPM acknowledges this as one option among others, including spaces between symbols, which I have rarely seen in practice. A unit symbol is of course not the same thing as an algebraic symbol, but one can draw parallels.

    Overall I see this as a feeble argument. In the past I worked in liaison with the drawing office, who had their own ideas of how to do things. My argument was that if there was no space between the quantity and the unit, then in certain cases letters could be confused with numerals. For example, depending on how distinct were the letter “l” and the figure “1” characters on the typeface, “20l” could be interpreted as either twenty litres or the number two hundred and one.

    Regarding the decimal point, it used to be common practice, in books, newspapers, etc., for this to be a mid-height dot, thus 3·142. CGPM 1948 specified a dot on the base line, thus 3.142. Why did they do this?

    I think a likely reason is that in those days most printed material began on a typewriter, which did not have the means to produce a mid-height dot. Typewritten copy would then go to the printing house, where typesetters using Linotype machines would set the print line by line, and also take care of decimal points and other characters that a typewriter could not produce.

    Now fast-forward 70-plus years. “Linotype? What’s that, Granddad?” Text is now produced on computer keyboards, and transmission to the printing house is done electronically. There is no mechanical intervention or retyping, and human intervention is minimal. The final page is assembled using pagemaker software which, among other things, sets the length of each line of text to fit the objective page width. The space character triggers a new line when the present line is nearly full. Hence we need a hard space when we wish to avoid a line break.

    It would not be difficult to modify keyboards to produce both a mid-height dot and a hard space using a single stroke for each. On a present keyboard, a mid-height dot can be produced by holding down the ALT key and keying 250 on the numerical keypad, if there is one. But the numerical keypad already has a dot character. Pressing this creates the same character as the full-stop key on the main keyboard. A modified keyboard driver could cause it to create a mid-height dot instead. Smaller laptop computers do not have a numerical keypad, but perhaps we could reassign that useless key top left to produce a mid-height dot, instead of the left quote (not appropriate to modern word processing) or the ¬ character, of interest only to APL programmers, who no doubt would be smart enough to find other ways to produce this character if necessary.

    So the process of producing these characters is as good as solved, if we could only persuade ISO to introduce revised keyboard specifications. They have been revised several times since the mechanical typewriter layout where they originated. Smart-phones? Again problem as good as solved. The Android “hold-and-slide” facility already provides means to create many extra special characters. To add a couple more would surely take little more than a minor software update?

    As some have pointed out, the hard space can be modified during transmission. Some media work with limited character sets, which do not include the hard space. Also, newspapers and other publishing organizations usually have a large volume of copy to produce in limited time. They cannot afford to put in a fiddly key combination every time a hard space is desired. Even the magazine of a professional technical organization, to which I belong, has reverted to comma thousands separators and no space between quantity and unit symbol. They have clearly decided that the occasions where letters and numerals could be confused seldom happen, whereas an unwanted line break is a far more frequent issue.

    Nowadays CGPM permits, as an alternative to the period, the comma as a decimal point, as is common in many countries. To me, this is a cop-out. We should have a single, internationally agreed, symbol. The present set-up can cause ambiguity. Thus 3,142 could be interpreted as either three point one four two or three thousand one hundred and forty-two. Also, in some branches of mathematics, the comma is used as a parameter separator, so using it also as a decimal point can cause confusion.

    I don’t think we should try to tell printing houses how to do their jobs, given that at present they do not have the “tools” to do the job the CGPM way. The ISO specifies keyboard standards. The only way I see out of this muddle is for CGPM and ISO to talk to each other a lot more. Perhaps then, in about 20 years’ time, we may see some change.

  5. THE DECIMAL MARKER

    When handwriting, I always use the mid-height dot ( 3·142 ). When typing, the dot is on the line.

    In UK Government reports I think it is on the line.

    I guess most countries in the world use a comma on the line. Does anyone know?

    I don’t think they’ll change!

    +=+=+

    Now the problem of showing thousands, especially common when showing money values (£, $ et al.)

    Three thousand quid: £3,000. Using official SI, I think it should be with a space: £3 000.

    I don’t think they’ll change!
