Using ldsXML in Adobe InDesign

Whitespace handling standards

This section sets forth the Publishing Services Department standard for handling common whitespace characters in Adobe InDesign.

For the purposes of this document, “protected” whitespace is a whitespace character that has been tagged with valid ldsXML markup or that is represented in markup by an XML element. “Unprotected” whitespace is any untagged whitespace character entered into the text flow by an InDesign user.

The first two columns of the table below give the common short and long names of each whitespace character. The character’s Unicode value is given in the third column. The fourth column gives the corresponding tag, if any, that represents the character in ldsXML.

The fifth column describes how unprotected whitespace characters can be used in InDesign. Unprotected whitespace characters can be transformed into the XML or entered by an editor or production artist as needed, but, with the exception of the hard hyphen, they are not a permanent part of the XML single-source document. The sixth column indicates what happens to unprotected whitespace characters when XML is exported from InDesign.

Editors and artists using ldsXML in InDesign must be aware of what happens to unprotected whitespace characters on export, or they may corrupt the single-source XML document. For example, an unprotected em space must not be used to separate words within a paragraph: em spaces are stripped on export from InDesign, so the words on either side of the em space run together.
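The effect of export on unprotected characters can be sketched in a few lines of Python. This is only an illustration of the sixth column of the table below, not the actual InDesign export code; the function name simulate_export and the two sets are hypothetical names introduced for this example.

```python
# Export outcomes for unprotected characters, transcribed from the sixth
# column of the table. Illustration only; InDesign's real export is not shown.
DELETED = {
    "\u2028",  # br: line break (soft return)
    "\u2029",  # hr: paragraph break (hard return)
    "\u0009",  # tab
    "\u2003",  # em space
    "\u2002",  # en space
    "\u2007",  # figure space
    "\u2009",  # thin space
    "\u200A",  # hair space
    "\u2004",  # three-per-em space
    "\u2005",  # four-per-em space
    "\u2006",  # six-per-em space
    "\u00AD",  # soft hyphen
}
TO_SINGLE_SPACE = {
    "\u00A0",  # nonbreaking space
    "\u2008",  # punctuation space
}
# The hard hyphen (U+2011) is in neither set, so it is kept as is.


def simulate_export(text: str) -> str:
    """Return text as it would look after the unprotected characters are processed."""
    out = []
    for ch in text:
        if ch in DELETED:
            continue  # the character is stripped
        out.append(" " if ch in TO_SINGLE_SPACE else ch)
    return "".join(out)
```

With this sketch, simulate_export("one\u2003two") returns "onetwo", which is the run-together result the paragraph above warns about, while simulate_export("p.\u00A03") returns "p. 3" with an ordinary space.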

If one of these characters, other than the hard hyphen, is needed permanently in ldsXML (that is, it must become part of the single-source document), represent it in the XML document using the element, or tagged, form listed in the fourth column.
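As a rough illustration of the tagged form, the following Python sketch replaces each character with the tag listed for it in the fourth column of the table below. It works on a plain string and is not how tagging is done inside InDesign; the names PROTECTED_TAGS and protect, and the scriptures flag, are assumptions made for this example. Characters with no tag in the table (thin, hair, and the per-em spaces) and the hard hyphen are left untouched.

```python
from xml.sax.saxutils import escape

# Character-to-tag mapping transcribed from the fourth column of the table.
PROTECTED_TAGS = {
    "\u2028": "<br/>",    # line break (soft return)
    "\u2029": "<hr/>",    # paragraph break (hard return)
    "\u00A0": "<nb/>",    # nonbreaking space
    "\u0009": "<tab/>",   # tab
    "\u2003": "<em/>",    # em space
    "\u2002": "<en/>",    # en space
    "\u2007": "<fig/>",   # figure space
    "\u2008": "<punc/>",  # punctuation space
}


def protect(text: str, scriptures: bool = False) -> str:
    """Return text with taggable whitespace characters written as ldsXML tags."""
    out = escape(text)  # escape &, <, > so the inserted tags stay unambiguous
    for ch, tag in PROTECTED_TAGS.items():
        out = out.replace(ch, tag)
    if scriptures:
        # The table lists <shy/> for scriptures only.
        out = out.replace("\u00AD", "<shy/>")
    return out
```

For example, protect("p.\u00A03") returns "p.<nb/>3", matching the nonbreaking-space example in the table.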

Short name Long name(s) Unicode value (in hex) ldsXML tag (“Protected” representation of character) Character usage On export from InDesign, the unprotected character is …
br line break; line separator; soft return &#x2028; <br/>

Used to split up long titles. Typically entered by production artists rather than editors or translators.

Might be used by editors or production artists to force words within paragraphs to the next line (discouraged).

Might be used in place of a column break or to make room for a caption (discouraged).

Note: Since this character is stripped on export, it must not be used in place of a space. Keep the space, and insert the soft return after it. The soft return should, however, follow immediately after dashes or hyphens, with no space.

deleted
hr paragraph break, paragraph separator, hard return &#x2029; <hr/>

Might be used in place of a column break or to make room for a caption (discouraged).

Note: The protected version is used to create breaks within paragraph-type elements within imageText.

deleted
nb nonbreaking space, hard space &#x00A0; <nb/>

Used to force a word to the next line.

Note: The protected version is used to bind text that needs to stay together in print, on the Web, and in all other media; for example, “p.<nb/>3,” “v.<nb/>22,” or “Thomas<nb/>S. Monson.”

converted to single space character
tab tab &#x0009; <tab/> Used to separate bullets or other labels from list items, to line up decimals, and so forth. deleted
em em space &#x2003; <em/>

Sometimes used after run-in bullets or other labels or to separate elements that are run together (for example, titles run into paragraphs).

Note: If you need an em space between words inside a paragraph (such as on a form), use the protected version.

deleted
en en space &#x2002; <en/>

Can be used like an em space or tab after labels or between elements that are run together.

Note: If you need an en space between words inside a paragraph (such as between the state and zip code in an address), use the protected version.

deleted
fig figure space &#x2007; <fig/>

Used in place of a decimal tab to line up number labels in lists.

Do not use for kerning or copy-fitting.

deleted
punc punctuation space &#x2008; <punc/>

Used as a decimal in place of a period or comma.

Do not use for kerning or copy-fitting.

converted to single space character
thin thin space &#x2009; None.

Used for on-the-fly kerning.

Do not use in place of a space.

deleted
hair hair space &#x200A; None.

Used for on-the-fly kerning.

Do not use in place of a space.

deleted

three per em (1/3 of an em) three-per-em space &#x2004; None.

Used for on-the-fly kerning.

Do not use in place of a space.

deleted

four per em (1/4 of an em) four-per-em space &#x2005; None.

Used for on-the-fly kerning.

Do not use in place of a space.

deleted

six per em (1/6 of an em) six-per-em space &#x2006; None.

Used for on-the-fly kerning.

Do not use in place of a space.

deleted
shy soft hyphen &#x00AD; <shy/> (scriptures only) Use to insert a discretionary hyphen in a word. deleted
hard hyphen nonbreaking hyphen, hard hyphen &#x2011; None.

Use to insert a hyphen that will not break even if the word falls at the end of the line.

Note: Do not use this the way a hard space might be used, to bring down a word. Use it only for words that should not break in print, on the Web, or in any other publishing medium. For example: “KBYU-TV”

kept as is
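The note in the br row above (keep the space before a soft return, but place the soft return directly after a dash or hyphen) lends itself to a simple automated check. The Python sketch below is one possible reading of that rule, not a department tool; the function name check_soft_returns, the set of dash characters, and the message wording are all assumptions made for this example.

```python
SOFT_RETURN = "\u2028"
# Hyphen-minus, hyphen, nonbreaking hyphen, en dash, em dash (assumed set).
DASHES = "-\u2010\u2011\u2013\u2014"


def check_soft_returns(text: str) -> list[tuple[int, str]]:
    """Return (offset, message) pairs for soft returns that break the note's rule."""
    problems = []
    for i, ch in enumerate(text):
        if ch != SOFT_RETURN or i == 0:
            continue
        prev = text[i - 1]
        if prev == " ":
            # A space is correct unless it sits between a dash and the soft return.
            if i >= 2 and text[i - 2] in DASHES:
                problems.append((i, "space between dash or hyphen and soft return"))
        elif prev not in DASHES:
            # The soft return is deleted on export, so the words would run together.
            problems.append((i, "no space before soft return"))
    return problems
```

A title such as "A Long \u2028Title" passes the check, while "A Long\u2028Title" is reported at offset 6 because export would produce "A LongTitle".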