Version: 3.x.x 🚧

@yozora/tokenizer-emphasis

Module formats: cjs, esm. Tested with Jest. Code style: prettier.

GitHub Flavored Markdown spec

First, some definitions. A delimiter run is either a sequence of one or more * characters that is not preceded or followed by a non-backslash-escaped * character, or a sequence of one or more _ characters that is not preceded or followed by a non-backslash-escaped _ character.
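The definition above can be sketched as a small scanner. This is an illustrative sketch only, not the tokenizer's actual implementation: the names `DelimiterRun` and `findDelimiterRuns` are hypothetical, and the only escape handling assumed is that a backslash hides the character that follows it.

```typescript
// Hypothetical shape for a delimiter run found in a line of text.
interface DelimiterRun {
  marker: '*' | '_'
  start: number // index of the first delimiter character
  end: number // index just past the last delimiter character
}

// Collect maximal sequences of unescaped `*` or `_` characters.
function findDelimiterRuns(text: string): DelimiterRun[] {
  const runs: DelimiterRun[] = []
  let i = 0
  while (i < text.length) {
    const c = text.charAt(i)
    if (c === '\\') {
      // A backslash escapes the next character, so skip both.
      i += 2
      continue
    }
    if (c === '*' || c === '_') {
      // Extend the run while the same marker character repeats.
      let j = i + 1
      while (j < text.length && text.charAt(j) === c) j += 1
      runs.push({ marker: c, start: i, end: j })
      i = j
      continue
    }
    i += 1
  }
  return runs
}
```

For instance, `findDelimiterRuns('**foo** _bar_ \\*baz')` reports four runs, because the final `*` is backslash-escaped and therefore not a delimiter.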

A left-flanking delimiter run is a delimiter run that is:

  1. not followed by Unicode whitespace, and either

  2a) not followed by a punctuation character, or

  2b) followed by a punctuation character and preceded by Unicode whitespace or a punctuation character.

For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

A right-flanking delimiter run is a delimiter run that is:

  1. not preceded by Unicode whitespace, and either

  2a) not preceded by a punctuation character, or

  2b) preceded by a punctuation character and followed by Unicode whitespace or a punctuation character.

For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
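The two flanking definitions translate almost directly into boolean expressions. The sketch below is a simplification under stated assumptions: `classifyRun`, `isWhitespace` and `isPunctuation` are hypothetical helper names, whitespace is approximated with the JavaScript `\s` class, and only ASCII punctuation is recognised (the spec also counts Unicode punctuation categories).

```typescript
// Simplification: only ASCII punctuation is recognised in this sketch.
const ASCII_PUNCTUATION = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

function isWhitespace(ch: string): boolean {
  // The empty string stands for the beginning or the end of the line.
  return ch === '' || /\s/.test(ch)
}

function isPunctuation(ch: string): boolean {
  return ch !== '' && ASCII_PUNCTUATION.indexOf(ch) >= 0
}

interface Flanking {
  left: boolean
  right: boolean
}

// `start` is the index of the first delimiter, `end` is just past the last one.
function classifyRun(text: string, start: number, end: number): Flanking {
  const before = start > 0 ? text.charAt(start - 1) : ''
  const after = end < text.length ? text.charAt(end) : ''
  // Left-flanking: condition 1, then either 2a or 2b.
  const left =
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before))
  // Right-flanking: the mirror image of the left-flanking test.
  const right =
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after))
  return { left, right }
}
```

With these helpers, the run in `***abc` is left-flanking only, the run in `abc***def` is both, and the run in `abc *** def` is neither.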


The following rules define emphasis and strong emphasis:

  1. A single * character can open emphasis iff (if and only if) it is part of a left-flanking delimiter run.

  2. A single _ character can open emphasis iff it is part of a left-flanking delimiter run and either

a) not part of a right-flanking delimiter run or

b) part of a right-flanking delimiter run preceded by a punctuation character.

  3. A single * character can close emphasis iff it is part of a right-flanking delimiter run.

  4. A single _ character can close emphasis iff it is part of a right-flanking delimiter run and either

a) not part of a left-flanking delimiter run or

b) part of a left-flanking delimiter run followed by a punctuation character.

  5. A double ** can open strong emphasis iff it is part of a left-flanking delimiter run.

  6. A double __ can open strong emphasis iff it is part of a left-flanking delimiter run and either

a) not part of a right-flanking delimiter run or

b) part of a right-flanking delimiter run preceded by a punctuation character.

  7. A double ** can close strong emphasis iff it is part of a right-flanking delimiter run.

  8. A double __ can close strong emphasis iff it is part of a right-flanking delimiter run and either

a) not part of a left-flanking delimiter run or

b) part of a left-flanking delimiter run followed by a punctuation character.

  9. Emphasis begins with a delimiter that can open emphasis and ends with a delimiter that can close emphasis, and that uses the same character (_ or *) as the opening delimiter. The opening and closing delimiters must belong to separate delimiter runs. If one of the delimiters can both open and close emphasis, then the sum of the lengths of the delimiter runs containing the opening and closing delimiters must not be a multiple of 3 unless both lengths are multiples of 3.

  10. Strong emphasis begins with a delimiter that can open strong emphasis and ends with a delimiter that can close strong emphasis, and that uses the same character (_ or *) as the opening delimiter. The opening and closing delimiters must belong to separate delimiter runs. If one of the delimiters can both open and close strong emphasis, then the sum of the lengths of the delimiter runs containing the opening and closing delimiters must not be a multiple of 3 unless both lengths are multiples of 3.

  11. A literal * character cannot occur at the beginning or end of *-delimited emphasis or **-delimited strong emphasis, unless it is backslash-escaped.

  12. A literal _ character cannot occur at the beginning or end of _-delimited emphasis or __-delimited strong emphasis, unless it is backslash-escaped.
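The open/close conditions and the "multiple of 3" restriction in the rules above can be expressed as three small predicates. This is a sketch under assumptions, not the package's code: `RunInfo`, `canOpen`, `canClose` and `canPair` are hypothetical names, and the flanking and punctuation flags are assumed to have been computed beforehand.

```typescript
// Hypothetical summary of one delimiter run; all flags are precomputed.
interface RunInfo {
  marker: '*' | '_'
  length: number // number of delimiter characters in the run
  left: boolean // left-flanking
  right: boolean // right-flanking
  punctBefore: boolean // preceded by a punctuation character
  punctAfter: boolean // followed by a punctuation character
}

// `*` opens when left-flanking; `_` additionally must not be intraword.
function canOpen(run: RunInfo): boolean {
  if (run.marker === '*') return run.left
  return run.left && (!run.right || run.punctBefore)
}

// `*` closes when right-flanking; `_` additionally must not be intraword.
function canClose(run: RunInfo): boolean {
  if (run.marker === '*') return run.right
  return run.right && (!run.left || run.punctAfter)
}

// Check whether an opener run and a closer run may form a pair.
function canPair(opener: RunInfo, closer: RunInfo): boolean {
  if (opener.marker !== closer.marker) return false
  if (!canOpen(opener) || !canClose(closer)) return false
  // True when at least one of the two runs can both open and close.
  const eitherIsBoth = canClose(opener) || canOpen(closer)
  if (eitherIsBoth && (opener.length + closer.length) % 3 === 0) {
    // Allowed only when both lengths are themselves multiples of 3.
    return opener.length % 3 === 0 && closer.length % 3 === 0
  }
  return true
}
```

In `*foo**bar*`, for example, the middle `**` can both open and close, so it cannot pair with the final `*` (2 + 1 is a multiple of 3), while the first and last `*` pair normally.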


Where rules 1-12 above are compatible with multiple parsings, the following principles resolve ambiguity:

  13. The number of nestings should be minimized. Thus, for example, an interpretation <strong>...</strong> is always preferred to <em><em>...</em></em>.

  14. An interpretation <em><strong>...</strong></em> is always preferred to <strong><em>...</em></strong>.

  15. When two potential emphasis or strong emphasis spans overlap, so that the second begins before the first ends and ends after the first ends, the first takes precedence. Thus, for example, *foo _bar* baz_ is parsed as <em>foo _bar</em> baz_ rather than *foo <em>bar* baz</em>.

  16. When there are two potential emphasis or strong emphasis spans with the same closing delimiter, the shorter one (the one that opens later) takes precedence. Thus, for example, **foo **bar baz** is parsed as **foo <strong>bar baz</strong> rather than <strong>foo **bar baz</strong>.

  17. Inline code spans, links, images, and HTML tags group more tightly than emphasis. So, when there is a choice between an interpretation that contains one of these elements and one that does not, the former always wins. Thus, for example, *[foo*](bar) is parsed as *<a href="bar">foo*</a> rather than as <em>[foo</em>](bar).
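The first two principles (minimal nesting, and emphasis outside strong) fall out of a simple greedy choice used by the CommonMark reference parsing strategy: when a matched opener and closer both have at least two delimiters left, consume two and produce strong emphasis, otherwise consume one and produce emphasis. The sketch below illustrates only that choice; `pairRuns` is a hypothetical name, the overlap and precedence principles are not modelled, and whether this package follows the same strategy internally is an assumption.

```typescript
type Wrapper = 'emphasis' | 'strong'

// Given the lengths of an already-matched opener and closer run, list the
// wrappers produced, from the innermost to the outermost.
function pairRuns(openLength: number, closeLength: number): Wrapper[] {
  const wrappers: Wrapper[] = []
  let open = openLength
  let close = closeLength
  while (open > 0 && close > 0) {
    // Prefer strong emphasis whenever both sides can spare two delimiters.
    const used = open >= 2 && close >= 2 ? 2 : 1
    wrappers.push(used === 2 ? 'strong' : 'emphasis')
    open -= used
    close -= used
  }
  return wrappers
}
```

For `**foo**` this yields a single strong wrapper instead of two nested emphasis wrappers, and for `***foo***` it yields strong on the inside and emphasis on the outside, that is, <em><strong>foo</strong></em>.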

Install

npm install --save @yozora/tokenizer-emphasis

Usage

tip

@yozora/tokenizer-emphasis has been integrated into @yozora/parser / @yozora/parser-gfm-ex / @yozora/parser-gfm, so you can use YozoraParser / GfmExParser / GfmParser directly.

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`

**foo bar**

__foo bar__

_foo bar_

*foo bar*

__**__foo__**__

`)
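Once the source has been parsed, the emphasis and strong nodes can be located by walking the resulting tree. The sketch below does not call the parser: it uses a hand-written stand-in tree, and both the `AstNode` shape (a `type`, an optional `value`, optional `children`, in the style of mdast) and the `collectByType` helper are assumptions made for illustration.

```typescript
// Assumed minimal node shape, modelled on mdast-style trees.
interface AstNode {
  type: string
  value?: string
  children?: AstNode[]
}

// Depth-first search that gathers every node of the requested type.
function collectByType(node: AstNode, type: string, found: AstNode[] = []): AstNode[] {
  if (node.type === type) found.push(node)
  const children = node.children || []
  for (let i = 0; i < children.length; i += 1) {
    collectByType(children[i], type, found)
  }
  return found
}

// Hand-written stand-in for a tree such as `**foo bar**` followed by
// `*foo bar*` might produce; it is not real parser output.
const sample: AstNode = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [{ type: 'strong', children: [{ type: 'text', value: 'foo bar' }] }],
    },
    {
      type: 'paragraph',
      children: [{ type: 'emphasis', children: [{ type: 'text', value: 'foo bar' }] }],
    },
  ],
}

const strongNodes = collectByType(sample, 'strong')
const emphasisNodes = collectByType(sample, 'emphasis')
```

The same helper could be pointed at the value returned by `parser.parse(...)`, provided the real tree exposes `type` and `children` in the way assumed here.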

Options

| Name | Type | Required | Default |
| --- | --- | --- | --- |
| name | string | false | "@yozora/tokenizer-emphasis" |
| priority | number | false | TokenizerPriority.CONTAINING_INLINE |

  • name: The unique name of the tokenizer. It binds each generated token to this tokenizer, so that the correct tokenizer is called at every life-cycle stage of the token during the matching and parsing phases.

  • priority: The priority of the tokenizer, which determines the order of processing: higher-priority tokenizers run first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.

    Exception: Delimiters of type full are always processed before delimiters of other types.
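The effect of the priority option can be pictured as a descending sort over the registered tokenizers. This is a sketch of the ordering idea only: `TokenizerLike` and `orderByPriority` are hypothetical names, the numeric priorities are made up, and the real scheduling inside the parser (including the interruption behaviour and the exception for full delimiters) is not modelled.

```typescript
// Hypothetical stand-in for the two options described above.
interface TokenizerLike {
  name: string
  priority: number
}

// Return a copy ordered so that higher-priority tokenizers come first.
function orderByPriority(tokenizers: TokenizerLike[]): TokenizerLike[] {
  return tokenizers.slice().sort((a, b) => b.priority - a.priority)
}

// The priority values below are invented purely for illustration.
const ordered = orderByPriority([
  { name: 'tokenizer-a', priority: 10 },
  { name: 'tokenizer-b', priority: 30 },
  { name: 'tokenizer-c', priority: 20 },
])
```

Here `ordered` lists `tokenizer-b` first, then `tokenizer-c`, then `tokenizer-a`.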

Types

@yozora/tokenizer-emphasis produces Emphasis / Strong type nodes. See @yozora/ast for full base types.

  • Emphasis

    import type { Parent } from '@yozora/ast'

    export const EmphasisType = 'emphasis'
    export type EmphasisType = typeof EmphasisType

    /**
    * Emphasis represents stress emphasis of its contents.
    * @see https://github.com/syntax-tree/mdast#emphasis
    * @see https://github.github.com/gfm/#emphasis-and-strong-emphasis
    */
    export type Emphasis = Parent<EmphasisType>
  • Strong

    import type { Parent } from '@yozora/ast'

    export const StrongType = 'strong'
    export type StrongType = typeof StrongType

    /**
    * Strong represents strong importance, seriousness, or urgency for its
    * contents.
    * @see https://github.com/syntax-tree/mdast#strong
    * @see https://github.github.com/gfm/#emphasis-and-strong-emphasis
    */
    export type Strong = Parent<StrongType>
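To show how these node types might be consumed, the sketch below builds a strong node that contains an emphasis node and narrows them with type guards. It is self-contained on purpose: `AstNode` and `AstParent` are local stand-ins assumed to resemble the base types in @yozora/ast, and `isEmphasis` / `isStrong` are hypothetical helpers, not exports of the package.

```typescript
// Local stand-ins so the sketch runs without installing @yozora/ast.
interface AstNode<T extends string = string> {
  type: T
}
interface AstParent<T extends string = string> extends AstNode<T> {
  children: AstNode[]
}

const EmphasisType = 'emphasis'
type EmphasisType = typeof EmphasisType
type Emphasis = AstParent<EmphasisType>

const StrongType = 'strong'
type StrongType = typeof StrongType
type Strong = AstParent<StrongType>

// Type guards that narrow a generic node by its `type` discriminator.
function isEmphasis(node: AstNode): node is Emphasis {
  return node.type === EmphasisType
}
function isStrong(node: AstNode): node is Strong {
  return node.type === StrongType
}

// A strong node wrapping an (empty) emphasis node.
const inner: Emphasis = { type: EmphasisType, children: [] }
const outer: Strong = { type: StrongType, children: [inner] }
```

After a guard such as `isStrong(node)` succeeds, TypeScript treats `node.children` as available without a cast.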
