Version: 2.x.x

@yozora/tokenizer-link


GitHub Flavored Markdown Spec

A link contains link text (the visible text), a link destination (the URI that is the link destination), and optionally a link title. There are two basic kinds of links in Markdown. In inline links the destination and title are given immediately after the link text. In reference links the destination and title are defined elsewhere in the document.

A link text consists of a sequence of zero or more inline elements enclosed by square brackets ([ and ]). The following rules apply:

  • Links may not contain other links, at any level of nesting. If multiple otherwise valid link definitions appear nested inside each other, the inner-most definition is used.

  • Brackets are allowed in the link text only if

    a) they are backslash-escaped or

    b) they appear as a matched pair of brackets, with an open bracket [, a sequence of zero or more inlines, and a close bracket ].

  • Backtick code spans, autolinks, and raw HTML tags bind more tightly than the brackets in link text. Thus, for example, [foo`]` could not be a link text, since the second ] is part of a code span.

  • The brackets in link text bind more tightly than markers for emphasis and strong emphasis. Thus, for example, *[foo*](url) is a link.
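The bracket rule above can be sketched as a small check. This is only an illustration under simplifying assumptions: the helper name `hasBalancedBrackets` is hypothetical and not part of @yozora/tokenizer-link, and the sketch ignores code spans, autolinks, and raw HTML, which bind more tightly than brackets.

```typescript
// Hypothetical helper: returns true when every unescaped bracket in the
// candidate link text belongs to a matched [ ... ] pair.
// Simplification: code spans, autolinks and raw HTML are not considered.
function hasBalancedBrackets(linkText: string): boolean {
  let depth = 0
  for (let i = 0; i < linkText.length; i += 1) {
    const c = linkText.charAt(i)
    if (c === '\\') {
      i += 1 // skip the character that follows the backslash
      continue
    }
    if (c === '[') {
      depth += 1
    } else if (c === ']') {
      if (depth === 0) return false // closer without a matching opener
      depth -= 1
    }
  }
  return depth === 0
}
```

Under these assumptions, `link [foo [bar]]` and `link \[bar` pass the check, while `link [bar` does not.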

A link destination consists of either

  • a sequence of zero or more characters between an opening < and a closing > that contains no line breaks or unescaped < or > characters, or

  • a nonempty sequence of characters that does not start with <, does not include ASCII space or control characters, and includes parentheses only if

    a) they are backslash-escaped or

    b) they are part of a balanced pair of unescaped parentheses. (Implementations may impose limits on parentheses nesting to avoid performance issues, but at least three levels of nesting should be supported.)
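As a rough illustration of the two destination forms, the following sketch validates a candidate string. The function names are hypothetical and this is not how the tokenizer itself is implemented; it also imposes no limit on parenthesis nesting.

```typescript
// Form 1: <...> with no line breaks and no unescaped < or > inside.
function isPointyDestination(raw: string): boolean {
  if (raw.charAt(0) !== '<') return false
  for (let i = 1; i < raw.length; i += 1) {
    const c = raw.charAt(i)
    if (c === '\\' && i + 1 < raw.length && '<>\\'.indexOf(raw.charAt(i + 1)) !== -1) {
      i += 1 // escaped <, > or backslash
      continue
    }
    if (c === '\n' || c === '\r' || c === '<') return false
    if (c === '>') return i === raw.length - 1 // must be the closing bracket
  }
  return false // no unescaped closing >
}

// Form 2: nonempty, no ASCII space or control characters, and
// parentheses only when escaped or balanced.
function isBareDestination(raw: string): boolean {
  if (raw.length === 0 || raw.charAt(0) === '<') return false
  let depth = 0
  for (let i = 0; i < raw.length; i += 1) {
    const code = raw.charCodeAt(i)
    if (code <= 0x20 || code === 0x7f) return false
    const c = raw.charAt(i)
    if (c === '\\' && i + 1 < raw.length && '()\\'.indexOf(raw.charAt(i + 1)) !== -1) {
      i += 1 // escaped parenthesis or backslash
      continue
    }
    if (c === '(') {
      depth += 1
    } else if (c === ')') {
      if (depth === 0) return false
      depth -= 1
    }
  }
  return depth === 0
}

function isValidLinkDestination(raw: string): boolean {
  return raw.charAt(0) === '<' ? isPointyDestination(raw) : isBareDestination(raw)
}
```

With this sketch, `/my uri` is rejected while `</my uri>` is accepted, which matches the rule that spaces are only allowed inside pointy brackets.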

A link title consists of either

  • a sequence of zero or more characters between straight double-quote characters ("), including a " character only if it is backslash-escaped, or

  • a sequence of zero or more characters between straight single-quote characters ('), including a ' character only if it is backslash-escaped, or

  • a sequence of zero or more characters between matching parentheses ((...)), including a ( or ) character only if it is backslash-escaped.

Although link titles may span multiple lines, they may not contain a blank line.
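The title rules can be sketched in the same spirit. The helper below is hypothetical (it is not an export of this package) and only checks the delimiter, escaping, and blank-line constraints described above.

```typescript
// Hypothetical helper: checks a raw title including its delimiters,
// e.g. "title", 'title' or (title).
function isValidLinkTitle(raw: string): boolean {
  if (raw.length < 2) return false
  const open = raw.charAt(0)
  if (open !== '"' && open !== "'" && open !== '(') return false
  const close = open === '(' ? ')' : open
  if (raw.charAt(raw.length - 1) !== close) return false

  // A title may span lines but must not contain a blank line.
  if (/\r?\n[ \t]*\r?\n/.test(raw)) return false

  let i = 1
  while (i < raw.length - 1) {
    const c = raw.charAt(i)
    if (c === '\\') {
      i += 2 // skip the backslash and the character it escapes
      continue
    }
    if (c === open || c === close) return false // unescaped delimiter inside
    i += 1
  }
  // If an escape consumed the final delimiter, i overshoots the end.
  return i === raw.length - 1
}
```

Note the final comparison: it rejects input such as `"unterminated\"`, where the last quote is escaped and therefore cannot close the title.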

An inline link consists of a link text followed immediately by a left parenthesis (, optional whitespace, an optional link destination, an optional link title separated from the link destination by whitespace, optional whitespace, and a right parenthesis ). The link’s text consists of the inlines contained in the link text (excluding the enclosing square brackets). The link’s URI consists of the link destination, excluding enclosing <...> if present, with backslash-escapes in effect as described above. The link’s title consists of the link title, excluding its enclosing delimiters, with backslash-escapes in effect as described above.
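To make the last two sentences concrete, here is a minimal sketch of how a URI and title might be derived from the raw destination and raw title. The function names are hypothetical, and the sketch deliberately leaves out entity and numeric character references as well as any URL encoding.

```typescript
// A backslash before an ASCII punctuation character escapes it;
// before any other character it stays a literal backslash.
const ESCAPED_PUNCTUATION = /\\([!-\/:-@\[-`{-~])/g

function unescapeBackslashes(text: string): string {
  return text.replace(ESCAPED_PUNCTUATION, '$1')
}

// Strip enclosing <...> if present, then apply backslash escapes.
function resolveDestination(raw: string): string {
  const pointy = raw.length >= 2 && raw.charAt(0) === '<' && raw.charAt(raw.length - 1) === '>'
  return unescapeBackslashes(pointy ? raw.slice(1, -1) : raw)
}

// Strip the enclosing delimiters ("...", '...' or (...)), then apply
// backslash escapes. Assumes the raw title is already known to be valid.
function resolveTitle(raw: string): string {
  return unescapeBackslashes(raw.slice(1, -1))
}
```

For example, under these assumptions `resolveDestination('</my uri>')` gives `/my uri`, and a backslash before a non-escapable character, as in `foo\bar`, is left untouched.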

Install

npm install --save @yozora/tokenizer-link

Usage

tip

@yozora/tokenizer-link has been integrated into @yozora/parser / @yozora/parser-gfm-ex / @yozora/parser-gfm, so you can use YozoraParser / GfmExParser / GfmParser directly.

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`
[link](/uri "title")
[link](/uri)
`)

Options

Name      Type    Required  Default
name      string  false     "@yozora/tokenizer-link"
priority  number  false     TokenizerPriority.LINKS

  • name: The unique name of the tokenizer. It binds each token to the tokenizer that produced it, so that the right tokenizer is called at each life-cycle stage of the token throughout the matching and parsing phases.

  • priority: The priority of the tokenizer, which determines the processing order: higher-priority tokenizers run first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.

    Exception: Delimiters of type full are always processed before delimiters of other types.

Types

@yozora/tokenizer-link produces Link type nodes. See @yozora/ast for full base types.

import type { Parent, Resource } from '@yozora/ast'

export const LinkType = 'link'
export type LinkType = typeof LinkType

/**
* Link represents a hyperlink.
* @see https://github.com/syntax-tree/mdast#link
* @see https://github.github.com/gfm/#inline-link
*/
export interface Link extends Parent<LinkType>, Resource {}
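As an illustration of how such nodes might be consumed, the sketch below walks a tree and gathers every node of type 'link'. The `AstNode` and `LinkLike` shapes are hypothetical stand-ins defined locally so the example is self-contained; the `url` and `title` fields are assumed to be what `Resource` contributes, and the sample tree is built by hand rather than produced by the parser.

```typescript
// Hypothetical minimal node shapes for illustration only; the real
// definitions live in @yozora/ast.
interface AstNode {
  type: string
  children?: AstNode[]
}

// Assumption: Resource contributes `url` and an optional `title`.
interface LinkLike extends AstNode {
  type: 'link'
  url: string
  title?: string
  children: AstNode[]
}

// Depth-first walk that gathers every node whose type is 'link'.
function collectLinks(node: AstNode, found: LinkLike[] = []): LinkLike[] {
  if (node.type === 'link') found.push(node as LinkLike)
  for (const child of node.children ?? []) collectLinks(child, found)
  return found
}

// Hand-built sample tree, not actual parser output.
const withTitle: LinkLike = {
  type: 'link',
  url: '/uri',
  title: 'title',
  children: [{ type: 'text' }],
}
const withoutTitle: LinkLike = {
  type: 'link',
  url: '/uri',
  children: [{ type: 'text' }],
}
const sampleTree: AstNode = {
  type: 'root',
  children: [{ type: 'paragraph', children: [withTitle, { type: 'text' }, withoutTitle] }],
}

const links = collectLinks(sampleTree)
```

The accumulator parameter keeps the walk to a single array allocation; a version returning fresh arrays per call would work equally well for small documents.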

Live Examples

  • Basic.

    #493
      
      
  • The title may be omitted.

    #494
      
      
  • Both the title and the destination may be omitted.

      
      
  • The destination can only contain spaces if it is enclosed in pointy brackets.

      
      
  • The destination cannot contain line breaks, even if enclosed in pointy brackets.

      
      
  • The destination can contain ) if it is enclosed in pointy brackets.

    #501
      
      
  • Pointy brackets that enclose links must be unescaped.

    #502
      
      
  • These are not links, because the opening pointy bracket is not matched properly.

    #503
      
      
  • Parentheses inside the link destination may be escaped.

    #504
      
      
  • Any number of parentheses are allowed without escaping, as long as they are balanced.

    #505
      
      
  • However, if you have unbalanced parentheses, you need to escape or use the <...> form.

      
      
  • Parentheses and other symbols can also be escaped, as usual in Markdown.

    #508
      
      
  • A link can contain fragment identifiers and queries.

    #509
      
      
  • Note that a backslash before a non-escapable character is just a backslash.

    #510
      
      
  • Note that, because titles can often be parsed as destinations, if you try to omit the destination and keep the title, you’ll get unexpected results.

    #512
      
      
  • Titles may be in single quotes, double quotes, or parentheses.

    #513
      
      
  • Backslash escapes and entity and numeric character references may be used in titles.

    #514
      
      
  • Titles must be separated from the link destination by whitespace. Other Unicode whitespace, such as a non-breaking space, doesn’t work.

    #515
      
      
  • Nested balanced quotes are not allowed without escaping.

    #516
      
      
  • But it is easy to work around this by using a different quote type.

    #517
      
      
  • Whitespace is allowed around the destination and title.

    #518
      
      
  • But it is not allowed between the link text and the following parenthesis.

    #519
      
      
  • The link text may contain balanced brackets, but not unbalanced ones, unless they are escaped.

      
      
  • The link text may contain inline content.

      
      
  • However, links may not contain other links, at any level of nesting.

      
      
  • These cases illustrate the precedence of link text grouping over emphasis grouping.

      
      
  • Note that brackets that aren’t part of links do not take precedence.

    #531
      
      
  • These cases illustrate the precedence of HTML tags, code spans, and autolinks over link grouping.