Version: 3.x.x 🚧

@yozora/tokenizer-html-inline


GitHub Flavored Markdown spec

Text between < and > that looks like an HTML tag is parsed as a raw HTML tag and will be rendered in HTML without escaping. Tag and attribute names are not limited to current HTML tags, so custom tags (and even, say, DocBook tags) may be used.

Here is the grammar for tags:

A tag name consists of an ASCII letter followed by zero or more ASCII letters, digits, or hyphens (-).

An attribute consists of whitespace, an attribute name, and an optional attribute value specification.

An attribute name consists of an ASCII letter, _, or :, followed by zero or more ASCII letters, digits, _, ., :, or -. (Note: This is the XML specification restricted to ASCII. HTML5 is laxer.)

An attribute value specification consists of optional whitespace, a = character, optional whitespace, and an attribute value.

An attribute value consists of an unquoted attribute value, a single-quoted attribute value, or a double-quoted attribute value.

An unquoted attribute value is a nonempty string of characters not including whitespace, ", ', =, <, >, or `.

A single-quoted attribute value consists of ', zero or more characters not including ', and a final '.

A double-quoted attribute value consists of ", zero or more characters not including ", and a final ".

An open tag consists of a < character, a tag name, zero or more attributes, optional whitespace, an optional / character, and a > character.

A closing tag consists of the string </, a tag name, optional whitespace, and the character >.
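The tag-name, attribute, open-tag, and closing-tag rules above can be transcribed into regular expressions. The following is a minimal sketch written directly from the grammar text, not the tokenizer's actual implementation; the names `isOpenTag` and `isClosingTag` are hypothetical, and whitespace is assumed to mean the GFM whitespace characters (space, tab, newline, line tabulation, form feed, carriage return).

```typescript
// Regex fragments transcribed from the grammar above (illustrative only).
// Assumption: "whitespace" means the GFM whitespace characters.
const WS = '[ \\t\\n\\v\\f\\r]'
const TAG_NAME = '[A-Za-z][A-Za-z0-9-]*'
const ATTRIBUTE_NAME = '[A-Za-z_:][A-Za-z0-9_.:-]*'
const UNQUOTED_VALUE = '[^ \\t\\n\\v\\f\\r"\'=<>`]+'
const SINGLE_QUOTED_VALUE = "'[^']*'"
const DOUBLE_QUOTED_VALUE = '"[^"]*"'
const ATTRIBUTE_VALUE = `(?:${UNQUOTED_VALUE}|${SINGLE_QUOTED_VALUE}|${DOUBLE_QUOTED_VALUE})`

// An attribute: whitespace, a name, and an optional value specification.
const ATTRIBUTE = `${WS}+${ATTRIBUTE_NAME}(?:${WS}*=${WS}*${ATTRIBUTE_VALUE})?`

// Open tag: <, tag name, attributes, optional whitespace, optional /, >.
const OPEN_TAG = new RegExp(`^<${TAG_NAME}(?:${ATTRIBUTE})*${WS}*/?>$`)
// Closing tag: </, tag name, optional whitespace, >.
const CLOSING_TAG = new RegExp(`^</${TAG_NAME}${WS}*>$`)

// Hypothetical helpers: test whether the whole string is a single tag.
function isOpenTag(text: string): boolean {
  return OPEN_TAG.test(text)
}

function isClosingTag(text: string): boolean {
  return CLOSING_TAG.test(text)
}
```

With these definitions, `isOpenTag('<responsive-image src="foo.jpg" />')` holds because custom tag names are allowed, while `isOpenTag('<33>')` and `isClosingTag('</a href="foo">')` do not: a tag name must start with an ASCII letter, and closing tags take no attributes.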

An HTML comment consists of <!-- + text + -->, where text does not start with > or ->, does not end with -, and does not contain --. (See the HTML5 spec.)

A processing instruction consists of the string <?, a string of characters not including the string ?>, and the string ?>.

A declaration consists of the string <!, a name consisting of one or more uppercase ASCII letters, whitespace, a string of characters not including the character >, and the character >.

A CDATA section consists of the string <![CDATA[, a string of characters not including the string ]]>, and the string ]]>.
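The four non-tag constructs above (comment, processing instruction, declaration, CDATA section) are delimiter-based, so they can be checked with plain string operations. The sketch below follows the rules exactly as worded here; the function names are hypothetical and this is not how the tokenizer itself is implemented.

```typescript
// Hypothetical helper: checks `open + body + close`, where body must not
// contain `close`. The length check prevents open and close from overlapping.
function isDelimited(text: string, open: string, close: string): boolean {
  if (text.length < open.length + close.length) return false
  if (!text.startsWith(open) || !text.endsWith(close)) return false
  const body = text.slice(open.length, text.length - close.length)
  return !body.includes(close)
}

// <!-- text -->, where text does not start with > or ->,
// does not end with -, and does not contain --.
function isHtmlComment(text: string): boolean {
  if (text.length < 7 || !text.startsWith('<!--') || !text.endsWith('-->')) {
    return false
  }
  const body = text.slice(4, text.length - 3)
  return (
    !body.startsWith('>') &&
    !body.startsWith('->') &&
    !body.endsWith('-') &&
    !body.includes('--')
  )
}

// <? ... ?>, where the body does not contain ?>.
function isProcessingInstruction(text: string): boolean {
  return isDelimited(text, '<?', '?>')
}

// <!, an uppercase ASCII name, whitespace, characters other than >, then >.
// Assumption: "whitespace" means the GFM whitespace characters.
function isDeclaration(text: string): boolean {
  return /^<![A-Z]+[ \t\n\v\f\r]+[^>]*>$/.test(text)
}

// <![CDATA[ ... ]]>, where the body does not contain ]]>.
function isCdataSection(text: string): boolean {
  return isDelimited(text, '<![CDATA[', ']]>')
}
```

For example, `isHtmlComment('<!-- not a comment -- two hyphens -->')` is false because the text contains `--`, and `isDeclaration('<!element br>')` is false because the name must be uppercase.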

An HTML tag consists of an open tag, a closing tag, an HTML comment, a processing instruction, a declaration, or a CDATA section.

Install

npm install --save @yozora/tokenizer-html-inline

Usage

tip

@yozora/tokenizer-html-inline has been integrated into @yozora/parser / @yozora/parser-gfm-ex / @yozora/parser-gfm, so you can use YozoraParser / GfmExParser / GfmParser directly.

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`
<a><bab><c2c>

foo <?php echo $a; ?>
`)

Options

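| Name     | Type   | Required | Default                         |
| -------- | ------ | -------- | ------------------------------- |
| name     | string | false    | "@yozora/tokenizer-html-inline" |
| priority | number | false    | TokenizerPriority.ATOMIC        |
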
  • name: The unique name of the tokenizer. It is bound to each token the tokenizer generates, so that the correct tokenizer can be called at each life-cycle stage of that token throughout the matching and parsing phases.

  • priority: The priority of the tokenizer, which determines the processing order: higher-priority tokenizers run first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.

    Exception: Delimiters of type full are always processed before delimiters of other types.

Types

@yozora/tokenizer-html-inline produces Html type nodes. See @yozora/ast for full base types.

import type { Literal } from '@yozora/ast'

export const HtmlType = 'html'
export type HtmlType = typeof HtmlType

/**
 * HTML (Literal) represents a fragment of raw HTML.
 * @see https://github.com/syntax-tree/mdast#html
 * @see https://github.github.com/gfm/#html-blocks
 * @see https://github.github.com/gfm/#raw-html
 */
export type Html = Literal<HtmlType>
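
To make the shape of an Html node concrete, here is a self-contained sketch. It declares a local stand-in for `Literal` rather than importing it from @yozora/ast, assuming (as in mdast) that a literal node carries a `type` tag and a raw string `value`. The `isHtml` type guard is a hypothetical helper, not part of the package.

```typescript
// Local stand-in for Literal from @yozora/ast (assumed shape, as in mdast).
interface Literal<T extends string = string> {
  type: T
  value: string
}

const HtmlType = 'html'
type HtmlType = typeof HtmlType
type Html = Literal<HtmlType>

// Hypothetical helper: narrows an arbitrary node to an Html node.
function isHtml(node: { type: string; value?: unknown }): node is Html {
  return node.type === HtmlType && typeof node.value === 'string'
}

// An Html node holding a raw inline HTML fragment.
const htmlNode: Html = { type: HtmlType, value: '<a>' }
```

A consumer walking the AST could use such a guard to pick out raw HTML fragments and decide whether to render or sanitize their `value`.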

Live Examples

  • Opening.

  • Closing.

    #642

  • Comments.

  • Processing instruction.

    #647

  • Declaration.

    #648

  • CDATA section.

    #649