Version: 3.x.x 🚧

@yozora/tokenizer-html-inline

From the GitHub Flavored Markdown spec:

Text between < and > that looks like an HTML tag is parsed as a raw HTML tag and will be rendered in HTML without escaping. Tag and attribute names are not limited to current HTML tags, so custom tags (and even, say, DocBook tags) may be used.

Here is the grammar for tags:

A tag name consists of an ASCII letter followed by zero or more ASCII letters, digits, or hyphens (-).

An attribute consists of whitespace, an attribute name, and an optional attribute value specification.

An attribute name consists of an ASCII letter, _, or :, followed by zero or more ASCII letters, digits, _, ., :, or -. (Note: This is the XML specification restricted to ASCII. HTML5 is laxer.)

An attribute value specification consists of optional whitespace, a = character, optional whitespace, and an attribute value.

An attribute value consists of an unquoted attribute value, a single-quoted attribute value, or a double-quoted attribute value.

An unquoted attribute value is a nonempty string of characters not including whitespace, ", ', =, <, >, or `.

A single-quoted attribute value consists of ', zero or more characters not including ', and a final '.

A double-quoted attribute value consists of ", zero or more characters not including ", and a final ".

An open tag consists of a < character, a tag name, zero or more attributes, optional whitespace, an optional / character, and a > character.

A closing tag consists of the string </, a tag name, optional whitespace, and the character >.

An HTML comment consists of <!-- + text + -->, where text does not start with > or ->, does not end with -, and does not contain --. (See the HTML5 spec.)

A processing instruction consists of the string <?, a string of characters not including the string ?>, and the string ?>.

A declaration consists of the string <!, a name consisting of one or more uppercase ASCII letters, whitespace, a string of characters not including the character >, and the character >.

A CDATA section consists of the string <![CDATA[, a string of characters not including the string ]]>, and the string ]]>.

An HTML tag consists of an open tag, a closing tag, an HTML comment, a processing instruction, a declaration, or a CDATA section.
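
The grammar above can be transcribed into regular expressions almost rule by rule. The following is a minimal sketch for illustration only; it is not the tokenizer's actual implementation, and every name in it (isHtmlTag, isHtmlComment, and the regex constants) is hypothetical. It assumes the candidate string is exactly one tag, and it uses the JavaScript \s class, which is slightly broader than the spec's definition of whitespace.

```typescript
// Sketch only: regular expressions transcribed from the grammar above.
// None of these names are part of @yozora/tokenizer-html-inline.
const tagName = '[A-Za-z][A-Za-z0-9-]*'
const attrName = '[A-Za-z_:][A-Za-z0-9_.:-]*'
const unquoted = '[^\\s"\'=<>`]+'
const singleQuoted = "'[^']*'"
const doubleQuoted = '"[^"]*"'
const attrValue = `(?:${unquoted}|${singleQuoted}|${doubleQuoted})`
// An attribute: whitespace, a name, and an optional value specification.
const attribute = `(?:\\s+${attrName}(?:\\s*=\\s*${attrValue})?)`

const openTag = new RegExp(`^<${tagName}${attribute}*\\s*/?>$`)
const closingTag = new RegExp(`^</${tagName}\\s*>$`)
// The negative lookaheads forbid the terminator inside the body.
const processingInstruction = /^<\?(?:(?!\?>)[\s\S])*\?>$/
const declaration = /^<![A-Z]+\s+[^>]*>$/
const cdata = /^<!\[CDATA\[(?:(?!\]\]>)[\s\S])*\]\]>$/

// A comment is "<!--" + text + "-->" with restrictions on the text.
function isHtmlComment(s: string): boolean {
  // The length check rejects strings where the two delimiters would overlap.
  if (s.length < 7 || !s.startsWith('<!--') || !s.endsWith('-->')) return false
  const text = s.slice(4, -3)
  return (
    !text.startsWith('>') &&
    !text.startsWith('->') &&
    !text.endsWith('-') &&
    !text.includes('--')
  )
}

// An HTML tag is any one of the six constructs.
function isHtmlTag(s: string): boolean {
  const patterns = [openTag, closingTag, processingInstruction, declaration, cdata]
  return patterns.some(re => re.test(s)) || isHtmlComment(s)
}
```

For example, isHtmlTag('<c2c>') and isHtmlTag('</foo >') both hold, while isHtmlTag('<33>') does not, because a tag name must start with an ASCII letter.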

See the GitHub Flavored Markdown spec for details.
See Live Examples for a quick impression.

Install

npm:

npm install --save @yozora/tokenizer-html-inline

Yarn:

yarn add @yozora/tokenizer-html-inline

pnpm:

pnpm add @yozora/tokenizer-html-inline

Usage

Tip: @yozora/tokenizer-html-inline is already integrated into @yozora/parser, @yozora/parser-gfm-ex, and @yozora/parser-gfm, so you can use YozoraParser, GfmExParser, or GfmParser directly.

Basic Usage

@yozora/tokenizer-html-inline cannot be used on its own; it must be registered with a parser as a plugin before it can be used.

import { DefaultParser } from '@yozora/core-parser'
import ParagraphTokenizer from '@yozora/tokenizer-paragraph'
import TextTokenizer from '@yozora/tokenizer-text'
import HtmlInlineTokenizer from '@yozora/tokenizer-html-inline'

const parser = new DefaultParser()
  .useFallbackTokenizer(new ParagraphTokenizer())
  .useFallbackTokenizer(new TextTokenizer())
  .useTokenizer(new HtmlInlineTokenizer())

// parse source markdown content
parser.parse(`
<a><bab><c2c>

foo <?php echo $a; ?>
`)

With YozoraParser:

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`
<a><bab><c2c>

foo <?php echo $a; ?>
`)

With GfmParser:

import GfmParser from '@yozora/parser-gfm'

const parser = new GfmParser()

// parse source markdown content
parser.parse(`
<a><bab><c2c>

foo <?php echo $a; ?>
`)

With GfmExParser:

import GfmExParser from '@yozora/parser-gfm-ex'

const parser = new GfmExParser()

// parse source markdown content
parser.parse(`
<a><bab><c2c>

foo <?php echo $a; ?>
`)

Options

Name	Type	Required	Default
`name`	`string`	`false`	`"@yozora/tokenizer-html-inline"`
`priority`	`number`	`false`	`TokenizerPriority.ATOMIC`

name: The unique name of the tokenizer. It binds each token to the tokenizer that generated it, which determines the tokenizer to call at each stage of the token's life cycle throughout the matching and parsing phases.
priority: The priority of the tokenizer, which determines the processing order: higher-priority tokenizers run first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.

Exception: Delimiters of type full are always processed before delimiters of other types.

Types

@yozora/tokenizer-html-inline produces Html type nodes. See @yozora/ast for the full base types.

import type { Literal } from '@yozora/ast'

export const HtmlType = 'html'
export type HtmlType = typeof HtmlType

/**
 * HTML (Literal) represents a fragment of raw HTML.
 * @see https://github.com/syntax-tree/mdast#html
 * @see https://github.github.com/gfm/#html-blocks
 * @see https://github.github.com/gfm/#raw-html
 */
export type Html = Literal<HtmlType>
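
Once a document has been parsed, the Html nodes can be picked out of the tree by their type field. The sketch below is self-contained and does not import @yozora/ast: the Node and HtmlNode interfaces are simplified stand-ins, and it assumes that a Literal node carries its raw source in a string value field, as in mdast. The helper names isHtml and collectHtml are hypothetical.

```typescript
// Simplified stand-ins for the @yozora/ast node types (assumption:
// parents expose `children`, literals expose a string `value`).
interface Node {
  type: string
  children?: Node[]
}

interface HtmlNode extends Node {
  type: 'html'
  value: string
}

// Type guard: narrows a generic node to an Html node.
function isHtml(node: Node): node is HtmlNode {
  return (
    node.type === 'html' &&
    typeof (node as { value?: unknown }).value === 'string'
  )
}

// Depth-first walk that collects the raw HTML of every Html node.
function collectHtml(node: Node, out: string[] = []): string[] {
  if (isHtml(node)) out.push(node.value)
  for (const child of node.children ?? []) collectHtml(child, out)
  return out
}

// Hand-built tree, used here instead of real parser output.
const sample: Node = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'html', value: '<a>' } as HtmlNode,
        { type: 'text' },
        { type: 'html', value: '<bab>' } as HtmlNode,
      ],
    },
  ],
}

const found = collectHtml(sample) // ['<a>', '<bab>']
```

In real use, the value returned by parser.parse would presumably take the place of sample, provided its shape matches the assumptions above.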

Live Examples

Opening (spec example #632):

<a><bab><c2c>

Closing (spec example #642):

</a></foo >

Comments (spec example #644):

foo <!-- this is a
comment - with hyphen -->

Processing instruction (spec example #647):

foo <?php echo $a; ?>

Declaration (spec example #648):

foo <!ELEMENT br EMPTY>

CDATA section (spec example #649):

foo <![CDATA[>&<]]>
