Version: 1.x.x

@yozora/tokenizer-html-block


GitHub Flavored Markdown spec

An HTML block is a group of lines that is treated as raw HTML (and will not be escaped in HTML output).

There are seven kinds of HTML block, which can be defined by their start and end conditions. The block begins with a line that meets a start condition (after up to three spaces of optional indentation). It ends with the first subsequent line that meets a matching end condition, or the last line of the document, or the last line of the container block containing the current HTML block, if no line is encountered that meets the end condition. If the first line meets both the start condition and the end condition, the block will contain just that line.

  1. Start condition: line begins with the string <script, <pre, or <style (case-insensitive), followed by whitespace, the string >, or the end of the line.

     End condition: line contains an end tag </script>, </pre>, or </style> (case-insensitive; it need not match the start tag).

  2. Start condition: line begins with the string <!--.

     End condition: line contains the string -->.

  3. Start condition: line begins with the string <?.

     End condition: line contains the string ?>.

  4. Start condition: line begins with the string <! followed by an uppercase ASCII letter.

     End condition: line contains the character >.

  5. Start condition: line begins with the string <![CDATA[.

     End condition: line contains the string ]]>.

  6. Start condition: line begins with the string < or </ followed by one of the strings (case-insensitive) address, article, aside, base, basefont, blockquote, body, caption, center, col, colgroup, dd, details, dialog, dir, div, dl, dt, fieldset, figcaption, figure, footer, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, header, hr, html, iframe, legend, li, link, main, menu, menuitem, nav, noframes, ol, optgroup, option, p, param, section, source, summary, table, tbody, td, tfoot, th, thead, title, tr, track, ul, followed by whitespace, the end of the line, the string >, or the string />.

     End condition: line is followed by a blank line.

  7. Start condition: line begins with a complete open tag (with any tag name other than script, style, or pre) or a complete closing tag, followed only by whitespace or the end of the line.

     End condition: line is followed by a blank line.
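As a rough illustration of the start conditions listed above, the following sketch maps a single line to the number of the start condition it meets. It is not the implementation used by @yozora/tokenizer-html-block: the names `matchStartCondition`, `START_CONDITIONS`, and `BLOCK_TAG_NAMES` are hypothetical, and the seventh condition is deliberately left out because it needs a full open/closing tag grammar.

```typescript
// Illustrative sketch only; all names here are hypothetical and not part of yozora.
const BLOCK_TAG_NAMES = [
  'address', 'article', 'aside', 'base', 'basefont', 'blockquote', 'body',
  'caption', 'center', 'col', 'colgroup', 'dd', 'details', 'dialog', 'dir',
  'div', 'dl', 'dt', 'fieldset', 'figcaption', 'figure', 'footer', 'form',
  'frame', 'frameset', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'head', 'header',
  'hr', 'html', 'iframe', 'legend', 'li', 'link', 'main', 'menu', 'menuitem',
  'nav', 'noframes', 'ol', 'optgroup', 'option', 'p', 'param', 'section',
  'source', 'summary', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead',
  'title', 'tr', 'track', 'ul',
].join('|')

// One pattern per start condition, checked in order (conditions 1 to 6 only).
const START_CONDITIONS: ReadonlyArray<[number, RegExp]> = [
  [1, /^<(?:script|pre|style)(?:\s|>|$)/i],
  [2, /^<!--/],
  [3, /^<\?/],
  [4, /^<![A-Z]/],
  [5, /^<!\[CDATA\[/],
  [6, new RegExp(`^</?(?:${BLOCK_TAG_NAMES})(?:\\s|/?>|$)`, 'i')],
]

function matchStartCondition(line: string): number | null {
  // Up to three spaces of indentation may precede the start condition.
  const indent = /^ {0,3}/.exec(line)![0].length
  const text = line.slice(indent)
  for (const [kind, pattern] of START_CONDITIONS) {
    if (pattern.test(text)) return kind
  }
  return null
}
```

Under these assumptions, `matchStartCondition('<pre language="haskell"><code>')` yields 1, while a line indented by four spaces yields `null` because the indentation limit is exceeded.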

HTML blocks continue until they are closed by their appropriate end condition, or the last line of the document or other container block. This means any HTML within an HTML block that might otherwise be recognised as a start condition will be ignored by the parser and passed through as-is, without changing the parser’s state.
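To make the continuation rule concrete, here is a minimal sketch that finds the last line of an HTML block once its start condition is known. The function `findHtmlBlockEnd` and the map `END_PATTERNS` are hypothetical names introduced for illustration, and `lines` is assumed to hold only the lines of the enclosing container, so running off the end of the array stands in for "last line of the document or container block". This is not how @yozora/tokenizer-html-block is implemented.

```typescript
// Illustrative sketch only; names are hypothetical and not part of yozora.
// End markers for start conditions 1 to 5. Conditions 6 and 7 have no marker:
// their blocks end at a line that is followed by a blank line.
const END_PATTERNS = new Map<number, RegExp>([
  [1, /<\/(?:script|pre|style)>/i],
  [2, /-->/],
  [3, /\?>/],
  [4, />/],
  [5, /\]\]>/],
])

/**
 * Returns the index of the last line of the HTML block that starts at
 * `lines[start]`, where `kind` is the number (1 to 7) of the start
 * condition that opened the block.
 */
function findHtmlBlockEnd(lines: string[], start: number, kind: number): number {
  const endPattern = END_PATTERNS.get(kind)
  for (let i = start; i < lines.length; i++) {
    if (endPattern !== undefined) {
      // Conditions 1 to 5: the first line containing the end marker closes the block.
      if (endPattern.test(lines[i])) return i
    } else if (i + 1 < lines.length && lines[i + 1].trim() === '') {
      // Conditions 6 and 7: the block ends at a line followed by a blank line.
      return i
    }
  }
  // No end condition was met: the block runs to the last available line.
  return lines.length - 1
}
```

Note that the loop never re-checks start conditions: a line such as `<!-- ...` inside a block opened by `<div>` is simply passed through, which is the behaviour the paragraph above describes.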

Install

npm install --save @yozora/tokenizer-html-block

Usage

tip

@yozora/tokenizer-html-block has been integrated into @yozora/parser / @yozora/parser-gfm-ex / @yozora/parser-gfm, so you can use YozoraParser / GfmExParser / GfmParser directly.

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()

// parse source markdown content
parser.parse(`
<pre language="haskell"><code>
import Text.HTML.TagSoup

main :: IO ()
main = print $ parseTags tags
</code></pre>
okay
`)

Options

| Name | Type | Required | Default |
| --- | --- | --- | --- |
| name | string | false | "@yozora/tokenizer-html-block" |
| priority | number | false | TokenizerPriority.ATOMIC |

  • name: The unique name of the tokenizer. It is bound to the tokens the tokenizer generates, so that the right tokenizer can be called at each stage of a token's life cycle during the matching and parsing phases.

  • priority: Priority of the tokenizer, which determines the order of processing: higher-priority tokenizers are executed first. In addition, in the match-block stage, a high-priority tokenizer can interrupt the matching process of a low-priority tokenizer.
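The ordering effect of `priority` can be pictured with a small sketch. Everything here is an assumption made for illustration: `TokenizerLike` and `orderByPriority` are hypothetical names, the numeric priorities in the usage note are made up, and the actual scheduling inside yozora may differ.

```typescript
// Illustrative sketch only; names and numeric values are hypothetical.
interface TokenizerLike {
  name: string
  priority: number
}

// Returns a new array with higher-priority tokenizers first.
// The input is copied so the caller's array is left untouched.
function orderByPriority<T extends TokenizerLike>(tokenizers: readonly T[]): T[] {
  return [...tokenizers].sort((a, b) => b.priority - a.priority)
}
```

For example, given a tokenizer named `@yozora/tokenizer-html-block` with a made-up priority of 10 and a hypothetical `paragraph-like` tokenizer with priority 1, `orderByPriority` places the HTML block tokenizer first.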

Types

@yozora/tokenizer-html-block produces Html type nodes. See @yozora/ast for full base types.

import type { YastLiteral } from '@yozora/ast'

export const HtmlType = 'html'
export type HtmlType = typeof HtmlType

/**
 * HTML (Literal) represents a fragment of raw HTML.
 * @see https://github.com/syntax-tree/mdast#html
 * @see https://github.github.com/gfm/#html-blocks
 * @see https://github.github.com/gfm/#raw-html
 */
export type Html = YastLiteral<HtmlType>
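As a sketch of how such nodes might be consumed, the snippet below walks a tree and collects every node whose type is 'html'. To stay runnable without the package it uses local stand-in types (`NodeLike`, `HtmlNode`) instead of the real @yozora/ast types, and it assumes, following the mdast Html definition, that the raw HTML is carried in a string `value` field. The helper names `isHtml` and `collectHtml` are hypothetical.

```typescript
// Local stand-ins for the @yozora/ast types (assumption: literal nodes
// carry their raw content in a string `value` field, as in mdast).
interface NodeLike {
  type: string
  children?: NodeLike[]
}

interface HtmlNode extends NodeLike {
  type: 'html'
  value: string
}

// Type guard: narrows a generic node to an html literal node.
function isHtml(node: NodeLike): node is HtmlNode {
  return node.type === 'html' && typeof (node as { value?: unknown }).value === 'string'
}

// Depth-first traversal that gathers all html nodes in document order.
function collectHtml(node: NodeLike, out: HtmlNode[] = []): HtmlNode[] {
  if (isHtml(node)) out.push(node)
  for (const child of node.children ?? []) collectHtml(child, out)
  return out
}
```

A tree returned by `parser.parse(...)` could be passed to `collectHtml` in the same way, provided its nodes follow the shape assumed here.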

Live Examples

  • (Condition 1)

  • Comment (Condition 2)

    #148

  • Processing instruction (Condition 3)

    #149

  • Declaration (Condition 4)

    #150

  • CDATA (Condition 5)

    #151

  • (Condition 6)

    #119

  • (Condition 7)

    #133