Version: 3.x.x 🚧

Yozora


🎉 Why the name "yozora"?

Yozora is the romanization of the Japanese 「よぞら」, taken from the lyrics of 『花鳥風月』 by the band 世界の終わり.

This project is a monorepo that aims to implement a highly extensible, pluggable Markdown parser. Based on the idea of middleware, the core algorithm @yozora/core-parser schedules tokenizers (such as @yozora/tokenizer-autolink) to complete the parsing tasks. More precisely, Yozora is an algorithm that parses Markdown, or content in an extended Markdown syntax, into an abstract syntax tree (AST).
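The scheduling idea described above can be sketched with a toy model. Every name in this block (`ToyScheduler`, `ToyTokenizer`, `use`, `unmount`, `toyAutolink`) is hypothetical and chosen only for illustration; it is not the API of @yozora/core-parser, and the real algorithm is considerably more involved.

```typescript
// Hypothetical, simplified types: NOT the interfaces of @yozora/core-parser.
interface ToyToken {
  type: string
  start: number // inclusive index into the source
  end: number // exclusive index into the source
}

interface ToyTokenizer {
  name: string
  // Try to match a token that begins exactly at `start`; return null on no match.
  match(source: string, start: number): ToyToken | null
}

class ToyScheduler {
  private readonly tokenizers: ToyTokenizer[] = []

  // Mount a tokenizer (hypothetical method name).
  use(tokenizer: ToyTokenizer): this {
    this.tokenizers.push(tokenizer)
    return this
  }

  // Unmount a tokenizer by name (hypothetical method name).
  unmount(name: string): this {
    const index = this.tokenizers.findIndex(t => t.name === name)
    if (index >= 0) this.tokenizers.splice(index, 1)
    return this
  }

  // At each position, ask the mounted tokenizers in order; fall back to plain text.
  parse(source: string): ToyToken[] {
    const tokens: ToyToken[] = []
    let i = 0
    while (i < source.length) {
      let matched: ToyToken | null = null
      for (const tokenizer of this.tokenizers) {
        const candidate = tokenizer.match(source, i)
        // Only accept matches that make progress, so the loop always terminates.
        if (candidate !== null && candidate.end > i) {
          matched = candidate
          break
        }
      }
      if (matched !== null) {
        tokens.push(matched)
        i = matched.end
        continue
      }
      // No tokenizer matched: extend the previous text token or start a new one.
      const last = tokens[tokens.length - 1]
      if (last !== undefined && last.type === 'text' && last.end === i) {
        last.end = i + 1
      } else {
        tokens.push({ type: 'text', start: i, end: i + 1 })
      }
      i += 1
    }
    return tokens
  }
}

// A toy tokenizer that treats `<...>` as an autolink-like token.
const toyAutolink: ToyTokenizer = {
  name: 'toy-autolink',
  match(source, start) {
    if (source[start] !== '<') return null
    const close = source.indexOf('>', start + 1)
    if (close < 0) return null
    return { type: 'autolink', start, end: close + 1 }
  },
}

const scheduler = new ToyScheduler().use(toyAutolink)
const withAutolink = scheduler.parse('a <b> c')
const withoutAutolink = scheduler.unmount('toy-autolink').parse('a <b> c')
```

In this sketch, mounting `toyAutolink` splits the input into text, autolink, and text tokens, while unmounting it leaves a single text token. Tokens carry only index ranges into the source rather than copied substrings.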

✨ Features

  • 🔖 Fully supports all the rules in the GFM specification and passes almost all test cases built from the examples in the specification. The one exception is https://github.github.com/gfm/#example-653: there is no plan to support native HTML tags in the React renderer for the Yozora AST, so tag filtering is not implemented. If you need it, you can do the filtering yourself.

    See @yozora/parser-gfm or @yozora/parser-gfm-ex for further information.

  • 🚀 Robust.

    • All code is written in TypeScript, with strict static type checking.

    • ESLint and Prettier enforce the coding style and help avoid error-prone patterns such as hacky syntax and shadowed variables.

    • Tested with Jest against a large number of test cases.

  • 💚 Tidy: No third-party dependencies.

  • ⚡️ Efficient.

    • The parsing complexity is the length of the source content multiplied by the number of tokenizers, which reaches the theoretical lower bound.

    • The parser API supports streaming input (generators / iterators as input) and can parse while reading. Only block-level data is supported so far.

    • Array creation and concatenation are handled carefully. To reuse arrays as much as possible during the entire matching phase, matching ranges are delineated by array indices only. Many other strategies are applied to reduce duplicated matching and parsing operations.

  • 🩹 Compatibility: the parsed syntax tree is compatible with the one defined in Mdast.

    Even if some data types become incompatible in the future, the AST is easy to traverse for adaptation and modification through the API provided in @yozora/ast-util.

  • 🎨 Extensibility: Yozora comes with a plug-in system, which allows it to schedule the tokenizers through an internal algorithm to complete the parsing tasks.

    • It's easy to create and integrate custom tokenizers.

    • All tokenizers can be mounted or unmounted freely.

      Some tokenizers for data types not mentioned in GFM are implemented in this repository, such as @yozora/tokenizer-admonition and @yozora/tokenizer-footnote. All of them are built into @yozora/parser by default; you can unmount any of them if you don't need it.
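The index-based matching mentioned under "Efficient" can be illustrated with a small sketch. It assumes a shared array of characters and a hypothetical helper `findBacktickRuns` (not part of any @yozora package) that scans a window `[start, end)` of that array without slicing or copying it.

```typescript
// Hypothetical helper for illustration only; not taken from the Yozora source.
interface IndexRange {
  start: number // inclusive
  end: number // exclusive
}

// Find every run of consecutive backticks inside chars[start, end).
// The shared `chars` array is never sliced; only indices are passed around.
function findBacktickRuns(
  chars: readonly string[],
  start: number,
  end: number,
): IndexRange[] {
  const runs: IndexRange[] = []
  let i = start
  while (i < end) {
    if (chars[i] !== '`') {
      i += 1
      continue
    }
    const runStart = i
    while (i < end && chars[i] === '`') i += 1
    runs.push({ start: runStart, end: i })
  }
  return runs
}

// One array is created up front and reused for every scan.
const chars = Array.from('a``b`c')
const allRuns = findBacktickRuns(chars, 0, chars.length)
const windowedRuns = findBacktickRuns(chars, 2, chars.length)
```

Scanning a narrower window is just a matter of passing different indices, so nested matching steps can share the same array instead of allocating a new one per step.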

📝 Usage

@yozora/parser: (Recommended) A Markdown parser with rich built-in tokenizers.

import YozoraParser from '@yozora/parser'

const parser = new YozoraParser()
parser.parse('source content')
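Since the resulting tree is described as Mdast-compatible, a generic walk over `type` / `children` / `value` fields is one way to inspect it. The sketch below is self-contained and does not call Yozora: `AstNode`, `countNodeTypes`, `collectText`, and the hand-written `sampleTree` are assumptions made for illustration, not actual parser output or the API of @yozora/ast-util.

```typescript
// Minimal Mdast-like node shape (an assumption; real nodes carry more fields).
interface AstNode {
  type: string
  value?: string
  children?: AstNode[]
}

// Count how many nodes of each type appear in the tree (iterative traversal).
function countNodeTypes(root: AstNode): Map<string, number> {
  const counts = new Map<string, number>()
  const stack: AstNode[] = [root]
  while (stack.length > 0) {
    const node = stack.pop() as AstNode
    counts.set(node.type, (counts.get(node.type) ?? 0) + 1)
    if (node.children !== undefined) stack.push(...node.children)
  }
  return counts
}

// Concatenate the values of all `text` nodes in document order.
function collectText(node: AstNode): string {
  const own = node.type === 'text' ? node.value ?? '' : ''
  const nested = (node.children ?? []).map(collectText).join('')
  return own + nested
}

// Hand-written sample shaped like the tree for "Hello *world*".
const sampleTree: AstNode = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Hello ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'world' }] },
      ],
    },
  ],
}

const typeCounts = countNodeTypes(sampleTree)
const plainText = collectText(sampleTree)
```

For real projects, the traversal helpers in @yozora/ast-util are the place to look first; the functions above only show the shape of the idea.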

💡 FAQ

💬 Contact

📄 License

Yozora is MIT licensed.