Validate Your Markdown Files
In Markdown, you can write any document with valid syntax. For example, Markdown lets you write HTML tags directly: you can write the HTML tag <h1>title</h1> instead of the Markdown syntax # title.
For some purposes, however, such behaviors are unwanted. For example, you may not want to allow the <script> tag in Markdown, because it can insert arbitrary JavaScript.
In this document, you'll learn how to define Markdown validation rules, which help you validate Markdown documents efficiently.
Markdown validation is part of DFM (DocFX Flavored Markdown). If you switch to another Markdown engine, validation might not work.
DocFX provides three kinds of validation rules:
- HTML tag rule, which validates HTML tags in Markdown. Restricting Markdown to only "safe" HTML tags is a common need, so DocFX provides this as a built-in rule.
- Markdown token rule. This can be used to validate different kinds of Markdown syntax elements, like headings, links, images, etc.
- Metadata rule. This can be used to validate the metadata of documents. Metadata can be defined in a YAML header, docfx.json, or a single JSON file. The metadata rule gives you a central place to validate metadata against certain principles.
HTML tag validation rules
In most cases, you may want to prohibit certain HTML tags in Markdown, so DocFX includes a built-in HTML tag rule.
To define an HTML tag rule, simply create an md.style file with the following content:
{
  "tagRules": [
    {
      "tagNames": [ "H1", "H2" ],
      "relation": "In",
      "behavior": "Warning",
      "messageFormatter": "Please do not use <H1> and <H2>, use '#' and '##' instead.",
      "customValidatorContractName": null,
      "openingTagOnly": false
    }
  ]
}
Then, whenever anyone writes <H1> or <H2> in a Markdown file, the build gives a warning.
You can use the following properties to configure the HTML tag rule:
- tagNames is the list of HTML tag names to validate. Required, case-insensitive.
- relation is optional and determines how tagNames is matched:
  - In means the rule applies when the HTML tag is in tagNames. This is the default value.
  - NotIn means the rule applies when the HTML tag is not in tagNames.
- behavior defines what happens when the HTML tag is met. Required. Its value can be one of the following:
  - None: Do nothing.
  - Warning: Log a warning.
  - Error: Log an error; this breaks the current build.
- messageFormatter is the log message when the HTML tag is hit. Required. It can contain the following variables:
  - {0}: the name of the tag.
  - {1}: the whole tag.
  For example, if messageFormatter is "{0} is the tag name of {1}." and the tag <H1 class="heading"> matches the rule, the following message is output: H1 is the tag name of <H1 class="heading">.
- customValidatorContractName is the contract name of an extension tag rule, used for complex validation. Optional.
- openingTagOnly is a boolean. Optional, default is false. If true, the rule only applies to opening tags, e.g. <H1>; otherwise, it also applies to closing tags, e.g. </H1>.
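To make the placeholder substitution concrete, here is a small C# sketch that reproduces the messageFormatter example above. It assumes messageFormatter follows .NET composite formatting (string.Format), which the {0}/{1} placeholders suggest; MessageFormatterDemo and Expand are hypothetical names, and the code does not call DocFX.

```csharp
using System;

// Illustration only: shows how the {0}/{1} placeholders expand, assuming
// messageFormatter follows string.Format semantics. This does not call DocFX.
public static class MessageFormatterDemo
{
    // {0} = tag name, {1} = whole tag, matching the variables listed above.
    public static string Expand(string messageFormatter, string tagName, string wholeTag)
    {
        return string.Format(messageFormatter, tagName, wholeTag);
    }

    public static void Main()
    {
        string message = Expand("{0} is the tag name of {1}.", "H1", "<H1 class=\"heading\">");
        Console.WriteLine(message); // H1 is the tag name of <H1 class="heading">.
    }
}
```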
Test your rule
To enable your rule, put md.style in the same folder as docfx.json, then run docfx. A warning is shown whenever the build encounters <H1> or <H2>.
Create a custom HTML tag rule
By default, an HTML tag rule only validates whether an HTML tag exists in Markdown. Sometimes you may want additional validation against the content of the tag.
For example, you may not want a tag to contain the onclick attribute, as it can inject JavaScript into the page.
You can create a custom HTML tag rule to achieve this.
- Create a project in your code editor (e.g. Visual Studio).
- Add the NuGet packages Microsoft.DocAsCode.Plugins and Microsoft.Composition.
- Create a class that implements ICustomMarkdownTagValidator.
- Add an ExportAttribute with a contract name.
For example, the following validator requires that an HTML link (<a>) must not contain the onclick attribute:
using System.Composition;          // ExportAttribute (Microsoft.Composition package)
using Microsoft.DocAsCode.Plugins; // ICustomMarkdownTagValidator

[Export("should_not_contain_onclick", typeof(ICustomMarkdownTagValidator))]
public class MyMarkdownTagValidator : ICustomMarkdownTagValidator
{
    public bool Validate(string tag)
    {
        // use Contains for demo purpose, a complete implementation should parse the HTML tag.
        return tag.Contains("onclick");
    }
}
Then update your md.style with the following content:
{
  "tagRules": [
    {
      "tagNames": [ "a" ],
      "behavior": "Warning",
      "messageFormatter": "Please do not use 'onclick' in HTML link.",
      "customValidatorContractName": "should_not_contain_onclick",
      "openingTagOnly": true
    }
  ]
}
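The Validate method above uses tag.Contains("onclick"), which also fires on tags such as <a title="onclick">. As a rough sketch of a stricter check, assuming a regular expression is acceptable in place of a real HTML parser, the hypothetical helper below only matches onclick in attribute-name position. OnClickDetector and HasOnClickAttribute are illustrative names, not DocFX APIs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of DocFX: a stricter check than
// tag.Contains("onclick"). It matches "onclick" only when it is preceded by
// whitespace and followed by whitespace, '=', '/' or '>', i.e. where an
// attribute name would appear. Still a heuristic, not a full HTML parser.
public static class OnClickDetector
{
    private static readonly Regex OnClickAttribute = new Regex(
        @"(?<=\s)onclick(?=[\s=/>])",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static bool HasOnClickAttribute(string tag) => OnClickAttribute.IsMatch(tag);

    public static void Main()
    {
        Console.WriteLine(HasOnClickAttribute("<a href=\"#\" onclick=\"go()\">")); // True
        Console.WriteLine(HasOnClickAttribute("<a title=\"onclick\" href=\"#\">")); // False
        Console.WriteLine(HasOnClickAttribute("<a ONCLICK = \"go()\">"));           // True
    }
}
```

You could return OnClickDetector.HasOnClickAttribute(tag) from Validate in place of tag.Contains("onclick"); it keeps the same true/false meaning as the original example. It still reports text such as title="x onclick=y", so treat it as a heuristic.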
How to enable custom HTML tag rules
- Configure the rule in md.style, the same way as for the default HTML tag rule.
- Create a folder (rules, for example) in your DocFX project folder, and put all your custom rule assemblies into a plugins folder under the rules folder. Your DocFX project should now look like this:
  /
  |- docfx.json
  |- md.style
  \- rules
     \- plugins
        \- <your_rule>.dll
- Update your docfx.json with the following content:
  {
    ...
    "dest": "_site",
    "template": [ "default", "rules" ]
  }
- Run docfx and you'll see your rule being executed.
The rules folder is actually a template folder. In DocFX, a template is where you customize build, rendering, and validation behavior. For more information about templates, refer to the template and plugin documentation.
Markdown token validation rules
Besides HTML tags, you may also want to validate Markdown syntax such as headings or links. For example, you may want to limit code snippets to a set of supported languages.
To create such a rule, follow these steps:
- Create a project in your code editor (e.g. Visual Studio).
- Add the NuGet packages Microsoft.DocAsCode.MarkdownLite and Microsoft.Composition.
- Create a class that implements IMarkdownTokenValidatorProvider. MarkdownTokenValidatorFactory contains some helper methods for creating a validator.
- Add an ExportAttribute with the rule name.
For example, the following rule requires all code blocks to be csharp:
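using System.Collections.Immutable;    // ImmutableArray
using System.Composition;              // ExportAttribute
using Microsoft.DocAsCode.MarkdownLite; // token validator types, MarkdownCodeBlockToken
using Microsoft.DocAsCode.Plugins;     // DocumentException

[Export("code_snippet_should_be_csharp", typeof(IMarkdownTokenValidatorProvider))]
public class MyMarkdownTokenValidatorProvider : IMarkdownTokenValidatorProvider
{
    public ImmutableArray<IMarkdownTokenValidator> GetValidators()
    {
        return ImmutableArray.Create(
            MarkdownTokenValidatorFactory.FromLambda<MarkdownCodeBlockToken>(t =>
            {
                // Fail the build for any code block whose language is not csharp.
                if (t.Lang != "csharp")
                {
                    throw new DocumentException($"Code lang {t.Lang} is not valid, in file: {t.SourceInfo.File}, at line: {t.SourceInfo.LineNumber}");
                }
            }));
    }
}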
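The rule above accepts only csharp, while the goal stated at the start of this section is to allow a set of languages. A minimal sketch of that check, using a hypothetical CodeLanguagePolicy helper (not a DocFX API) with an example allow-list; csharp, json and xml are placeholders, not DocFX defaults:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a DocFX API: checks a code block language against
// an allow-list instead of the single "csharp" value used above.
public static class CodeLanguagePolicy
{
    // Example allow-list; case-insensitive by design choice.
    private static readonly HashSet<string> AllowedLanguages =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "csharp", "json", "xml" };

    // Returns true when the language is in the allow-list.
    // A null or empty language (a code block with no language) is not allowed.
    public static bool IsAllowed(string lang) =>
        !string.IsNullOrEmpty(lang) && AllowedLanguages.Contains(lang);

    public static void Main()
    {
        Console.WriteLine(IsAllowed("csharp")); // True
        Console.WriteLine(IsAllowed("JSON"));   // True
        Console.WriteLine(IsAllowed("python")); // False
        Console.WriteLine(IsAllowed(null));     // False
    }
}
```

Inside the FromLambda callback, you could replace t.Lang != "csharp" with !CodeLanguagePolicy.IsAllowed(t.Lang). Unlike the original comparison, this sketch ignores case, so "CSharp" also passes; use StringComparer.Ordinal instead if you want an exact match.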
To enable this rule, update your md.style as follows:
{
"rules": [ "code_snippet_should_be_csharp" ]
}
Then follow the same steps as in How to enable custom HTML tag rules, run docfx, and you'll see your rule executed.
Logging in your rules
As the example above shows, you can throw DocumentException to raise an error; this stops the build immediately.
You can also use LogWarning(String, String, String, String) and LogError(String, String, String, String) to report a warning and an error respectively.
To use these methods, you need to install the NuGet package Microsoft.DocAsCode.Common first.
The difference between LogError and throwing DocumentException is that throwing an exception stops the build immediately, whereas LogError doesn't stop the build but eventually fails it after all rules have run.
Advanced usage of md.style
Default rules
If a rule has the contract name default, it is enabled by default; you don't need to enable it in md.style.
Enable/disable rules in md.style
You can use the disable property to specify whether to disable a rule:
{
"rules": [ { "contractName": "<contract_name>", "disable": true } ]
}
This lets you disable rules that are enabled by default.
Validate metadata in markdown files
In a Markdown file, you can write metadata in a conceptual or overwrite document, and DocFX lets you add plug-ins to validate the metadata written in Markdown files.
Scope of metadata validation
Metadata comes from multiple sources. The following metadata is validated during build:
- YAML header in markdown.
- Global metadata and file metadata in docfx.json.
- Global metadata and file metadata defined in separate .json files.
For more information about global metadata and file metadata, see docfx.json format.
Create validation plug-ins
- Create a project in your code editor (e.g. Visual Studio).
- Add the NuGet packages Microsoft.DocAsCode.Plugins and Microsoft.Composition.
- Create a class that implements IInputMetadataValidator.
For example, the following validator prohibits any metadata named hello:
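using System.Collections.Immutable; // ImmutableDictionary
using System.Composition;           // ExportAttribute
using Microsoft.DocAsCode.Plugins;  // IInputMetadataValidator, DocumentException

[Export(typeof(IInputMetadataValidator))]
public class MyInputMetadataValidator : IInputMetadataValidator
{
    public void Validate(string sourceFile, ImmutableDictionary<string, object> metadata)
    {
        // Reject any document that defines a "hello" metadata key.
        if (metadata.ContainsKey("hello"))
        {
            throw new DocumentException($"Metadata 'hello' is not allowed, file: {sourceFile}");
        }
    }
}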
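If you want to prohibit several names and report all of them at once, the check can be factored into a helper like the sketch below. ForbiddenMetadata, FindForbiddenKeys and the internal_only key are hypothetical; the helper takes IReadOnlyDictionary<string, object>, which ImmutableDictionary implements, so it could be called from Validate with the metadata argument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a DocFX API: generalizes the "hello" check above
// to a list of forbidden metadata names and returns every one that is present.
public static class ForbiddenMetadata
{
    // Example names only; "internal_only" is a made-up key for illustration.
    private static readonly string[] ForbiddenKeys = { "hello", "internal_only" };

    // Returns the forbidden keys present in the metadata, in the order listed above.
    public static IReadOnlyList<string> FindForbiddenKeys(IReadOnlyDictionary<string, object> metadata) =>
        ForbiddenKeys.Where(metadata.ContainsKey).ToList();

    public static void Main()
    {
        var metadata = new Dictionary<string, object>
        {
            ["title"] = "Getting started",
            ["hello"] = "world",
        };
        Console.WriteLine(string.Join(", ", FindForbiddenKeys(metadata))); // hello
    }
}
```

In Validate, you could throw a DocumentException when the returned list is non-empty, naming every offending key in one message instead of stopping at the first.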
Enabling a metadata rule works the same way as for other rules: copy the assemblies to the plugins folder of your template folder and run docfx.
Create configurable metadata validation plug-ins
There are two steps to create a configurable metadata validator:
- Add a contract name to the export attribute of the metadata validator plug-in:
  [Export("hello_is_not_valid", typeof(IInputMetadataValidator))]
  Note
  If the rule doesn't have a contract name, it is always enabled, i.e., there is no way to disable it except by deleting the assembly file.
- Modify md.style with the following content:
  {
    "metadataRules": [
      { "contractName": "hello_is_not_valid", "disable": false }
    ]
  }
Advanced: Share your rules
Some users have many documentation projects and want to share validation rules across all of them without writing an md.style file repeatedly.
Create template
For this purpose, you can create a template with the following structure:
/ (root folder for plug-in)
|- md.styles
|  |- <category-1>.md.style
|  \- <category-2>.md.style
\- plugins
   \- <your_rule>.dll
The md.styles folder contains a set of definition files with the file extension .md.style; each file is a category.
Each category contains a set of rule definitions.
For example, create a file named test.md.style with the following content:
{
  "tagRules": {
    "heading": {
      "tagNames": [ "H1", "H2" ],
      "behavior": "Warning",
      "messageFormatter": "Please do not use <H1> and <H2>, use '#' and '##' instead.",
      "openingTagOnly": true
    }
  },
  "rules": {
    "code": "code_snippet_should_be_csharp"
  },
  "metadataRules": {
    "hello": { "contractName": "hello_is_not_valid", "disable": true }
  }
}
Here, test is the category name (taken from the file name) for three rules, and each rule has its own id: heading, code, and hello.
When you build documents with this template, a rule is active when its disable property is false, so in this example hello is disabled by default.
Config rules
Some rules need to be enabled or disabled in specific document projects.
For example, the hello rule is not required for most projects, but for one special project we want to enable it.
To do so, modify the md.style file in that document project with the following content:
{
"settings": [
{ "category": "test", "id": "hello", "disable": false }
]
}
For a project that needs to disable all rules in the test category, use:
{
"settings": [
{ "category": "test", "disable": true }
]
}
Note
The disable property is applied in the following order:
1. tagRules, rules and metadataRules in md.style.
2. Auto-enabled rules with the contract name default.
3. settings with category and id in md.style.
4. settings with category in md.style.
5. The disable property in the definition file.
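One way to read this order is that the first source in the list that sets disable decides the result. This is the reading under which the Config rules example works, because the settings entry for test/hello (item 3) overrides "disable": true in test.md.style (item 5). The sketch below models only that assumption; it is not DocFX's implementation, and DisableResolution and ResolveDisable are hypothetical names.

```csharp
using System;

// Hypothetical illustration, not DocFX code. Assumption: the first source in
// priority order that sets `disable` wins; null means the source is silent.
public static class DisableResolution
{
    public static bool ResolveDisable(params bool?[] sourcesInPriorityOrder)
    {
        foreach (var value in sourcesInPriorityOrder)
        {
            if (value.HasValue)
            {
                return value.Value; // first source that sets `disable` decides
            }
        }
        return false; // assumption: a rule no source disables stays enabled
    }

    public static void Main()
    {
        // The "hello" rule from test.md.style, enabled by a project-level setting.
        bool helloDisabled = ResolveDisable(
            null,   // 1. tagRules/rules/metadataRules in md.style
            null,   // 2. auto-enabled rule with contract name "default"
            false,  // 3. settings with category "test" and id "hello"
            null,   // 4. settings with category only
            true);  // 5. "disable": true in the definition file
        Console.WriteLine($"hello disabled: {helloDisabled}"); // hello disabled: False
    }
}
```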