Site Content Validation
Status: operational
Purpose: lightweight pre-build content validation for dkharlanau.github.io
Created for: issue #5
Last updated: 2026-05-26
What this validator checks
scripts/validate_site_content.py is a local Python script that validates site content structure without requiring Ruby or Jekyll. It checks:
1. _data/atlas_index.yml
- Valid YAML: parses without errors.
- Required sections: meta, fallback, and topic_clusters must exist.
- Cluster completeness: every cluster must have id, title, path, topic_cluster, keywords, allowed_update_types, source_policy, last_reviewed, related_pages, and agent_notes.
- Path resolution: non-null paths must resolve to files on disk or valid Jekyll collection paths (_notes/, _blog/).
- Future placeholders: clusters with path: null are reported as pending, not errors.
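
The section and cluster checks above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name check_atlas_index and the message wording are hypothetical, and it assumes the YAML has already been parsed and that topic_clusters is a list of mappings.

```python
REQUIRED_SECTIONS = ("meta", "fallback", "topic_clusters")
REQUIRED_CLUSTER_KEYS = (
    "id", "title", "path", "topic_cluster", "keywords",
    "allowed_update_types", "source_policy", "last_reviewed",
    "related_pages", "agent_notes",
)


def check_atlas_index(data):
    """Return (errors, warnings) for an already-parsed atlas index mapping."""
    errors, warnings = [], []
    if not isinstance(data, dict):
        return ["top level is not a mapping"], warnings
    for section in REQUIRED_SECTIONS:
        if section not in data:
            errors.append(f"missing required section: {section}")
    # Assumption: topic_clusters is a list of mappings.
    for i, cluster in enumerate(data.get("topic_clusters") or []):
        if not isinstance(cluster, dict):
            errors.append(f"cluster #{i}: not a mapping")
            continue
        label = cluster.get("id", f"#{i}")
        for key in REQUIRED_CLUSTER_KEYS:
            if key not in cluster:
                errors.append(f"cluster {label}: missing key '{key}'")
        # A null path marks planned content: pending, not an error.
        if "path" in cluster and cluster["path"] is None:
            warnings.append(f"cluster {label}: path is null (future placeholder)")
    return errors, warnings
```

Keeping errors and warnings in separate lists lets a null path surface in the report without failing a default run.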
2. docs/templates/*.md
- Required body metadata: every template must contain a metadata block with the labels Source:, Date checked:, Confidence:, Related page/topic:, and Practical implication:.
- Front matter safety: draft/news templates must keep the robots: noindex and sitemap: false defaults. Atlas templates (e.g., atlas-fact-update.md, practical-process-note.md) may use index because they target public Atlas pages.
- Excluded path: templates live under docs/templates/, which _config.yml excludes from the build.
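
A minimal sketch of the two template checks, under stated assumptions: both function names are hypothetical, the label check is a plain substring match (the real script may match more strictly), and the front matter is assumed to be already parsed into a dict.

```python
REQUIRED_LABELS = (
    "Source:", "Date checked:", "Confidence:",
    "Related page/topic:", "Practical implication:",
)


def missing_metadata_labels(template_text):
    """Return the required labels that never appear in the template text."""
    return [label for label in REQUIRED_LABELS if label not in template_text]


def draft_front_matter_warnings(front_matter):
    """Warn when a draft/news template drops its noindex or sitemap defaults."""
    warnings = []
    if "noindex" not in str(front_matter.get("robots", "")):
        warnings.append("missing 'noindex' in robots")
    if "sitemap" not in front_matter:
        warnings.append("missing 'sitemap:' front matter")
    return warnings
```

Atlas templates would skip the second function, since they are allowed to use index.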
3. Future news / datelined updates
- If /news/ does not exist, the validator reports it as pending (issue #3) and does not fail.
- If /news/ exists and contains .md files, it validates required front matter (title, date, permalink).
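
The pending-versus-validate behavior described above might look like the sketch below. The function names are hypothetical, the front matter is read with a naive line scan rather than a YAML parser, and only .md files directly inside the directory are checked; the real script may differ on all three points.

```python
from pathlib import Path

REQUIRED_NEWS_KEYS = ("title", "date", "permalink")


def front_matter_keys(text):
    """Collect top-level keys from a leading '---' delimited front matter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Only unindented "key: value" lines count as top-level keys.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return keys


def check_news(news_dir):
    """Return (errors, warnings) for a news directory that may not exist yet."""
    news_dir = Path(news_dir)
    if not news_dir.is_dir():
        return [], ["/news/ does not exist yet"]
    errors = []
    for md_file in sorted(news_dir.glob("*.md")):
        keys = front_matter_keys(md_file.read_text(encoding="utf-8"))
        for key in REQUIRED_NEWS_KEYS:
            if key not in keys:
                errors.append(f"{md_file.name}: missing front matter key '{key}'")
    return errors, []
```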
4. Homepage protection
- The validator checks staged changes via git diff --cached --name-only.
- If protected files (index.md, _data/home.yml) are staged, it warns or fails depending on flags.
- Use --allow-homepage to bypass this check when a homepage change is explicitly authorized.
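
As a rough sketch of the staged-file check, assuming it is run from inside the repository: the helper names staged_files and protected_staged are hypothetical, while the git command and the protected file list come from the description above.

```python
import subprocess

PROTECTED_FILES = {"index.md", "_data/home.yml"}


def staged_files():
    """List staged paths using git diff --cached --name-only."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def protected_staged(paths, allow_homepage=False):
    """Return staged paths that touch protected homepage files."""
    if allow_homepage:
        return []
    return sorted(p for p in paths if p in PROTECTED_FILES)
```

Separating the git call from the comparison keeps the comparison testable without a repository, for example protected_staged(staged_files()).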
How to run it
Default run
python3 scripts/validate_site_content.py
Exit codes:
- 0: passed (no errors; warnings may be present)
- 1: failed (errors detected)
Strict mode
Treats all warnings as errors:
python3 scripts/validate_site_content.py --strict
Allow homepage changes
Bypasses the staged-homepage-protection check:
python3 scripts/validate_site_content.py --allow-homepage
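
How the two flags and the exit codes fit together can be sketched as below. The flag names and exit codes are the ones documented here; the helper names build_parser and exit_code, and the help strings, are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser exposing the two documented flags."""
    parser = argparse.ArgumentParser(description="Validate site content.")
    parser.add_argument("--strict", action="store_true",
                        help="treat all warnings as errors")
    parser.add_argument("--allow-homepage", action="store_true",
                        help="skip the staged homepage protection check")
    return parser


def exit_code(errors, warnings, strict=False):
    """Return 0 when the run passes and 1 when it fails."""
    if errors or (strict and warnings):
        return 1
    return 0
```

In this sketch a warning-only run exits 0 by default and 1 under --strict, which matches the behavior described above.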
Syntax check (CI-friendly)
python3 -m py_compile scripts/validate_site_content.py
What warnings mean
| Warning | Meaning | Action |
|---|---|---|
| path is null (future placeholder) | A topic cluster has no page yet. | None; expected for planned content. |
| /news/ does not exist yet | News section not created. | Wait for issue #3 or create it. |
| missing 'noindex' in robots | A draft template lacks the noindex directive. | Add robots: noindex,follow to keep it out of search indexes until reviewed. |
| missing 'sitemap:' front matter | A draft template lacks an explicit sitemap declaration. | Add sitemap: false for drafts, sitemap: true for public pages. |
| Staged changes touch protected homepage files | A commit would modify index.md or _data/home.yml. | Use --allow-homepage only if a homepage issue explicitly authorizes the change. |
Relationship to issue #7 (Ruby / Jekyll build validation)
This validator is not a replacement for bundle exec jekyll build. It does not:
- Render Liquid templates
- Resolve Jekyll collections or plugins
- Validate CSS, HTML, or JSON-LD output
- Check internal links in built HTML
What it does:
- Catches structural YAML errors before build
- Validates metadata completeness
- Checks file existence for paths declared in atlas_index.yml
- Protects homepage files from accidental commits
Recommended pipeline:
1. Run python3 scripts/validate_site_content.py (this script, issue #5).
2. Run bundle exec jekyll build (requires Ruby, issue #7).
3. Run scripts/check_links.py _site/ and scripts/check_seo.py _site/ (requires built site, issue #7).
If Ruby is unavailable, steps 2 and 3 remain pending. This script provides a lightweight pre-check that can run in any Python environment.
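
One way to express the pipeline above in a wrapper is sketched below. It rests on several assumptions: the name plan_pipeline is hypothetical, finding bundle on PATH is used as a proxy for Ruby being available, and the two issue #7 scripts are assumed to be invoked with python3.

```python
import shutil

# Each entry is (command, needs_ruby). The python3 prefix on the two
# issue #7 scripts is an assumption about how they are invoked.
PIPELINE = [
    (["python3", "scripts/validate_site_content.py"], False),
    (["bundle", "exec", "jekyll", "build"], True),
    (["python3", "scripts/check_links.py", "_site/"], True),
    (["python3", "scripts/check_seo.py", "_site/"], True),
]


def plan_pipeline(which=shutil.which):
    """Split the pipeline into runnable and pending commands."""
    has_ruby = which("bundle") is not None
    runnable, pending = [], []
    for command, needs_ruby in PIPELINE:
        if needs_ruby and not has_ruby:
            pending.append(command)
        else:
            runnable.append(command)
    return runnable, pending
```

The which parameter is injectable so the split can be exercised without Ruby installed.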
Files
- scripts/validate_site_content.py: validator script
- docs/site-content-validation.md: this documentation
Dependencies
- Python 3.8+
- PyYAML (pip install pyyaml)
- Git (for homepage-protection staged-change detection)
No Ruby, Bundler, or Jekyll required.