Site packages
A site package is the unit of work in sitely: one npm package per website. Each package ships a declarative src/index.ts written against the @sitely/framework DSL, a fixtures/ directory of captured HTML, the compiled dist/index.js produced by sitely build, and a test suite that runs the same eight checks every other package has to pass.
If @sitely/framework is the contract, a site package is the implementation. Everything else — the build pipeline, the test runner, the server's site-loader, the directory — exists to consume site packages.
What's in a package
packages/<author>-site-foo/
├── package.json
├── src/
│   ├── index.ts  # defineSite(...).resource(...).page(...).build()
│   └── pages/
│       └── ...  # optional segments
├── fixtures/
│   └── article/
│       ├── <hash>.html
│       ├── <hash>.expected.json
│       └── <hash>.meta.json
├── dist/  # emitted by `sitely build`, checked in
│   ├── index.js  # compiled site, version injected from package.json
│   ├── manifest.json
│   ├── baseline-manifest.json
│   └── schemas/
│       └── Article.json
├── *.test.ts  # uses @sitely/framework/testing
└── README.md

Two things to notice about the layout:
- dist/ is checked in. The compiled site, the manifest, and the schemas all live in git so reviewers, the directory, and any tooling that wants to read them can do so without running the build. The manifest-integrity check regenerates the manifest in CI and asserts byte-equality against the committed copy. Drift fails the build.
- Fixtures are the package's regression corpus. Every test the package runs operates on fixtures, not live HTTP. New fixtures are captured with sitely snapshot --page <key> '<params>'; the resulting HTML + meta is committed alongside the code.
Two ways to publish
The package name distinguishes the two paths:
| Naming | Path | Curation |
|---|---|---|
| @sitely/site-<name> | sitely-curated | Maintained by the sitely maintainers. Not a PR queue for outsiders; adoption into this namespace is a separate, maintainer-initiated motion. |
| <author>-site-<name> | Community | Publish-first. The author writes the package, runs sitely test locally, and npm publishes from their own repo. No PR queue, no permission ask, no blocked-on-a-human failure mode at publication. |
Both paths produce the same artifact shape: a built dist/index.js plus manifest plus fixtures plus tests. The directory shows them side-by-side, distinguished by namespace. The community path is the default; the sitely-curated path is the exception.
A walkthrough
The shortest useful package is a single-origin site. This one declares two resources, a story and the front page:
// src/index.ts
import { defineSite, urlPattern, TTL } from "@sitely/framework";
import { Article, ItemList } from "@sitely/schemas";
import pkg from "../package.json" with { type: "json" };

const itemUrl = urlPattern("https://news.ycombinator.com/item/:id");
const frontPageUrl = urlPattern("https://news.ycombinator.com/news");

export default defineSite({
  site: { id: "hackernews", displayName: "Hacker News", version: pkg.version },
  origins: [{ hostname: "news.ycombinator.com" }],
  rateLimit: { maxConcurrent: 2, requestsPerSecond: 1/2 },
})
  .resource("story", {
    schema: Article,
    url: itemUrl,
    ttl: TTL.medium,
  })
  .resource("frontPage", {
    schema: ItemList,
    url: frontPageUrl,
    ttl: TTL.short,
  })
  .page(itemUrl, {
    validate: (ctx) => ctx.$("td.title").exists(),
    extract: async (ctx) => ({
      story: {
        "@type": () => "Article" as const,
        headline: () => ctx.$("td.title a").text(),
        articleBody: () => ctx.$("td.title").text(),
        datePublished: () => ctx.$("span.age").attr("title") ?? "",
      },
    }),
    fixtures: [{ params: { id: "1" } }],
  })
  .page(frontPageUrl, {
    validate: (ctx) => ctx.$("table.itemlist").exists(),
    paginate: { next: (ctx) => ctx.$("a.morelink").attr("href") ?? null },
    extract: async (ctx) => ({
      frontPage: {
        "@type": () => "ItemList" as const,
        itemListElement: () => ctx.$$("tr.athing").map((row) => ({
          name: row.find("td.title a").text(),
          url: row.find("td.title a").attr("href"),
        })),
      },
    }),
    fixtures: [{ params: {} }],
  })
  .build();

Field-by-field:
- site is the package's identity. id: "hackernews" is the namespace prefix used everywhere: cache keys, resource identifiers (hackernews:story), directory URLs. displayName is what humans see. version is imported from package.json.
- origins is a single hostname here. The server uses this list to dispatch incoming URLs.
- rateLimit allows at most 2 concurrent requests. The fraction form (requestsPerSecond: 1/2) reads as "one request every two seconds".
- .resource("story", ...) + .resource("frontPage", ...) declare two typed outputs. Each has its own schema, URLPattern, and TTL.
- .page(itemUrl, ...) + .page(frontPageUrl, ...) declare the two URL patterns where extraction happens. The page's URLPattern is also its identifier in the manifest.
- paginate.next on the front page returns the URL of the "more" link, or null when exhausted.
- fixtures are inline declarations, with params typed against the page's URLPattern. The framework derives the on-disk hash from the params.
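To make the rateLimit arithmetic and the id-prefixed identifiers concrete, here is a small sketch. The helper names minIntervalMs and resourceId are hypothetical and are not framework exports. The sketch assumes only that requestsPerSecond is a plain rate and that identifiers join with a colon, as in hackernews:story.

```typescript
// Hypothetical helpers for illustration; not part of @sitely/framework.

/** Minimum spacing between request starts implied by a requests-per-second rate. */
function minIntervalMs(requestsPerSecond: number): number {
  if (!(requestsPerSecond > 0)) {
    throw new RangeError("requestsPerSecond must be positive");
  }
  return 1000 / requestsPerSecond;
}

/** Namespaced resource identifier built from the site id and resource name. */
function resourceId(siteId: string, resourceName: string): string {
  return `${siteId}:${resourceName}`;
}

console.log(minIntervalMs(1 / 2)); // 2000
console.log(resourceId("hackernews", "story")); // hackernews:story
```

Writing the rate as 1/2 rather than 0.5 keeps the "one request per N seconds" reading visible in the source.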
A slightly more interesting case — a multi-locale site:
// src/index.ts
import { defineSite, urlPattern, presence, asset, TTL } from "@sitely/framework";
import { Article } from "@sitely/schemas";
import { z } from "zod";
import pkg from "../package.json" with { type: "json" };

const articleUrl = urlPattern("https://{locale}.wikipedia.org/wiki/:title");

const WikipediaArticle = z.object({
  ...Article.shape,
  "@type": z.literal("Article"),
  pageId: z.number(),
  revisionId: z.number(),
  categories: z.array(z.string()),
  leadImage: presence(asset("image"), 0.6),
});

export default defineSite({
  site: { id: "wikipedia", displayName: "Wikipedia", version: pkg.version, homepage: "https://www.wikipedia.org/" },
  origins: [{ hostname: "{locale}.wikipedia.org", templated: true }],
  rateLimit: { maxConcurrent: 3, requestsPerSecond: 1 },
  locales: { source: "host", values: ["en", "de", "fr"], default: "en" },
  normalizeUrl: (url) => url.replace(/\?.*$/, ""), // strip query string
  framework: { minVersion: "0.1.0", maxVersion: "1.0.0" },
})
  .resource("article", {
    schema: WikipediaArticle,
    url: articleUrl,
    ttl: TTL.daily,
  })
  .page(articleUrl, {
    validate: (ctx) => ctx.$("#firstHeading").exists(),
    extract: async (ctx) => ({
      article: {
        "@type": () => "Article" as const,
        headline: () => ctx.$("#firstHeading").text(),
        articleBody: () => ctx.$("#mw-content-text .mw-parser-output").text(),
        pageId: () => Number(ctx.$('[name="wgArticleId"]').attr("content")),
        revisionId: () => Number(ctx.$('[name="wgCurRevisionId"]').attr("content")),
        categories: () => ctx.$$("#catlinks .mw-normal-catlinks li").map((li) => li.text()),
        leadImage: () => ctx.$(".infobox img").attr("src"),
      },
    }),
    fixtures: [
      { params: { locale: "en", title: "TypeScript" } },
      { params: { locale: "de", title: "TypeScript" } }, // satisfies locale-matrix
      { params: { locale: "en", title: "Nonexistent_Page" }, errorCase: true },
    ],
  })
  .build();

What changes vs the Hacker News example:
- One templated origin ({locale}.wikipedia.org) plus a locales block produces three concrete hostnames at use-time. The manifest carries the expanded list.
- locales.source: "host" means the locale lives in the subdomain. With source: "path", the URL pattern would carry it (/:locale/wiki/:title); with source: "query", the locale comes from a query parameter. The cache key always includes locale regardless of source.
- normalizeUrl strips query strings: Wikipedia carries lots of variants (?action=raw, ?oldid=...) that should collapse to one cache entry.
- framework pins compatible framework versions. The server's site-loader refuses to load this package under a framework version outside [0.1.0, 1.0.0].
- presence(asset("image"), 0.6) declares the lead image is present ~60% of the time; drift telemetry alerts if the observed rate diverges.
- Fixtures cover two locales to satisfy the locale-matrix check, plus one errorCase: true for the error-path check.
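The framework version pin can be read as an inclusive range test. The following is a minimal sketch under stated assumptions: isFrameworkCompatible and its helpers are hypothetical names, versions are plain x.y.z strings, and pre-release tags are ignored. The real site-loader logic may differ.

```typescript
// Hypothetical sketch of an inclusive [minVersion, maxVersion] check.
type VersionRange = { minVersion: string; maxVersion: string };

/** Parse a plain "x.y.z" version into numeric parts. */
function parseVersion(v: string): number[] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a plain x.y.z version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

/** Returns -1, 0, or 1, comparing major, then minor, then patch. */
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

/** True when current lies inside the range, bounds included. */
function isFrameworkCompatible(current: string, range: VersionRange): boolean {
  return (
    compareVersions(current, range.minVersion) >= 0 &&
    compareVersions(current, range.maxVersion) <= 0
  );
}

const pin: VersionRange = { minVersion: "0.1.0", maxVersion: "1.0.0" };
console.log(isFrameworkCompatible("0.5.2", pin)); // true
console.log(isFrameworkCompatible("1.0.1", pin)); // false
```

Comparing numeric parts rather than raw strings matters once a component reaches two digits: "0.10.0" sorts after "0.9.0" numerically but before it as text.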
Fixtures and their conventions
A fixture is a captured HTTP response paired with its expected extraction. On-disk layout (per-page directory, per-params hash):
fixtures/
└── <page-key>/
    ├── <hash>.html  # captured HTML body
    ├── <hash>.expected.json  # what extract(ctx) should produce
    └── <hash>.meta.json  # url, status, headers, fetchedAt

<page-key> is derived from the page's URL pattern (e.g. article for /article/:id). <hash> is a short stable hash of the fixture's params.
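One way a "short stable hash of the params" could work is sketched here. This is an assumption for illustration only: the page does not say which hash function or length the framework uses, and fixtureHash, canonicalParams, and fixturePaths are hypothetical names. The point is that params are canonicalized (keys sorted) before hashing, so property order never changes the file name.

```typescript
// Assumption: the real hash function and length are not specified on this page.
// An FNV-1a-style 32-bit hash is used only because it is tiny and deterministic.
type Params = Record<string, string>;

/** Serialize params with sorted keys so property order never affects the hash. */
function canonicalParams(params: Params): string {
  const keys = Object.keys(params).sort();
  return JSON.stringify(keys.map((k) => [k, params[k]]));
}

/** FNV-1a over UTF-16 code units (a simplification; canonical FNV hashes bytes). */
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function fixtureHash(params: Params): string {
  return fnv1a32(canonicalParams(params));
}

/** The three file paths for one fixture under fixtures/<page-key>/. */
function fixturePaths(pageKey: string, params: Params) {
  const base = `fixtures/${pageKey}/${fixtureHash(params)}`;
  return {
    html: `${base}.html`,
    expected: `${base}.expected.json`,
    meta: `${base}.meta.json`,
  };
}

// Same params in a different property order map to the same files.
const a = fixtureHash({ locale: "en", title: "TypeScript" });
const b = fixtureHash({ title: "TypeScript", locale: "en" });
console.log(a === b); // true
```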
Fixtures are committed to git. The deterministic test runner loads <hash>.html, wraps it with a CheerioDriver, builds an ExtractContext (with meta.json filling url / status / headers), and runs validate(ctx) + extract(ctx) in-process. The default ctx.fetch throws — if an extractor reaches for the network during a test, the test fails loudly.
Why this matters: every check on every PR operates on the same bytes a reviewer can read. There is no "works on my machine" axis. Reproducibility comes from fixtures being the input, not the input being a snapshot of whatever the site happened to serve when CI last ran.
New fixtures are captured with sitely snapshot --page <key> '<params>'. The CLI ignores robots.txt on this path because a snapshot is an author-initiated action, not server-side traffic.
The build output
Running sitely build in a package directory produces four artifacts:
- dist/index.js: the compiled site definition with version injected from package.json. Server + client both import this.
- dist/manifest.json: the manifest. Site identity, origin list, resources, pages, schemas, framework version, source commit.
- dist/schemas/<Name>.json: one JSON Schema per schema referenced from a resource.
- dist/baseline-manifest.json: the previous release's manifest, rotated forward at sitely build --publish time. Used by semver-discipline.
Determinism is a build rule: regenerating from the same source must produce byte-identical output. build.commit and build.builtAt are pinned to the package's last source-touching commit, never wall clock. Field order is lexicographic.
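A minimal sketch of what byte-stable serialization with lexicographic field order can look like. The function names (sortKeys, serializeManifest, manifestMatches) and the two-space indent are assumptions for illustration; the page specifies only the ordering rule and the byte-equality requirement.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

/**
 * Recursively rebuild objects with keys in lexicographic order.
 * Arrays keep their order. Note: JavaScript always enumerates integer-like
 * keys first, so this holds for ordinary identifier-style keys.
 */
function sortKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item));
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    const entries = Object.entries(value).sort(([x], [y]) => (x < y ? -1 : x > y ? 1 : 0));
    for (const [key, child] of entries) {
      sorted[key] = sortKeys(child);
    }
    return sorted;
  }
  return value;
}

/** Serialize with sorted keys and a trailing newline. */
function serializeManifest(manifest: Json): string {
  return JSON.stringify(sortKeys(manifest), null, 2) + "\n";
}

/** manifest-integrity style comparison: committed bytes must equal regenerated bytes. */
function manifestMatches(committed: string, regenerated: Json): boolean {
  return committed === serializeManifest(regenerated);
}

// Two objects with the same content but different insertion order serialize identically.
const one = serializeManifest({ site: { id: "hackernews", displayName: "Hacker News" }, pages: [] });
const two = serializeManifest({ pages: [], site: { displayName: "Hacker News", id: "hackernews" } });
console.log(one === two); // true
```

Pinning build.commit and build.builtAt to the last source-touching commit serves the same goal: no value in the output may depend on when or where the build ran.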
Multi-locale sites
A few rules that fall out of "one identity, many origins":
- One package per site, never per language. packages/site-wikipedia covers en.wikipedia.org, de.wikipedia.org, and fr.wikipedia.org from one package. The directory shows one entry, not three.
- origins is derived from locales when the locale lives in the host. The site definition declares a templated origin ({locale}.wikipedia.org); getActiveOrigins() expands it against locales.values at use-time.
- Robots.txt is per-origin. Locale-in-host sites have N robots.txt files; the server's robots cache fetches each one separately. Locale-in-path sites have one robots.txt that covers every locale.
- Fixtures must cover at least two declared locales. The locale-matrix check fails if a multi-locale package ships fixtures for only one locale. Single-locale declarations skip the check entirely.
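The expansion and the locale-matrix rule above can be sketched as follows. The page names getActiveOrigins() but not its signature, so expandOrigins and localeMatrix here are hypothetical stand-ins with assumed shapes. Whether errorCase fixtures count toward locale coverage is not stated; this sketch counts every fixture.

```typescript
// Hypothetical shapes, modeled on the site definitions shown earlier.
type Locales = { source: "host" | "path" | "query"; values: string[]; default: string };
type Origin = { hostname: string; templated?: boolean };
type FixtureDecl = { params: Record<string, string>; errorCase?: boolean };

/** Expand {locale} in templated origins when the locale lives in the host. */
function expandOrigins(origins: Origin[], locales?: Locales): string[] {
  const out: string[] = [];
  for (const origin of origins) {
    if (origin.templated && locales && locales.source === "host") {
      for (const locale of locales.values) {
        out.push(origin.hostname.replace("{locale}", locale));
      }
    } else {
      out.push(origin.hostname);
    }
  }
  return out;
}

/** Skipped for single-locale sites; otherwise needs fixtures for at least two declared locales. */
function localeMatrix(locales: Locales | undefined, fixtures: FixtureDecl[]): "skipped" | "pass" | "fail" {
  if (!locales || locales.values.length < 2) return "skipped";
  const covered = new Set<string>();
  for (const fixture of fixtures) {
    const locale = fixture.params["locale"];
    if (locale !== undefined && locales.values.includes(locale)) covered.add(locale);
  }
  return covered.size >= 2 ? "pass" : "fail";
}

const wikiLocales: Locales = { source: "host", values: ["en", "de", "fr"], default: "en" };
console.log(expandOrigins([{ hostname: "{locale}.wikipedia.org", templated: true }], wikiLocales));
// [ 'en.wikipedia.org', 'de.wikipedia.org', 'fr.wikipedia.org' ]
```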
Site families
A family is an opt-in case where one package covers multiple origins that share literal HTML structural identity — every Stack Exchange site, for example. The criterion isn't "looks similar"; it's "the selectors that work for one work for all, byte-for-byte, every page type."
Family packages declare a family block in the site header:
defineSite({
  site: { id: "stackexchange", displayName: "Stack Exchange", version: pkg.version },
  origins: [{ hostname: "*.stackexchange.com", templated: true }],
  family: {
    origins: [
      { hostname: "stackoverflow.com", display: "Stack Overflow" },
      { hostname: "serverfault.com", display: "Server Fault" },
      { hostname: "superuser.com", display: "Super User" },
    ],
    structuralIdentityCheck: "stackexchange",
  },
  rateLimit: { maxConcurrent: 2, requestsPerSecond: 1 },
})

The structuralIdentityCheck identifier names a check that asserts the origins do, in fact, share literal HTML identity. The check is typically backed by a fixture set drawn from a representative subset of family members. The check failing means the family has drifted apart and the package needs to split into per-origin packages.
Per-origin verified state is preserved across the family — one origin getting flagged as drift suspected doesn't taint the whole family, just that origin.
Per-origin packaging is the default. Family eligibility is bounded, requires the structural-identity check to pass, and is the exception. Reach for a family package when the cost of N near-identical packages exceeds the cost of carrying the family-check machinery; default to per-origin otherwise.
What the test suite enforces
Every package must pass the same eight checks before it's considered shippable. The full list and exact failure semantics live in The test suite. At a glance:
- fixture-extraction: every fixture's validate + extract succeeds and the result matches <hash>.expected.json.
- schema-conformance: every extracted result validates against its declared schema.
- determinism: re-running extraction on the same fixture twice yields byte-identical output.
- schema-emission-roundtrip: extracted output validates against the JSON Schema emitted to dist/schemas/<Name>.json.
- locale-matrix: multi-locale sites have fixtures for at least two declared locales (skipped for single-locale).
- error-path-coverage: at least one errorCase: true fixture exists and validate returns false for it.
- manifest-integrity: regenerating the manifest from source produces byte-identical output.
- semver-discipline: manifest diff against dist/baseline-manifest.json matches the version bump in package.json.
Plus warning-only checks: fixture-freshness, performance-budget, ttl-plausibility, fixture-coverage. None block shipping.
These eight are the entire automated bar for "this package is shippable." There is no ninth must-pass check hiding in a reviewer's head.
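As an illustration of the determinism check, the sketch below runs an extractor twice on the same input and compares the serialized results. It is simplified and synchronous (the extract functions on this page are async), and runsIdentically, pure, and leaky are hypothetical names.

```typescript
// Simplified, synchronous sketch of a determinism check.
function runsIdentically<I>(extract: (input: I) => unknown, input: I): boolean {
  const first = JSON.stringify(extract(input));
  const second = JSON.stringify(extract(input));
  return first === second;
}

// A pure extractor depends only on its input.
const pure = (html: string) => ({ headline: html.trim() });

// A leaky extractor smuggles in state that changes between runs.
let calls = 0;
const leaky = (html: string) => ({ headline: html.trim(), seq: ++calls });

console.log(runsIdentically(pure, " Hello ")); // true
console.log(runsIdentically(leaky, " Hello ")); // false
```

Typical sources of the second kind of failure are wall-clock timestamps, random ids, and iteration over unordered collections.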
Edge cases and failure modes
The boundaries the site-package contract pins down.
Build failures
- src/index.ts has no default export. Build fails: expected a default export from defineSite(...).build().
- A resource's schema is not a Standard Schema. Build fails with the resource name.
- A resource declares both url and derivedFrom. Build fails because they're mutually exclusive.
- A resource declares derivedFrom: "X" where X isn't a registered resource. Compile error at the builder step; build never starts.
- A resources.<name>.ttl triple has min > max. Build fails with the resource name and values.
- A page's extract returns a key that isn't a registered resource. Compile error at the builder step.
- origins is empty. Build fails because a site definition must claim at least one hostname.
- A locales.values entry doesn't appear in the template expansion. Build fails with the offending locale.
- A schema has .optional() / .nullable() / .nullish() without a presence() wrapper. Build fails with the field path.
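Three of the build-time rules above can be sketched as a plain validator. The shapes below are assumptions: validateSite, SiteDecl, and ResourceDecl are hypothetical, and the ttl type carries only min and max because the remaining member of the triple is not named on this page.

```typescript
// Hypothetical declaration shapes, reduced to the fields these rules touch.
type ResourceDecl = {
  url?: string;
  derivedFrom?: string;
  ttl?: { min: number; max: number };
};
type SiteDecl = {
  origins: { hostname: string }[];
  resources: Record<string, ResourceDecl>;
};

/** Collect every violation instead of stopping at the first one. */
function validateSite(site: SiteDecl): string[] {
  const errors: string[] = [];
  if (site.origins.length === 0) {
    errors.push("origins is empty: a site definition must claim at least one hostname");
  }
  for (const [name, resource] of Object.entries(site.resources)) {
    if (resource.url !== undefined && resource.derivedFrom !== undefined) {
      errors.push(`resource "${name}": url and derivedFrom are mutually exclusive`);
    }
    if (resource.ttl && resource.ttl.min > resource.ttl.max) {
      errors.push(`resource "${name}": ttl min (${resource.ttl.min}) exceeds max (${resource.ttl.max})`);
    }
  }
  return errors;
}

const problems = validateSite({
  origins: [],
  resources: {
    story: { url: "https://example.com/item/:id", derivedFrom: "frontPage", ttl: { min: 600, max: 60 } },
  },
});
console.log(problems.length); // 3
```

Each message carries the resource name and offending values, matching the "build fails with the resource name" wording in the list.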
Page matching at runtime
- Two pages match the same URL. First declaration wins. Authors who don't want one pattern to over-match another should narrow the pattern or use a validate predicate.
- A page's validate(ctx) returns false. The framework records the result and does not call extract. The server returns status: "error" with error: { kind: "page_validation_failed" }; there's no generic-extraction fallback. If the page can sometimes serve content that doesn't match the pattern (captcha, login wall), throw a framework error from checkResponse so the consumer gets a more specific status.
- A page's extract(ctx) throws an ExtractionError. The framework records the error; the orchestrator returns status: "error" (or stale if a stale row exists). The error includes the URL and the page key.
- A field function throws. Caught per-field. The framework records the error against that field path; other fields continue extracting. Schema validation downstream decides whether absence is permitted.
- paginate.next(ctx) returns a URL outside the site's origins. The framework refuses to follow it; the walk stops, partial result returned with hasMore: true, cursor: null.
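The per-field behavior described above (one field throwing does not stop the others) can be sketched like this. extractFields and the FieldError shape are hypothetical; the framework's real error records are not shown on this page.

```typescript
type FieldFns = Record<string, () => unknown>;
type FieldError = { path: string; message: string };

/** Run every field function; record failures by field path and keep going. */
function extractFields(
  resource: string,
  fields: FieldFns,
): { data: Record<string, unknown>; errors: FieldError[] } {
  const data: Record<string, unknown> = {};
  const errors: FieldError[] = [];
  for (const [name, fn] of Object.entries(fields)) {
    try {
      data[name] = fn();
    } catch (err) {
      errors.push({
        path: `${resource}.${name}`,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { data, errors };
}

const result = extractFields("story", {
  headline: () => "Show HN: sitely",
  datePublished: () => {
    throw new Error("span.age not found");
  },
  articleBody: () => "body text",
});
console.log(Object.keys(result.data)); // [ 'headline', 'articleBody' ]
console.log(result.errors[0]?.path); // story.datePublished
```

The failed field is simply absent from data, which is why schema validation downstream gets the final say on whether that absence is acceptable.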
Fixtures
- A fixture's <hash>.expected.json is missing for a non-error fixture. fixture-extraction fails with the fixture name and the missing path.
- <hash>.expected.json exists but the extracted data doesn't match. fixture-extraction fails with a structural diff; the author updates either the extractor or the fixture.
- <hash>.html has been edited but <hash>.meta.json wasn't. The fixture-freshness warning catches the inconsistency (doesn't block shipping).
Locales
- A locale value appears in an incoming URL but isn't in locales.values. The fetch path treats it as a 404: no page matches because the locale expansion doesn't cover that value.
- A package declares locales.values: ["en", "fr"] but no fixtures cover French. locale-matrix fails.
Family declarations
- Two origins in a family declare different rate limits. Both are respected. Rate limits scope to origin by default; each origin in a family carries its own slot in the per-host token bucket.
- An origin in a family fails the structuralIdentityCheck. The check fails the build for the package as a whole. Either the family is split into per-origin packages, or the offending origin is removed from the family list.
Hostname conflicts at load
- Two installed packages claim the same hostname. The server's site-loader logs a warning and keeps the first registration. The second package is otherwise silent for that hostname. If one of those packages also claims another hostname, that other hostname is still served by the second package.
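A first-registration-wins hostname table, as described above, might look like the following sketch. registerSites and InstalledSite are hypothetical names, and the warning text is invented for illustration; only the keep-first, warn-on-conflict behavior comes from this page.

```typescript
type InstalledSite = { id: string; hostnames: string[] };

/** Build hostname -> site id, keeping the first claim and warning on later ones. */
function registerSites(
  sites: InstalledSite[],
  warn: (message: string) => void,
): Map<string, string> {
  const byHost = new Map<string, string>();
  for (const site of sites) {
    for (const host of site.hostnames) {
      const owner = byHost.get(host);
      if (owner !== undefined) {
        warn(`hostname ${host} already claimed by ${owner}; ignoring claim from ${site.id}`);
        continue; // only this hostname is skipped, not the whole package
      }
      byHost.set(host, site.id);
    }
  }
  return byHost;
}

const warnings: string[] = [];
const table = registerSites(
  [
    { id: "first", hostnames: ["example.com"] },
    { id: "second", hostnames: ["example.com", "example.org"] },
  ],
  (message) => warnings.push(message),
);
console.log(table.get("example.com")); // first
console.log(table.get("example.org")); // second
console.log(warnings.length); // 1
```

Resolving conflicts per hostname rather than per package is what lets the second package keep serving its other hostname.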
Read next
- @sitely/framework: the DSL and the runtime contract every site package implements.
- The build manifest: the field-by-field shape of dist/manifest.json.
- Build subsystem: how the builder output becomes dist/index.js + manifest.
- Test-pkg subsystem: the in-process runner and the eight checks in operational detail.
- Glossary: terms used throughout these pages.