Methodology.

A citable, table-and-column-level description of how every page on Akashic is built — the sources we use, the choices we make, the algorithms we run, and the things we deliberately leave out.

1. Scope & limits

Akashic is a static reference site for US presidential elections at every level of geography the federal government tracks. Every page profiles a single place. Every page carries the full presidential election history of that place from 1892 through 2024, the demographics of the place from the most recent American Community Survey, the religious-adherence profile from the 2020 US Religion Census, and the set of places with the most similar voting trajectory.

Downballot races and primaries are not on the place-page surface, but they are first-class pages in the election section: US Senate, US House (with per-district pages), and governor contests, plus the Democratic presidential primaries. The place page stays a presidential profile; the election family is where the other offices live.

Not yet covered anywhere on the site: ballot measures, state-legislative returns, turnout by demographic group, and polling. Several of these live in the broader Akashic Intelligence platform.

2. Election results

We compose the 1892–2024 county-level presidential series from three primary sources, each authoritative for a different era, and supplement it with a fourth source for 2024 precinct-level returns.

1892–1915: ICPSR historical archive (Inter-university Consortium for Political and Social Research). County-level totals are sparse in this window — and the earlier 1876–1888 returns are partial enough that Akashic does not present them as the main user-facing series yet. We render missing county-years as explicit gaps in the elections table rather than interpolating.
1916–2020: MIT Election Data and Science Lab county-level presidential series. This is the canonical modern dataset for academic election analysis; we use it verbatim and never adjust the vote totals.
2024: State-certified official returns, ingested directly from each state’s election authority. We do not use newswire totals.
Precinct level, 2024: Voting and Election Science Team (VEST) precinct shapefiles and totals, aggregated to the modern county boundaries where they differ.
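
The era split and the explicit-gap rule can be illustrated with a minimal sketch. The year ranges come from the list above; the type names, function names, and record shape are hypothetical and are not taken from the Akashic codebase.

```typescript
// Hypothetical types and names; only the year ranges come from the text above.
type CountySource = "ICPSR" | "MIT Election Lab" | "State-certified returns";

interface CountyResult {
  year: number;
  dem: number;
  rep: number;
  total: number;
}

interface ElectionRow {
  year: number;
  source: CountySource;
  result: CountyResult | null; // null = explicit gap, never interpolated
}

// Pick the authoritative county-level source for a presidential election year.
function countySourceForYear(year: number): CountySource | null {
  if (year >= 1892 && year <= 1915) return "ICPSR";
  if (year >= 1916 && year <= 2020) return "MIT Election Lab";
  if (year === 2024) return "State-certified returns";
  return null; // outside the published 1892-2024 series
}

// Build one row per presidential election, leaving missing county-years as gaps.
function buildCountySeries(results: CountyResult[]): ElectionRow[] {
  const byYear = new Map<number, CountyResult>();
  for (const r of results) byYear.set(r.year, r);

  const rows: ElectionRow[] = [];
  for (let year = 1892; year <= 2024; year += 4) {
    const source = countySourceForYear(year);
    if (source === null) continue;
    rows.push({ year, source, result: byYear.get(year) ?? null });
  }
  return rows;
}
```

A county with no returns on file for a given year still gets a row for that year, with result set to null, so the table can render the gap rather than hide it.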

The election family, by office. Beyond the presidential place-page series, the election section carries each office as its own contest pages — real certified returns only, never an estimate to fill a gap:

President: National and per-state, 1892–2024, with county-level detail in every state (the same series as the place pages).
US Senate: Every contest 1976–2024, with candidate names and the resulting balance of power, from the MIT Election Lab. County maps appear for 2016 onward, where precinct returns exist.
US House: All 435 seats by state and district 1976–2024, with per-district race pages (recent cycles) carrying the district’s margin history. Party classification follows the literal ballot party; an independent who caucuses with a party is named as such, with the margin reported over the runner-up rather than as a two-party gap.
Governor: State and county results, 2018–2025, including the odd-year cycles (New Jersey, Virginia, and the other states that elect governors in odd-numbered years).
Primaries: Democratic presidential primaries, county level, 2008–2024, with a national who-won-which-state rollup.
Precinct level, downballot: Senate, governor, and US House returns at precinct resolution are being loaded from the VEST harmonized files into the warehouse. Each state’s precinct returns are validated against the certified statewide total before that state counts as loaded. Where a state’s precincts are not yet loaded, its contest still shows the certified county and statewide returns; a gap stays a gap, never a disaggregated estimate dressed as a native return.
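
A minimal sketch of the validation step for precinct loads, assuming the check is an exact per-candidate match between summed precinct returns and the certified statewide total. The exact-match rule, the Tally shape, and the function names are assumptions for illustration; the text above does not specify a tolerance.

```typescript
// Hypothetical shape: candidate or party key -> votes.
type Tally = Record<string, number>;

// Sum precinct-level tallies into one statewide tally.
function sumTallies(precincts: Tally[]): Tally {
  const out: Tally = {};
  for (const precinct of precincts) {
    for (const key of Object.keys(precinct)) {
      out[key] = (out[key] ?? 0) + (precinct[key] ?? 0);
    }
  }
  return out;
}

// True only if every key matches exactly (assumed rule; a missing key counts as zero).
function reconcilesToCertified(precincts: Tally[], certified: Tally): boolean {
  const summed = sumTallies(precincts);
  const keys = Object.keys(summed).concat(Object.keys(certified));
  return keys.every((key) => (summed[key] ?? 0) === (certified[key] ?? 0));
}
```

Under this sketch a state whose precinct file fails the check would simply stay unloaded, and its pages would keep showing the certified county and statewide returns.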

Boundary changes. A small number of counties have changed name or boundary over the 132-year window. We carry forward the modern five-digit FIPS code as the stable URL key, and we crosswalk historical vote totals onto the modern geometry. The two consequential cases:

Miami-Dade (FL). Renamed from Dade County in 1997. All pre-1997 results are reported under FIPS 12086.
Connecticut planning regions. In 2022 Connecticut formally replaced its eight legacy counties with nine Council of Government planning regions as its primary subdivision. Pre-2022 county-level election totals are apportioned to planning regions by 2020 town-level population.
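
The Connecticut apportionment can be sketched as a population-weighted split. This assumes each town is assigned to exactly one legacy county and one planning region, and that a county's votes are divided in proportion to its towns' 2020 population; the Town shape and function name are hypothetical, and rounding is left out because the text does not describe it.

```typescript
// Hypothetical town record: legacy county, successor planning region, 2020 population.
interface Town {
  county: string;
  region: string;
  pop2020: number;
}

// Apportion legacy county vote totals to planning regions by town population share.
function apportionToRegions(
  countyVotes: Record<string, number>,
  towns: Town[],
): Record<string, number> {
  // Total 2020 population of each legacy county, summed over its towns.
  const countyPop: Record<string, number> = {};
  for (const t of towns) {
    countyPop[t.county] = (countyPop[t.county] ?? 0) + t.pop2020;
  }

  // Each town carries its population share of the county's votes into its region.
  const regionVotes: Record<string, number> = {};
  for (const t of towns) {
    const pop = countyPop[t.county] ?? 0;
    if (pop === 0) continue;
    const votes = countyVotes[t.county] ?? 0;
    regionVotes[t.region] = (regionVotes[t.region] ?? 0) + votes * (t.pop2020 / pop);
  }
  return regionVotes;
}
```

The output is fractional: the region totals sum to the original county totals, but individual values are estimates rather than counted ballots.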

Sub-county estimates (the † footnote). City, district, and place pages derive their results by aggregating precinct returns disaggregated to census blocks. In a handful of state-cycles that detail does not exist in the source record: most Kentucky counties tabulated 2020 only county-wide (COVID-era consolidated absentee counting), several New Jersey counties reported 2020 by municipality, Alaska pooled its 2020 absentee wave above the precinct level, and the pre-VEST 2012 collections lack precinct detail for Delaware, Montana, North Dakota, South Dakota, Vermont, and Wyoming. The source files prorate those units uniformly, which would wrongly stamp every neighborhood with the unit-wide margin. For these cycles we instead redistribute each reporting unit’s certified totals across its blocks in proportion to each party’s own precinct geography in the adjacent real cycles, so unit totals reconcile to the vote while sub-unit variation reflects the place’s actual partisan geography. Affected cycles are marked with a dagger (†) on every table that shows them; what cannot be recovered — that cycle’s own hyper-local deviations — stays an estimate, and we label it as one. County, state, metro, and media-market pages are unaffected (they read certified county totals directly).
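
As a rough illustration of the dagger-cycle redistribution, the sketch below splits one reporting unit's certified totals across its blocks, using each party's own block-level votes from adjacent real cycles as weights. How the adjacent cycles are combined into a single reference figure is not stated above, so the reference input is taken as given; the uniform fallback for a party with no reference votes is also an assumption, as are all names.

```typescript
interface PartyVotes {
  dem: number;
  rep: number;
}

// Hypothetical reference: a block's votes by party in the adjacent real cycles.
interface BlockReference extends PartyVotes {
  blockId: string;
}

interface BlockEstimate extends PartyVotes {
  blockId: string;
}

// A block's share of the party's reference vote; uniform if there is none (assumption).
function shareOf(part: number, whole: number, blockCount: number): number {
  return whole > 0 ? part / whole : 1 / blockCount;
}

// Redistribute one unit's certified totals across its blocks, party by party.
function redistributeUnit(certified: PartyVotes, reference: BlockReference[]): BlockEstimate[] {
  const demRef = reference.reduce((sum, b) => sum + b.dem, 0);
  const repRef = reference.reduce((sum, b) => sum + b.rep, 0);
  return reference.map((b) => ({
    blockId: b.blockId,
    dem: certified.dem * shareOf(b.dem, demRef, reference.length),
    rep: certified.rep * shareOf(b.rep, repRef, reference.length),
  }));
}
```

Because each party is weighted by its own geography, a block that leaned Democratic in the reference cycles keeps a Democratic lean in the estimate, while the block values still sum to the unit's certified totals.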

3. Demographics

Every place page reports demographic data from the most recent US Census Bureau American Community Survey 5-year file. As of the current build that is ACS 2024 5-year (reference period 2020–2024). The 5-year file is the only ACS product available for every county regardless of population.

Suppression handling. The ACS suppresses estimates for very small populations to protect respondent confidentiality. We display suppressed values as “—” rather than zero. Where a derived figure (such as median household income) is suppressed for an entire geography, the figure is omitted from the page and from the JSON record.
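
A minimal sketch of the two suppression rules, assuming a suppressed value is represented as null once ingested. The Estimate type and both function names are hypothetical.

```typescript
// Hypothetical shape: null marks a value the ACS suppressed.
type Estimate = number | null;

// Render a suppressed estimate as the dash placeholder, never as zero.
function displayEstimate(value: Estimate): string {
  return value === null ? "—" : String(value);
}

// Drop suppressed derived figures so they are absent from the JSON record.
function omitSuppressed(figures: Record<string, Estimate>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const key of Object.keys(figures)) {
    const value = figures[key];
    if (value !== null && value !== undefined) out[key] = value;
  }
  return out;
}
```

Keeping null distinct from 0 matters here: a true zero estimate still displays as "0", while a suppressed one never does.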

Non-Hispanic White share. Per Census convention, race and Hispanic origin are separate dimensions. The figure we label “Non-Hispanic White” is the share of population that self-identifies as White alone (single race) and not of Hispanic or Latino origin.

Connecticut. Demographic data is delivered at the planning-region level (the post-2022 successor to Connecticut’s county system); historical comparability with pre-2022 county-level ACS files is approximate.

4. Religious adherence

Religious-adherence figures come from the 2020 US Religion Census, published by the Association of Statisticians of American Religious Bodies (ASARB). The Religion Census reports the number of adherents per religious body per US county on a decennial cadence.

Bucketing. ASARB reports ~250 distinct religious bodies. For display, we aggregate them into seven traditions:

Baptist — Southern Baptist Convention, National Baptist Convention USA, American Baptist Churches, and other Baptist bodies.
Methodist — United Methodist Church, AME, AME Zion, CME, and other Methodist bodies.
Pentecostal & Holiness — Assemblies of God, Church of God in Christ, Church of the Nazarene, and related bodies.
Catholic & Orthodox — Roman Catholic Church plus all Eastern and Oriental Orthodox bodies.
Mainline Protestant — Presbyterian Church (USA), ELCA, Episcopal Church, Disciples of Christ, UCC, and similar bodies.
Other Christian — LDS, Jehovah’s Witnesses, non-denominational Evangelical, and Christian bodies not above.
Non-Christian — Jewish, Muslim, Hindu, Buddhist, Bahá’í, and other non-Christian bodies.

The bucketing decisions are editorial. They are intended to produce groups roughly comparable in voting alignment, not to adjudicate theological taxonomies.
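
The bucketing can be pictured as a lookup from religious body to tradition followed by a sum. The lookup below is partial and illustrative, listing only a few bodies named above, and the choice to throw on an unmapped body (so that every assignment stays an explicit editorial decision) is an assumption, as are the names.

```typescript
type Tradition =
  | "Baptist"
  | "Methodist"
  | "Pentecostal & Holiness"
  | "Catholic & Orthodox"
  | "Mainline Protestant"
  | "Other Christian"
  | "Non-Christian";

// Partial, illustrative lookup using bodies named in the list above.
const TRADITION_BY_BODY = new Map<string, Tradition>([
  ["Southern Baptist Convention", "Baptist"],
  ["American Baptist Churches", "Baptist"],
  ["United Methodist Church", "Methodist"],
  ["Assemblies of God", "Pentecostal & Holiness"],
  ["Church of the Nazarene", "Pentecostal & Holiness"],
  ["Roman Catholic Church", "Catholic & Orthodox"],
  ["Episcopal Church", "Mainline Protestant"],
  ["Presbyterian Church (USA)", "Mainline Protestant"],
]);

interface AdherentRow {
  body: string;
  adherents: number;
}

// Sum one county's adherents by tradition; unmapped bodies are an error, not a default.
function adherentsByTradition(rows: AdherentRow[]): Map<Tradition, number> {
  const totals = new Map<Tradition, number>();
  for (const row of rows) {
    const tradition = TRADITION_BY_BODY.get(row.body);
    if (tradition === undefined) {
      throw new Error(`no tradition assigned for "${row.body}"`);
    }
    totals.set(tradition, (totals.get(tradition) ?? 0) + row.adherents);
  }
  return totals;
}
```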

5. Geography

All boundary geometry is sourced from US Census Bureau TIGER/Line 2024 shapefiles. County polygons are simplified for web delivery using topojson-simplify with a tolerance tuned to keep visible coastline detail while reducing payload size by an order of magnitude.

Precinct geometries are 2024 boundaries, from VEST where available and the state election authority otherwise. Counties for which we do not have precinct geometry on file fall back to a hex-grid layout that preserves the aggregate county margin and is visually distinct from a true precinct-level view.

City and town coverage uses the Census Bureau’s place layer: incorporated places plus Census Designated Places (CDPs). Their presidential results are block-disaggregated and cover 2004–2024. For 2008 onward, and for 2004 in the five states with precinct returns on file, they are built from precinct returns. Elsewhere the 2004 cycle is estimated from certified county totals on the nearest real cycle’s block geography and flagged as estimated on the page. Each place page links its primary county (the county containing the plurality of its census blocks). One known limitation: minor civil divisions, the township layer that New England and parts of the Midwest govern through, are a separate Census geography and are not yet covered. Where a town center is also a CDP or incorporated place, it appears; a township with no corresponding place entry does not.
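
The primary-county rule (the county containing the plurality of a place's census blocks) can be sketched from block GEOIDs alone, since the first five digits of a 15-digit census block GEOID are the state and county FIPS code. The function name and the tie-break (lowest FIPS wins) are assumptions; the text does not say how ties are resolved.

```typescript
// Return the five-digit county FIPS holding the most of a place's blocks, or null if none.
function primaryCountyFips(blockGeoids: string[]): string | null {
  // Count blocks per county, keyed by the first five GEOID digits.
  const counts = new Map<string, number>();
  for (const geoid of blockGeoids) {
    const fips = geoid.slice(0, 5);
    counts.set(fips, (counts.get(fips) ?? 0) + 1);
  }

  let best: string | null = null;
  let bestCount = 0;
  counts.forEach((count, fips) => {
    // Ties go to the lower FIPS code so the result is deterministic (assumed rule).
    const winsTie = count === bestCount && best !== null && fips < best;
    if (count > bestCount || winsTie) {
      best = fips;
      bestCount = count;
    }
  });
  return best;
}
```

This counts blocks, not population or land area, which matches the wording above; a place that straddles a county line is attributed to whichever side holds more blocks.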

6. The similar-counties model

For every county we compute the ten counties with the most similar recent voting trajectory. The model is intentionally simple: cosine similarity over the last-ten-election two-party margin vector.

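Let m_i = (D_i − R_i) / total_i be the margin in election i, where D_i and R_i are the Democratic and Republican vote counts and total_i is that election’s vote total. For two counties A and B with margin vectors a and b over the same ten elections, the similarity is (a · b) / (||a|| × ||b||).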
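
The similarity computation is small enough to sketch in full. The function names and the keyed-by-FIPS input shape are hypothetical, and returning 0 for an all-zero margin vector is an assumption, since cosine similarity is undefined in that case. The sketch takes the denominator of the margin as an argument because the text does not pin down whether it is the two-party vote or all votes cast.

```typescript
// Margin for one election: (D - R) / total, as defined above.
function electionMargin(dem: number, rep: number, total: number): number {
  return (dem - rep) / total;
}

// Cosine similarity of two margin vectors covering the same elections.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("margin vectors must cover the same elections");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vector: undefined, reported as 0 (assumption)
}

// Rank every other county by similarity to the target and keep the top k.
function mostSimilar(
  targetFips: string,
  marginsByFips: Record<string, number[]>,
  k = 10,
): string[] {
  const target = marginsByFips[targetFips];
  if (target === undefined) throw new Error(`unknown county ${targetFips}`);
  return Object.keys(marginsByFips)
    .filter((fips) => fips !== targetFips)
    .map((fips) => ({ fips, score: cosineSimilarity(target, marginsByFips[fips] ?? []) }))
    .sort((p, q) => q.score - p.score)
    .slice(0, k)
    .map((p) => p.fips);
}
```

One property worth keeping in mind when reading the results: cosine similarity compares the direction of the two vectors, not their length. Under this sketch a county that voted +2 in every election and one that voted +40 in every election would score 1.0, because their margin vectors point the same way.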

The model uses no demographic features. The result reflects political similarity over recent decades, not demographic or geographic similarity. Two counties on opposite coasts can score very high if their margin trajectories rhyme; two neighboring counties can score low if one realigned while the other didn’t.

We chose this over a feature-rich model deliberately. A small, transparent, fully reproducible similarity metric is more useful to a journalist or researcher than a black-box embedding, and the margin vector turns out to capture the variation that matters for the editorial question (“where else does this pattern show up?”) well enough.

7. The headline + narrative generation

Every place page carries a generated headline and a multi-paragraph narrative summary. The implementation lives in lib/headline.ts and lib/narrative.ts.

Both modules are deterministic templates: same place data in, same text out. No LLM is in the runtime path; nothing is generated at request time. The templates are conditioned on the most recent presidential margin, the demographic snapshot, and the similar-counties result.

The 40-character floor. Where an editor-curated subhead exists in the editorial layer and is at least 40 characters long, it overrides the templated subhead. Below the floor, we fall back to the template. This lets editorial copy ship one place at a time without blocking the bulk render.
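
The override rule reduces to a single comparison. The sketch below is not the code in lib/headline.ts; the function and constant names are hypothetical, and it measures the raw string length without trimming, which is an assumption.

```typescript
// The floor from the rule above: curated subheads shorter than this are ignored.
const SUBHEAD_FLOOR = 40;

// Prefer the editor-curated subhead when it exists and meets the floor.
function resolveSubhead(curated: string | null | undefined, templated: string): string {
  if (typeof curated === "string" && curated.length >= SUBHEAD_FLOOR) {
    return curated;
  }
  return templated;
}
```

A missing or too-short curated subhead never blocks a page: the templated subhead is always available as the fallback.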

8. Editorial copy

Editorial copy falls into three tiers of provenance, distinguished by a source field on every editorial record.

curated: Written or hand-reviewed by an editor. The lead paragraph of every county page falls in this tier where coverage exists; subheads on the marquee counties (state capitals, major CBSAs, swing counties) are curated.
generated_reviewed: Generated by a template or an LLM-assisted draft, then reviewed by an editor before publication. Used for the non-county tier subheads where we are working through the backlog (state, CBSA, DMA, CD, SLD).
generated: Generated deterministically from the underlying data, no review. Used for the templated paragraphs after the lead, and for the long-tail places where editorial coverage is not yet possible.

No editorial copy is generated at request time. Every string on every page is either committed to the repo (templated) or stored in Neon (curated / reviewed) and read at build time.

9. Updates & versioning

Cadence. The election layer is updated after every federal election cycle (next: November 2028). The demographic layer is updated annually as the Census Bureau releases each new ACS 5-year file (typically December). The religion layer is updated decennially with each new ASARB Religion Census release.

Data freshness contract. Every build emits a machine-readable data_freshness.json with the as-of date for each source layer. The sitemap’s lastmod field on each place page derives from the most recent source-layer update touching that place.
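
A sketch of how a page's lastmod could be derived from the freshness file, assuming data_freshness.json maps each source layer to an ISO 8601 as-of date. The actual schema is not documented here, so the layer keys, the shape, and the function name are all hypothetical.

```typescript
// Hypothetical shape of data_freshness.json: layer name -> ISO 8601 as-of date.
type Freshness = Record<string, string>;

// lastmod = latest as-of date among the layers that touch a given place page.
function lastmodFor(layers: string[], freshness: Freshness): string | null {
  let latest: string | null = null;
  for (const layer of layers) {
    const asOf = freshness[layer];
    if (asOf === undefined) continue; // layer missing from the file: skip it
    // ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    if (latest === null || asOf > latest) latest = asOf;
  }
  return latest;
}
```

For example, a page that draws on hypothetical elections and demographics layers would take whichever of those two as-of dates is later, regardless of when the religion layer last changed.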

10. Citation

Cite Akashic by the canonical URL of the page, not the backing JSON. Recommended citation forms:

Plain text.

Akashic Intelligence. (2026). Akashic: {Place Name}, {State}.
  Retrieved {YYYY-MM-DD} from {canonical URL}.

BibTeX.

@misc{akashic-place,
  author       = {Akashic Intelligence},
  title        = {Akashic: {Place Name}, {State}},
  year         = {2026},
  url          = {https://akashic.app/county/{FIPS}/},
  note         = {Accessed {YYYY-MM-DD}}
}

For the underlying source data, cite the original source (MIT Election Lab, ICPSR, US Census Bureau, ASARB) directly; Akashic is the compilation, not the primary source. See the about page and /ATTRIBUTION.txt for the per-source breakdown.

License

Original editorial copy, computed derived data, and the bulk dataset releases are published under CC BY 4.0. Underlying sources keep their own licenses — see /ATTRIBUTION.txt for the per-source breakdown and /LICENSE.txt for the original-content terms. AI training and indexing are explicitly welcomed (/robots.txt, /llms.txt).