Data Sources & Coverage

ra-mcp connects to several Riksarkivet APIs:

| API | Endpoint | Purpose |
| --- | --- | --- |
| Search API | data.riksarkivet.se/api/records | Full-text search across transcribed documents |
| ALTO XML | sok.riksarkivet.se/dokument/alto | Structured page transcriptions with text coordinates |
| IIIF | lbiiif.riksarkivet.se | High-resolution document images and collection manifests |
| OAI-PMH | oai-pmh.riksarkivet.se/OAI | Document metadata and collection structure |
| Bildvisaren | sok.riksarkivet.se/bildvisning | Interactive image viewer (links provided in results) |

All data comes from the Riksarkivet Data Platform, which hosts AI-transcribed materials from the Swedish National Archives.

Additional resources: Förvaltningshistorik (semantic search, experimental), HTRflow (handwritten text recognition).


Archive Coverage

The archive has three access tiers — not all materials are searchable the same way:

| Tier | Tool | Coverage |
| --- | --- | --- |
| Metadata catalog | search_metadata | 2M+ records: titles, names, places, dates |
| Digitised images | browse_document (links) | ~73M pages viewable via bildvisaren |
| AI-transcribed text | search_transcribed | ~1.6M pages; currently court records (hovrätt, trolldomskommissionen, poliskammare, magistrat) from the 17th-18th centuries |

Church records, estate inventories, and military records are typically catalogued and often digitised, but NOT AI-transcribed.
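
The tier split can be read as a simple routing rule. The following is a minimal sketch, assuming only the tool names and court-record series listed on this page. The `choose_tool` helper and the `TRANSCRIBED_SERIES` set are hypothetical illustrations and are not part of ra-mcp itself.

```python
# Court-record series this page lists as currently AI-transcribed.
# The set is an illustration built from the coverage notes above.
TRANSCRIBED_SERIES = {
    "hovrätt",
    "trolldomskommissionen",
    "poliskammare",
    "magistrat",
}


def choose_tool(series: str) -> str:
    """Suggest which tool to start with for a given record series.

    Series known to be AI-transcribed can be searched by full text;
    everything else is assumed to be reachable only via the metadata catalog.
    """
    if series.strip().lower() in TRANSCRIBED_SERIES:
        return "search_transcribed"
    return "search_metadata"


if __name__ == "__main__":
    for name in ("Hovrätt", "church records", "estate inventories"):
        print(f"{name}: {choose_tool(name)}")
```

Once a metadata hit is found, the digitised pages (where they exist) would then be opened through the browse_document links, as the coverage notes describe.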

Transcription Quality

The AI-transcribed text was produced by HTR (Handwritten Text Recognition) and OCR models. These transcriptions are not perfect — they contain recognition errors including misread characters, merged or split words, and garbled passages, especially in older or damaged documents.

This has a direct impact on search: an exact search for Stockholm will miss documents where the transcription reads Stockholn or Stookholm. Always use fuzzy search (~) to compensate: stockholm~1 catches these single-character misreads and significantly increases the number of hits.
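
To make the effect concrete, the sketch below computes the edit distance between the correct spelling and the two misreads mentioned above. It assumes that the `~N` suffix follows the common Lucene-style convention of "at most N character edits", which this page does not state explicitly. The `levenshtein` and `fuzzy_term` helpers are hypothetical and written for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_term(term: str, max_edits: int = 1) -> str:
    """Append a fuzzy suffix to a search term (assumed `term~N` syntax)."""
    return f"{term.lower()}~{max_edits}"


if __name__ == "__main__":
    target = "stockholm"
    for misread in ("stockholn", "stookholm"):
        print(misread, levenshtein(target, misread))
    print(fuzzy_term("Stockholm"))
```

Both misreads differ from the correct spelling by a single substitution, which is why a tolerance of one edit is enough to recover them under that assumption.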

The Plugin Model

ra-mcp is one piece of a larger ecosystem. Multiple MCP servers can be connected to the same AI client:

```mermaid
graph LR
  client["AI Client\n(Claude)"]
  client --> ramcp["ra-mcp\nSearch, browse, HTR, viewer, guides"]
  client --> htrflow["htrflow-mcp\nStandalone HTR\n(alternative)"]
  client --> other["other servers\nAny MCP-compatible tool"]
```

Together with external tools, they enable a complete research workflow: search the archives, read transcriptions, re-transcribe pages that need better OCR/HTR, and view original documents — all from within a single AI conversation.