Browse
Domain package for browsing and retrieving full page transcriptions from historical documents.
Models
models
Data models for Riksarkivet browse operations.
Models for browsing document pages with full text and metadata.
BrowseResult
Bases: BaseModel
Result from browsing document pages.
Contains page contexts, manifest ID, and optional OAI-PMH metadata.
PageContext
Bases: BaseModel
Full page context for browsing.
Contains transcribed text, ALTO XML URL, and image URLs for a single page.
Operations
BrowseOperations(http_client)
Browse operations for Riksarkivet document collections.
Provides document browsing functionality for viewing specific pages of documents by reference code.
Attributes:
| Name | Type | Description |
|---|---|---|
| alto_client | | Client for fetching ALTO XML content. |
| oai_client | | Client for OAI-PMH metadata operations. |
| iiif_client | | Client for interacting with IIIF collections and manifests. |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/browse_operations.py
browse_document(reference_code, pages, highlight_term=None, max_pages=20, research_context=None, session_id=None)
async
Browse specific pages of a document.
Retrieves full transcribed content for specified pages of a document, with optional term highlighting. Supports various page specifications including ranges (1-5), lists (1,3,5), and combinations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_code | str | Document identifier (e.g., 'SE/RA/730128/730128.006'). | required |
| pages | str | Page specification (e.g., '1-3,5,7-9' or 'all'). | required |
| highlight_term | str \| None | Optional term to highlight in the returned text. | None |
| max_pages | int | Maximum number of pages to retrieve. | 20 |
| research_context | str \| None | User's research goal (recorded as span attribute for telemetry). | None |

Returns:
| Type | Description |
|---|---|
| BrowseResult | BrowseResult containing page contexts, document metadata, and persistent identifiers. Returns empty contexts if document not found or no valid pages. |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/browse_operations.py
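To make the page specification format concrete, here is a minimal sketch of how a string such as '1-3,5,7-9' could be expanded into page numbers and capped at `max_pages`. The helper name `parse_page_spec` is hypothetical and not part of the library, and the handling of reversed ranges and stray commas is an assumption. The 'all' keyword is left out because resolving it needs the document's page count.

```python
def parse_page_spec(spec: str, max_pages: int = 20) -> list[int]:
    """Expand a spec such as '1-3,5,7-9' into sorted, unique page numbers.

    Hypothetical helper for illustration; not the library's implementation.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas (assumption)
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                start, end = end, start  # accept reversed ranges (assumption)
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Keep only positive page numbers and cap the result at max_pages.
    return sorted(p for p in pages if p > 0)[:max_pages]


print(parse_page_spec("1-3,5,7-9"))  # [1, 2, 3, 5, 7, 8, 9]
```

Deduplicating through a set before sorting means overlapping ranges such as '1-3,2-4' do not request the same page twice.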
ALTO Client
ALTOClient(http_client)
Client for fetching and parsing ALTO XML files from Riksarkivet.
ALTO (Analyzed Layout and Text Object) is an XML schema for describing the layout and content of physical text resources. This client handles multiple ALTO namespace versions (v2, v3, v4) and extracts structured text layers from historical document scans.
Attributes:
| Name | Type | Description |
|---|---|---|
| http_client | | HTTP client instance for making requests to ALTO XML endpoints. |
Example

    client = ALTOClient(http_client)
    layer = await client.fetch_content("https://sok.riksarkivet.se/dokument/alto/SE_RA_123.xml")
    print(layer.full_text)  # Full transcribed text from the document

Initialize the ALTO client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| http_client | HTTPClient | Configured HTTP client for making requests. | required |
Source code in packages/xml-lib/src/ra_mcp_xml/client.py
fetch_content(alto_url, timeout=10)
async
Fetch and parse an ALTO XML file into a structured TextLayer.
This method performs the complete workflow: fetches the XML document, parses it, and returns a TextLayer with line-level data (polygons, confidence, ids), handling multiple ALTO namespace versions automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| alto_url | str | Direct URL to the ALTO XML document. | required |
| timeout | int | Request timeout in seconds (default: 10). | 10 |

Returns:
| Type | Description |
|---|---|
| TextLayer \| None | TextLayer with line-level data and full_text, TextLayer with empty full_text if ALTO exists but has no text (blank page), or None if fetching/parsing fails (404, network error, etc.). |
Example

    layer = await client.fetch_content("https://sok.riksarkivet.se/dokument/alto/SE_RA_123.xml")
    layer.full_text
    # 'Anno 1676 den 15 Januarii förekom för Rätten...'

Source code in packages/xml-lib/src/ra_mcp_xml/client.py
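As an illustration of what handling multiple ALTO namespace versions can look like, the sketch below tries the v2, v3 and v4 namespace URIs in turn and joins the `CONTENT` attribute of each `String` element per `TextLine`. It uses only the standard library. The function name `extract_lines` and the sample document are hypothetical, and the client's actual parser may work differently (for example, it also returns polygons, confidence values and ids).

```python
import xml.etree.ElementTree as ET

# Namespace URIs published for ALTO v2, v3 and v4.
ALTO_NAMESPACES = [
    "http://www.loc.gov/standards/alto/ns-v2#",
    "http://www.loc.gov/standards/alto/ns-v3#",
    "http://www.loc.gov/standards/alto/ns-v4#",
]


def extract_lines(alto_xml: str) -> list[str]:
    """Return one string per TextLine, trying each known ALTO namespace."""
    root = ET.fromstring(alto_xml)
    for uri in ALTO_NAMESPACES:
        ns = {"alto": uri}
        text_lines = root.findall(".//alto:TextLine", ns)
        if not text_lines:
            continue  # this version's namespace is not used in the document
        lines = []
        for text_line in text_lines:
            # Each String element carries one word in its CONTENT attribute.
            words = [s.get("CONTENT", "") for s in text_line.findall("alto:String", ns)]
            lines.append(" ".join(w for w in words if w))
        return lines
    return []  # no text lines in any known namespace: treat as a blank page


# Hypothetical minimal ALTO v4 document for demonstration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Anno"/><SP/><String CONTENT="1676"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print("\n".join(extract_lines(SAMPLE)))  # Anno 1676
```

Returning an empty list for a page without text lines mirrors the distinction drawn above between a blank page and a failed fetch, which would be signalled separately.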
IIIF Client
IIIFClient(http_client)
Client for IIIF collections and manifests.
Source code in packages/iiif-lib/src/ra_mcp_iiif_lib/client.py
get_collection(pid, timeout=30)
async
Get IIIF collection with typed model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pid | str | Persistent identifier for the collection. | required |
| timeout | int | Request timeout in seconds. | 30 |

Returns:
| Type | Description |
|---|---|
| IIIFCollection \| None | Parsed collection with manifests, or None on fetch failure. |
Source code in packages/iiif-lib/src/ra_mcp_iiif_lib/client.py
OAI-PMH Client
OAIPMHClient(http_client, base_url=OAI_BASE_URL)
Client for interacting with OAI-PMH repositories.
Source code in packages/oai-pmh-lib/src/ra_mcp_oai_pmh_lib/client.py
extract_manifest_id(identifier)
async
Extract PID from a record for IIIF access.
Source code in packages/oai-pmh-lib/src/ra_mcp_oai_pmh_lib/client.py
get_metadata(identifier)
async
Get record metadata as typed OAIPMHMetadata model.
Fetches the OAI-PMH GetRecord response for the given identifier and parses the EAD metadata into a structured model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| identifier | str | Record identifier (e.g., "SE/RA/310187/1"). | required |

Returns:
| Type | Description |
|---|---|
| OAIPMHMetadata \| None | Parsed metadata, or None on fetch/parse failure. |
Source code in packages/oai-pmh-lib/src/ra_mcp_oai_pmh_lib/client.py
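For orientation, an OAI-PMH GetRecord request is an HTTP request carrying the `verb`, `identifier` and `metadataPrefix` arguments defined by the protocol. The sketch below only builds such a URL. The endpoint `https://example.org/oai` and the prefix `ead` are placeholders, not the values behind `OAI_BASE_URL`, and `build_get_record_url` is a hypothetical helper rather than a method of this client.

```python
from urllib.parse import urlencode

# Placeholder endpoint and metadata prefix: both are assumptions for this sketch.
EXAMPLE_BASE_URL = "https://example.org/oai"
EXAMPLE_METADATA_PREFIX = "ead"


def build_get_record_url(
    identifier: str,
    base_url: str = EXAMPLE_BASE_URL,
    metadata_prefix: str = EXAMPLE_METADATA_PREFIX,
) -> str:
    """Build a GetRecord URL with the three arguments the protocol requires."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    # urlencode percent-encodes the slashes in the identifier.
    return f"{base_url}?{urlencode(params)}"


print(build_get_record_url("SE/RA/310187/1"))
# https://example.org/oai?verb=GetRecord&identifier=SE%2FRA%2F310187%2F1&metadataPrefix=ead
```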
manifest_id_from_metadata(metadata)
Extract manifest ID from already-fetched metadata (no HTTP call).
Source code in packages/oai-pmh-lib/src/ra_mcp_oai_pmh_lib/client.py
URL Generator
url_generator
URL generation utilities for Riksarkivet resources.
alto_url(manifest_id, page_number)
Generate ALTO URL from manifest ID and page number.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| manifest_id | str | Manifest identifier (not PID - should be clean manifest ID) | required |
| page_number | str | Page number | required |

Returns:
| Type | Description |
|---|---|
| str \| None | ALTO XML URL or None if cannot generate |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/url_generator.py
bildvisning_url(manifest_id, page_number, search_term=None)
Generate bildvisning URL with optional search highlighting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| manifest_id | str | Manifest ID | required |
| page_number | str | Page number | required |
| search_term | str \| None | Optional search term to highlight | None |

Returns:
| Type | Description |
|---|---|
| str \| None | Bildvisning URL or None if cannot generate |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/url_generator.py
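The contract described above (return None when the inputs are unusable, append the search term only when one is given) can be sketched as follows. Everything concrete here is an assumption made for illustration: the base URL `https://example.org/viewer`, the `{manifest_id}_{page}` path layout, the `q` query parameter and the function name `example_viewer_url` are not taken from `url_generator.py`.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical viewer base URL; the real value is defined in url_generator.py.
EXAMPLE_VIEWER_BASE = "https://example.org/viewer"


def example_viewer_url(
    manifest_id: str,
    page_number: str,
    search_term: Optional[str] = None,
) -> Optional[str]:
    """Build a viewer URL, or return None when it cannot be generated."""
    if not manifest_id or not page_number:
        return None  # both parts are needed to address a page
    # Assumed path layout: manifest ID and zero-padded page joined by "_".
    url = f"{EXAMPLE_VIEWER_BASE}/{manifest_id}_{page_number.zfill(5)}"
    if search_term and search_term.strip():
        # Percent-encode the term so spaces and non-ASCII characters are safe.
        url += f"?q={quote(search_term.strip())}"
    return url


print(example_viewer_url("R0001203", "7", "Anno 1676"))
# https://example.org/viewer/R0001203_00007?q=Anno%201676
```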
format_page_number(page_number)
Format page number with proper padding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| page_number | str | Page number string | required |

Returns:
| Type | Description |
|---|---|
| str | Padded page number (5 digits) |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/url_generator.py
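Padding to five digits can be sketched with `str.zfill`. The helper name `pad_page_number` is hypothetical, and returning non-numeric input unchanged is an assumption about edge-case handling, not documented behaviour of `format_page_number`.

```python
def pad_page_number(page_number: str, width: int = 5) -> str:
    """Left-pad a numeric page string with zeros to the given width."""
    digits = page_number.strip()
    # Assumption: leave anything that is not purely numeric untouched.
    return digits.zfill(width) if digits.isdigit() else digits


print(pad_page_number("7"))    # 00007
print(pad_page_number("123"))  # 00123
```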
iiif_image_url(manifest_id, page_number)
Generate IIIF image URL from manifest ID and page number.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| manifest_id | str | Manifest ID | required |
| page_number | str | Page number | required |

Returns:
| Type | Description |
|---|---|
| str \| None | IIIF image URL or None if cannot generate |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/url_generator.py
remove_arkis_prefix(manifest_id)
Remove arkis! prefix from manifest ID if present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| manifest_id | str | Manifest ID string, potentially with arkis! prefix | required |

Returns:
| Type | Description |
|---|---|
| str | Manifest ID without arkis! prefix |
Source code in packages/browse-lib/src/ra_mcp_browse_lib/url_generator.py
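A minimal sketch of the prefix removal, assuming the prefix only ever appears at the start of the string. The helper name `strip_arkis_prefix` and the sample ID `R0001203` are hypothetical.

```python
ARKIS_PREFIX = "arkis!"


def strip_arkis_prefix(manifest_id: str) -> str:
    """Return the manifest ID without a leading 'arkis!' prefix."""
    if manifest_id.startswith(ARKIS_PREFIX):
        return manifest_id[len(ARKIS_PREFIX):]
    return manifest_id  # already clean: return unchanged


print(strip_arkis_prefix("arkis!R0001203"))  # R0001203
print(strip_arkis_prefix("R0001203"))        # R0001203
```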