Search

Domain package for searching transcribed documents via the Riksarkivet Search API.

Models

models

Data models for Riksarkivet search operations.

Models are designed to closely match the Search API JSON structure.

Bases: BaseModel

Links from API _links field.

GenericReference

Bases: BaseModel

Generic reference (archival institution, hierarchy level, provenance).

Maps to the GenericReference type in the Swagger spec. All fields are nullable.

Metadata

Bases: BaseModel

Document metadata from API.

PageInfo

Bases: BaseModel

Page information from snippet.

Note: id is marked nullable in the Swagger spec but is always present in search results. We default to "" for safety.

RecordsResponse

Bases: BaseModel

Search API response matching /api/records endpoint.

Maps to the full API response structure from /api/records.

count_snippets()

Count total snippets across all records.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def count_snippets(self) -> int:
    """Count total snippets across all records."""
    return sum(record.get_snippet_count() for record in self.items)

SearchRecord

Bases: BaseModel

Record from search API matching JSON structure (objectType: Record, type: Volume).

get_collection_url()

Get collection URL for this document.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def get_collection_url(self) -> str:
    """Get collection URL for this document."""
    return f"https://lbiiif.riksarkivet.se/arkis/{self.id}"

get_manifest_url()

Get manifest URL from links.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def get_manifest_url(self) -> str | None:
    """Get manifest URL from links."""
    if self.links and self.links.image:
        return self.links.image[0]
    return None

get_snippet_count()

Get number of snippets returned for this document.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def get_snippet_count(self) -> int:
    """Get number of snippets returned for this document."""
    return len(self.transcribed_text.snippets) if self.transcribed_text else 0

get_title()

Get document title or default.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def get_title(self) -> str:
    """Get document title or default."""
    return self.caption or "(No title)"

get_total_hits()

Get total number of hits in this document.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def get_total_hits(self) -> int:
    """Get total number of hits in this document."""
    return self.transcribed_text.num_total if self.transcribed_text else 0
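
The fallback behaviour of the SearchRecord helpers can be tried out with a small sketch. It uses plain dataclasses as stand-ins for the Pydantic models, so it is an illustration under simplifying assumptions rather than the library's own classes; the record ids and the manifest URL are made-up placeholder values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snippet:
    text: str = ""


@dataclass
class TranscribedText:
    num_total: int = 0
    snippets: list[Snippet] = field(default_factory=list)


@dataclass
class Links:
    image: list[str] | None = None


@dataclass
class SearchRecord:
    """Simplified stand-in for the documented SearchRecord model."""

    id: str
    caption: str | None = None
    transcribed_text: TranscribedText | None = None
    links: Links | None = None

    def get_title(self) -> str:
        return self.caption or "(No title)"

    def get_total_hits(self) -> int:
        return self.transcribed_text.num_total if self.transcribed_text else 0

    def get_snippet_count(self) -> int:
        return len(self.transcribed_text.snippets) if self.transcribed_text else 0

    def get_manifest_url(self) -> str | None:
        if self.links and self.links.image:
            return self.links.image[0]
        return None


# A record without caption, transcription, or links falls back to defaults.
bare = SearchRecord(id="hypothetical-id-1")

# A record that returned two snippets out of five total hits.
rich = SearchRecord(
    id="hypothetical-id-2",
    caption="Example volume",
    transcribed_text=TranscribedText(num_total=5, snippets=[Snippet("a"), Snippet("b")]),
    links=Links(image=["https://example.org/manifest"]),
)
```

Note that get_total_hits() and get_snippet_count() can differ: the first reports all hits in the document, the second only the snippets actually returned.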

SearchResult

Bases: BaseModel

Search result with query context.

Wraps RecordsResponse with the search parameters used to execute the search. Parameter names match the Search API specification.

items property

Get records from response.

keyword property

Alias for transcribed_text for backward compatibility.

total_hits property

Get total hits from response.

count_snippets()

Count total snippets across all records.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py
def count_snippets(self) -> int:
    """Count total snippets across all records."""
    return self.response.count_snippets()
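
The way SearchResult delegates to the wrapped response can be sketched as follows. FakeResponse and SearchResultSketch are hypothetical stand-ins written for this illustration; only the property names (items, total_hits, keyword) and the constructor fields (response, transcribed_text, limit, offset) are taken from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical, simplified stand-in for RecordsResponse."""

    items: list[dict] = field(default_factory=list)
    total_hits: int = 0

    def count_snippets(self) -> int:
        # Each item is a plain dict here; the real model uses SearchRecord objects.
        return sum(len(item.get("snippets", [])) for item in self.items)


@dataclass
class SearchResultSketch:
    """Wraps a response together with the parameters used for the search."""

    response: FakeResponse
    transcribed_text: str
    limit: int = 10
    offset: int = 0

    @property
    def items(self) -> list[dict]:
        return self.response.items

    @property
    def total_hits(self) -> int:
        return self.response.total_hits

    @property
    def keyword(self) -> str:
        # Backward-compatible alias for transcribed_text.
        return self.transcribed_text

    def count_snippets(self) -> int:
        return self.response.count_snippets()


response = FakeResponse(
    items=[{"snippets": ["s1", "s2"]}, {"snippets": ["s3"]}],
    total_hits=42,
)
result = SearchResultSketch(response=response, transcribed_text="trolldom")
```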

Snippet

Bases: BaseModel

Text snippet with search match.

Note: text and pages are marked nullable in the Swagger spec but are always present in search results. We use safe defaults.

TranscribedText

Bases: BaseModel

Transcribed text information.

Note: snippets is marked nullable in the Swagger spec but is always present (possibly empty) in search results.

Client

SearchClient(http_client)

Client for Riksarkivet Search API.

Handles searching transcribed documents and returns structured responses that directly map to the API's JSON structure.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_client.py
def __init__(self, http_client: HTTPClient):
    self.http_client = http_client
    self.logger = logging.getLogger("ra_mcp.search.client")

search(text=None, transcribed_text=None, only_digitised_materials=True, limit=DEFAULT_LIMIT, offset=0, max_snippets_per_record=None, sort='relevance', year_min=None, year_max=None, name=None, place=None) async

Search for records using various search parameters.

Parameter names match the Search API specification for clarity. You can search either transcribed materials or general text (metadata, names, places).

Parameters:

text (str | None, default None): General free-text search across all fields
transcribed_text (str | None, default None): Search specifically in AI-transcribed text (requires only_digitised_materials=True)
only_digitised_materials (bool, default True): Limit results to digitised materials (API default: False)
limit (int, default DEFAULT_LIMIT): Maximum number of records to return
offset (int, default 0): Pagination offset (API parameter name)
max_snippets_per_record (int | None, default None): Client-side snippet limiting per record (not sent to API)
sort (str, default 'relevance'): Sort order (one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc)
year_min (int | None, default None): Filter results to this start year or later
year_max (int | None, default None): Filter results to this end year or earlier
name (str | None, default None): Search by person name (API field, can combine with text)
place (str | None, default None): Search by place name (API field, can combine with text)

Returns:

RecordsResponse: RecordsResponse with all API fields populated

Raises:

ValueError: If transcribed_text is used without only_digitised_materials=True, or if none of text, transcribed_text, name, or place is provided

Source code in packages/search-lib/src/ra_mcp_search_lib/search_client.py
async def search(
    self,
    text: str | None = None,
    transcribed_text: str | None = None,
    only_digitised_materials: bool = True,
    limit: int = DEFAULT_LIMIT,
    offset: int = 0,
    max_snippets_per_record: int | None = None,
    sort: str = "relevance",
    year_min: int | None = None,
    year_max: int | None = None,
    name: str | None = None,
    place: str | None = None,
) -> RecordsResponse:
    """
    Search for records using various search parameters.

    Parameter names match the Search API specification for clarity.
    You can search either transcribed materials or general text (metadata, names, places).

    Args:
        text: General free-text search across all fields
        transcribed_text: Search specifically in AI-transcribed text (requires only_digitised_materials=True)
        only_digitised_materials: Limit results to digitised materials (default: True, API default: False)
        limit: Maximum number of records to return
        offset: Pagination offset (API parameter name)
        max_snippets_per_record: Client-side snippet limiting per record (not sent to API)
        sort: Sort order — one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc
        year_min: Filter results to this start year or later
        year_max: Filter results to this end year or earlier
        name: Search by person name (API field, can combine with text)
        place: Search by place name (API field, can combine with text)

    Returns:
        RecordsResponse with all API fields populated

    Raises:
        ValueError: If transcribed_text is used without only_digitised_materials=True,
            or if none of text, transcribed_text, name, or place is provided
    """
    # Validate parameters
    if transcribed_text and not only_digitised_materials:
        raise ValueError(
            "transcribed_text search requires only_digitised_materials=True. Use text parameter instead of transcribed_text to search all materials."
        )

    if not text and not transcribed_text and not name and not place:
        raise ValueError("Must provide at least one search parameter: 'text', 'transcribed_text', 'name', or 'place'")

    search_term = transcribed_text or text
    search_type = "transcribed" if transcribed_text else "general"
    self.logger.info("Starting %s search: keyword='%s', limit=%d, offset=%d", search_type, search_term, limit, offset)

    with _tracer.start_as_current_span(
        "SearchClient.search",
        attributes={
            "search.type": search_type,
            "search.keyword": search_term or "",
            "search.offset": offset,
            "search.limit": limit,
        },
    ) as span:
        try:
            # Build search parameters
            params = {
                "limit": limit,
                "offset": offset,
                "sort": sort,
            }

            if only_digitised_materials:
                params["only_digitised_materials"] = "true"

            if transcribed_text:
                params["transcribed_text"] = transcribed_text
            elif text:
                params["text"] = text

            if name:
                params["name"] = name
            if place:
                params["place"] = place
            if year_min is not None:
                params["year_min"] = year_min
            if year_max is not None:
                params["year_max"] = year_max

            response_data = await self.http_client.get_json(SEARCH_API_BASE_URL, params=params, timeout=REQUEST_TIMEOUT)

            # Parse entire API response with Pydantic
            response = RecordsResponse(**response_data)  # fields provided via **kwargs

            # Apply client-side snippet limiting if requested
            if max_snippets_per_record:
                self._limit_snippets(response, max_snippets_per_record)

            self.logger.info(
                "✓ Search completed: %d snippets from %d records (%d total)", response.count_snippets(), len(response.items), response.total_hits
            )

            span.set_attribute("search.total_hits", response.total_hits)
            span.set_attribute("search.result_count", len(response.items))
            return response

        except Exception as error:
            span.set_status(StatusCode.ERROR, str(error))
            span.record_exception(error)
            self.logger.error("✗ Search failed: %s: %s", type(error).__name__, error)
            raise
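
The validation and request-building steps of search() can be sketched as a standalone function. This is a simplified illustration, not the library's code: build_query_params is a hypothetical name, tracing, logging, and the HTTP call are left out, and the default limit of 10 is an assumption because the value of DEFAULT_LIMIT is not shown on this page.

```python
from __future__ import annotations


def build_query_params(
    text: str | None = None,
    transcribed_text: str | None = None,
    only_digitised_materials: bool = True,
    limit: int = 10,  # assumed default; DEFAULT_LIMIT is not shown here
    offset: int = 0,
    sort: str = "relevance",
    year_min: int | None = None,
    year_max: int | None = None,
    name: str | None = None,
    place: str | None = None,
) -> dict:
    """Mirror the parameter checks and dict construction shown in search()."""
    if transcribed_text and not only_digitised_materials:
        raise ValueError("transcribed_text search requires only_digitised_materials=True")
    if not (text or transcribed_text or name or place):
        raise ValueError("Must provide at least one of: text, transcribed_text, name, place")

    params: dict = {"limit": limit, "offset": offset, "sort": sort}
    if only_digitised_materials:
        params["only_digitised_materials"] = "true"

    # transcribed_text takes precedence; text is only sent when it is absent.
    if transcribed_text:
        params["transcribed_text"] = transcribed_text
    elif text:
        params["text"] = text

    if name:
        params["name"] = name
    if place:
        params["place"] = place
    # Years are compared against None so that a value of 0 would still be sent.
    if year_min is not None:
        params["year_min"] = year_min
    if year_max is not None:
        params["year_max"] = year_max
    return params


params = build_query_params(transcribed_text="trolldom", year_min=1600, year_max=1700)
```

One consequence worth noting: when both text and transcribed_text are passed, only transcribed_text reaches the API.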

search_transcribed_text(transcribed_text, limit=DEFAULT_LIMIT, offset=0, max_snippets_per_record=None) async

Search for keyword in transcribed materials (convenience method).

This is a convenience wrapper around the main search() method.

Parameters:

transcribed_text (str, required): Search term or Solr query (API parameter name)
limit (int, default DEFAULT_LIMIT): Maximum number of records to return
offset (int, default 0): Pagination offset (API parameter name)
max_snippets_per_record (int | None, default None): Client-side snippet limiting per record (not sent to API)

Returns:

RecordsResponse: RecordsResponse with all API fields populated

Source code in packages/search-lib/src/ra_mcp_search_lib/search_client.py
async def search_transcribed_text(
    self,
    transcribed_text: str,
    limit: int = DEFAULT_LIMIT,
    offset: int = 0,
    max_snippets_per_record: int | None = None,
) -> RecordsResponse:
    """
    Search for keyword in transcribed materials (convenience method).

    This is a convenience wrapper around the main search() method.

    Args:
        transcribed_text: Search term or Solr query (API parameter name)
        limit: Maximum number of records to return
        offset: Pagination offset (API parameter name)
        max_snippets_per_record: Client-side snippet limiting per record (not sent to API)

    Returns:
        RecordsResponse with all API fields populated
    """
    return await self.search(
        transcribed_text=transcribed_text,
        only_digitised_materials=True,
        limit=limit,
        offset=offset,
        max_snippets_per_record=max_snippets_per_record,
    )
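
Because max_snippets_per_record is applied on the client and never sent to the API, the trimming happens after the response has been parsed. The body of the client's _limit_snippets helper is not shown on this page, so the following is only a guess at the general idea, using plain dicts and a hypothetical limit_snippets function.

```python
def limit_snippets(records: list[dict], max_per_record: int) -> None:
    """Truncate each record's snippet list in place (hypothetical helper)."""
    for record in records:
        snippets = record.get("snippets")
        # Records without snippets, or already within the limit, are left alone.
        if snippets is not None and len(snippets) > max_per_record:
            record["snippets"] = snippets[:max_per_record]


records = [
    {"id": "a", "snippets": ["s1", "s2", "s3"]},
    {"id": "b", "snippets": ["s1"]},
    {"id": "c"},
]
limit_snippets(records, 2)
```

Since the limit is client-side, it reduces what callers have to process but not the size of the API response itself.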

Operations

SearchOperations(http_client)

Search operations for Riksarkivet document collections.

Provides keyword search functionality.

Attributes:

search_api (SearchClient): Client for executing text searches.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_operations.py
def __init__(self, http_client: HTTPClient):
    self.search_api = SearchClient(http_client=http_client)

search(keyword, transcribed_only=True, only_digitised=True, offset=0, limit=10, max_snippets_per_record=None, sort='relevance', year_min=None, year_max=None, name=None, place=None, research_context=None, session_id=None) async

Search for records in document collections.

Executes a keyword search across Riksarkivet collections. Can search either transcribed text specifically or general metadata (titles, names, places, etc.).

Parameters:

keyword (str, required): Search term or phrase to look for.
transcribed_only (bool, default True): If True, search in transcribed text only. If False, search all fields.
only_digitised (bool, default True): Limit results to digitised materials.
offset (int, default 0): Number of results to skip for pagination.
limit (int, default 10): Maximum number of documents to return.
max_snippets_per_record (int | None, default None): Limit snippets per document (None for unlimited).
sort (str, default 'relevance'): Sort order (one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc).
year_min (int | None, default None): Filter results to this start year or later.
year_max (int | None, default None): Filter results to this end year or earlier.
name (str | None, default None): Search by person name (can combine with keyword).
place (str | None, default None): Search by place name (can combine with keyword).
research_context (str | None, default None): User's research goal (recorded as span attribute for telemetry).
session_id (str | None, default None): Session identifier (recorded as the mcp.session.id span attribute when provided).

Returns:

SearchResult: SearchResult containing documents, total count, and metadata.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_operations.py
async def search(
    self,
    keyword: str,
    transcribed_only: bool = True,
    only_digitised: bool = True,
    offset: int = 0,
    limit: int = 10,
    max_snippets_per_record: int | None = None,
    sort: str = "relevance",
    year_min: int | None = None,
    year_max: int | None = None,
    name: str | None = None,
    place: str | None = None,
    research_context: str | None = None,
    session_id: str | None = None,
) -> SearchResult:
    """Search for records in document collections.

    Executes a keyword search across Riksarkivet collections. Can search either
    transcribed text specifically or general metadata (titles, names, places, etc.).

    Args:
        keyword: Search term or phrase to look for.
        transcribed_only: If True, search in transcribed text only. If False, search all fields.
        only_digitised: Limit results to digitised materials (default: True).
        offset: Number of results to skip for pagination.
        limit: Maximum number of documents to return.
        max_snippets_per_record: Limit snippets per document (None for unlimited).
        sort: Sort order — one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc.
        year_min: Filter results to this start year or later.
        year_max: Filter results to this end year or earlier.
        name: Search by person name (can combine with keyword).
        place: Search by place name (can combine with keyword).
        research_context: User's research goal (recorded as span attribute for telemetry).
        session_id: Session identifier (recorded as the mcp.session.id span attribute when provided).

    Returns:
        SearchResult containing documents, total count, and metadata.
    """
    search_type = "transcribed" if transcribed_only else "metadata"
    with _tracer.start_as_current_span(
        "SearchOperations.search",
        attributes={
            "search.keyword": keyword,
            "search.transcribed_only": transcribed_only,
            "search.offset": offset,
            "search.limit": limit,
            **({"search.research_context": research_context} if research_context else {}),
            **({"mcp.session.id": session_id} if session_id else {}),
        },
    ) as span:
        try:
            # Execute search using API parameter names
            response = await self.search_api.search(
                transcribed_text=keyword if transcribed_only else None,
                text=keyword if not transcribed_only else None,
                only_digitised_materials=only_digitised,
                limit=limit,
                offset=offset,
                max_snippets_per_record=max_snippets_per_record,
                sort=sort,
                year_min=year_min,
                year_max=year_max,
                name=name,
                place=place,
            )

            span.set_attribute("search.total_hits", response.total_hits)
            _search_counter.add(1, {"search.type": search_type, "search.status": "success"})
            _results_histogram.record(response.total_hits, {"search.type": search_type})
            return SearchResult(response=response, transcribed_text=keyword, limit=limit, offset=offset, max_snippets_per_record=max_snippets_per_record)
        except Exception as e:
            span.set_status(StatusCode.ERROR, str(e))
            span.record_exception(e)
            _search_counter.add(1, {"search.type": search_type, "search.status": "error"})
            raise

search_transcribed(keyword, offset=0, limit=10, max_snippets_per_record=None) async

Search for transcribed text across document collections (convenience method).

Executes a keyword search across all transcribed documents in the Riksarkivet collections. This is a convenience wrapper around the main search() method.

Parameters:

keyword (str, required): Search term or phrase to look for in transcribed text.
offset (int, default 0): Number of results to skip for pagination.
limit (int, default 10): Maximum number of documents to return.
max_snippets_per_record (int | None, default None): Limit snippets per document (None for unlimited).

Returns:

SearchResult: SearchResult containing documents, total count, and metadata.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_operations.py
async def search_transcribed(
    self,
    keyword: str,
    offset: int = 0,
    limit: int = 10,
    max_snippets_per_record: int | None = None,
) -> SearchResult:
    """Search for transcribed text across document collections (convenience method).

    Executes a keyword search across all transcribed documents in the Riksarkivet
    collections. This is a convenience wrapper around the main search() method.

    Args:
        keyword: Search term or phrase to look for in transcribed text.
        offset: Number of results to skip for pagination.
        limit: Maximum number of documents to return.
        max_snippets_per_record: Limit snippets per document (None for unlimited).

    Returns:
        SearchResult containing documents, total count, and metadata.
    """
    return await self.search(
        keyword=keyword, transcribed_only=True, only_digitised=True, offset=offset, limit=limit, max_snippets_per_record=max_snippets_per_record
    )
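
The offset and limit parameters suggest a simple paging loop. The sketch below runs against a fake in-memory search coroutine rather than the real SearchOperations class, so it can be executed without network access; fake_search, fetch_all, and the 25-record corpus are all hypothetical, and it assumes that offset counts records and that total_hits reports the total number of matches.

```python
import asyncio


async def fake_search(keyword: str, offset: int = 0, limit: int = 10) -> dict:
    """Hypothetical stand-in for a search call returning one page of results."""
    corpus = [f"record-{i}" for i in range(25)]
    return {"items": corpus[offset:offset + limit], "total_hits": len(corpus)}


async def fetch_all(keyword: str, page_size: int = 10) -> list[str]:
    """Collect every page by advancing offset until total_hits is reached."""
    collected: list[str] = []
    offset = 0
    while True:
        page = await fake_search(keyword, offset=offset, limit=page_size)
        collected.extend(page["items"])
        offset += page_size
        # Stop on an empty page or once the offset has passed the total.
        if not page["items"] or offset >= page["total_hits"]:
            break
    return collected


all_records = asyncio.run(fetch_all("trolldom"))
```

With the real classes, the same loop shape would call search_transcribed() and read items and total_hits from the returned SearchResult.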