Search

Domain package for searching transcribed documents via the Riksarkivet Search API.

Models

`models`

Data models for Riksarkivet search operations.

Models are designed to closely match the Search API JSON structure.

`DocumentLinks`

Bases: BaseModel

Links from the API's `_links` field.

`GenericReference`

Bases: BaseModel

Generic reference (archival institution, hierarchy level, provenance).

Maps to the GenericReference type in the Swagger spec. All fields are nullable.

`Metadata`

Bases: BaseModel

Document metadata from API.

`PageInfo`

Bases: BaseModel

Page information from snippet.

Note: id is marked nullable in the Swagger spec but is always present in search results. We default to "" for safety.
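The "safe default" idea in the note can be sketched with a standard-library dataclass standing in for the Pydantic model. The class name `PageInfoSketch` and the `from_api` helper are hypothetical, and the real model may carry more fields; only the behaviour of defaulting `id` to `""` is taken from the note above.

```python
from dataclasses import dataclass


@dataclass
class PageInfoSketch:
    """Hypothetical stand-in for PageInfo; only the `id` default comes from the docs."""

    id: str = ""

    @classmethod
    def from_api(cls, payload: dict) -> "PageInfoSketch":
        # The spec allows a null (or missing) id, so both cases are coerced to "".
        return cls(id=payload.get("id") or "")


# "page-1" is a made-up identifier used only for illustration.
print(repr(PageInfoSketch.from_api({"id": "page-1"}).id))
print(repr(PageInfoSketch.from_api({"id": None}).id))
```

With this shape, callers can treat `id` as a plain string and never need a `None` check.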

`RecordsResponse`

Bases: BaseModel

Search API response, mapping to the full response structure of the /api/records endpoint.

`count_snippets()`

Count total snippets across all records.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def count_snippets(self) -> int:
    """Count total snippets across all records."""
    return sum(record.get_snippet_count() for record in self.items)

`SearchRecord`

Bases: BaseModel

Record from search API matching JSON structure (objectType: Record, type: Volume).

`get_collection_url()`

Get collection URL for this document.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def get_collection_url(self) -> str:
    """Get collection URL for this document."""
    return f"https://lbiiif.riksarkivet.se/arkis/{self.id}"

`get_manifest_url()`

Get manifest URL from links.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def get_manifest_url(self) -> str | None:
    """Get manifest URL from links."""
    # Return the first image link, or None when links or the image list is missing or empty.
    if self.links and self.links.image:
        return self.links.image[0]
    return None

`get_snippet_count()`

Get number of snippets returned for this document.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def get_snippet_count(self) -> int:
    """Get number of snippets returned for this document."""
    return len(self.transcribed_text.snippets) if self.transcribed_text else 0

`get_title()`

Get document title or default.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def get_title(self) -> str:
    """Get document title or default."""
    return self.caption or "(No title)"

`get_total_hits()`

Get total number of hits in this document.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def get_total_hits(self) -> int:
    """Get total number of hits in this document."""
    return self.transcribed_text.num_total if self.transcribed_text else 0

`SearchResult`

Bases: BaseModel

Search result with query context.

Wraps RecordsResponse with the search parameters used to execute the search. Parameter names match the Search API specification.

`items` `property`

Get records from response.

`keyword` `property`

Alias for transcribed_text for backward compatibility.

`total_hits` `property`

Get total hits from response.

`count_snippets()`

Count total snippets across all records.

Source code in packages/search-lib/src/ra_mcp_search_lib/models.py

def count_snippets(self) -> int:
    """Count total snippets across all records."""
    return self.response.count_snippets()
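The wrapper pattern described for `SearchResult` (a response plus the parameters that produced it, with read-only properties delegating to the response) might look roughly like this. `WrappedResponseStub` and `ResultSketch` are hypothetical stand-ins built on dataclasses rather than Pydantic, and the record values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class WrappedResponseStub:
    # Stand-in for RecordsResponse with only the fields used here.
    items: list[str] = field(default_factory=list)
    total_hits: int = 0


@dataclass
class ResultSketch:
    # Stand-in for SearchResult: the response plus the query context.
    response: WrappedResponseStub
    transcribed_text: str
    limit: int = 10
    offset: int = 0

    @property
    def items(self) -> list[str]:
        return self.response.items

    @property
    def total_hits(self) -> int:
        return self.response.total_hits

    @property
    def keyword(self) -> str:
        # Backward-compatible alias for transcribed_text.
        return self.transcribed_text


result = ResultSketch(
    response=WrappedResponseStub(items=["rec-1", "rec-2"], total_hits=57),
    transcribed_text="example",
)
print(result.keyword, result.total_hits, len(result.items))
```

Keeping the query parameters next to the response lets later code (for example a formatter or a "next page" helper) work from a single object.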

`Snippet`

Bases: BaseModel

Text snippet with search match.

Note: text and pages are marked nullable in the Swagger spec but are always present in search results. We use safe defaults.

`TranscribedText`

Bases: BaseModel

Transcribed text information.

Note: snippets is marked nullable in the Swagger spec but is always present (possibly empty) in search results.

Client

`SearchClient(http_client)`

Client for Riksarkivet Search API.

Handles searching transcribed documents and returns structured responses that directly map to the API's JSON structure.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_client.py

def __init__(self, http_client: HTTPClient):
    self.http_client = http_client
    self.logger = logging.getLogger("ra_mcp.search.client")

`search(text=None, transcribed_text=None, only_digitised_materials=True, limit=DEFAULT_LIMIT, offset=0, max_snippets_per_record=None, sort='relevance', year_min=None, year_max=None, name=None, place=None)` `async`

Search for records using various search parameters.

Parameter names match the Search API specification for clarity. You can search either transcribed text or general text (metadata, names, places); if both `transcribed_text` and `text` are given, only `transcribed_text` is sent to the API.

Parameters:

Name	Type	Description	Default
`text`	`str | None`	General free-text search across all fields	`None`
`transcribed_text`	`str | None`	Search specifically in AI-transcribed text (requires only_digitised_materials=True)	`None`
`only_digitised_materials`	`bool`	Limit results to digitised materials (default: True, API default: False)	`True`
`limit`	`int`	Maximum number of records to return	`DEFAULT_LIMIT`
`offset`	`int`	Pagination offset (API parameter name)	`0`
`max_snippets_per_record`	`int | None`	Client-side snippet limiting per record (not sent to API)	`None`
`sort`	`str`	Sort order, one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc	`'relevance'`
`year_min`	`int | None`	Filter results to this start year or later	`None`
`year_max`	`int | None`	Filter results to this end year or earlier	`None`
`name`	`str | None`	Search by person name (API field, can combine with text)	`None`
`place`	`str | None`	Search by place name (API field, can combine with text)	`None`

Returns:

Type	Description
`RecordsResponse`	RecordsResponse with all API fields populated

Raises:

Type	Description
`ValueError`	If transcribed_text is used without only_digitised_materials=True
`ValueError`	If none of text, transcribed_text, name, or place is provided

Source code in packages/search-lib/src/ra_mcp_search_lib/search_client.py

async def search(
    self,
    text: str | None = None,
    transcribed_text: str | None = None,
    only_digitised_materials: bool = True,
    limit: int = DEFAULT_LIMIT,
    offset: int = 0,
    max_snippets_per_record: int | None = None,
    sort: str = "relevance",
    year_min: int | None = None,
    year_max: int | None = None,
    name: str | None = None,
    place: str | None = None,
) -> RecordsResponse:
    """
    Search for records using various search parameters.

    Parameter names match the Search API specification for clarity.
    You can search either transcribed materials or general text (metadata, names, places).

    Args:
        text: General free-text search across all fields
        transcribed_text: Search specifically in AI-transcribed text (requires only_digitised_materials=True)
        only_digitised_materials: Limit results to digitised materials (default: True, API default: False)
        limit: Maximum number of records to return
        offset: Pagination offset (API parameter name)
        max_snippets_per_record: Client-side snippet limiting per record (not sent to API)
        sort: Sort order — one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc
        year_min: Filter results to this start year or later
        year_max: Filter results to this end year or earlier
        name: Search by person name (API field, can combine with text)
        place: Search by place name (API field, can combine with text)

    Returns:
        RecordsResponse with all API fields populated

    Raises:
        ValueError: If transcribed_text is used without only_digitised_materials=True,
            or if none of text, transcribed_text, name, or place is provided
    """
    # Validate parameters
    if transcribed_text and not only_digitised_materials:
        raise ValueError(
            "transcribed_text search requires only_digitised_materials=True. Use text parameter instead of transcribed_text to search all materials."
        )

    if not text and not transcribed_text and not name and not place:
        raise ValueError("Must provide at least one search parameter: 'text', 'transcribed_text', 'name', or 'place'")

    search_term = transcribed_text or text
    search_type = "transcribed" if transcribed_text else "general"
    self.logger.info("Starting %s search: keyword='%s', limit=%d, offset=%d", search_type, search_term, limit, offset)

    with _tracer.start_as_current_span(
        "SearchClient.search",
        attributes={
            "search.type": search_type,
            "search.keyword": search_term or "",
            "search.offset": offset,
            "search.limit": limit,
        },
    ) as span:
        try:
            # Build search parameters
            params = {
                "limit": limit,
                "offset": offset,
                "sort": sort,
            }

            if only_digitised_materials:
                params["only_digitised_materials"] = "true"

            if transcribed_text:
                params["transcribed_text"] = transcribed_text
            elif text:
                params["text"] = text

            if name:
                params["name"] = name
            if place:
                params["place"] = place
            if year_min is not None:
                params["year_min"] = year_min
            if year_max is not None:
                params["year_max"] = year_max

            response_data = await self.http_client.get_json(SEARCH_API_BASE_URL, params=params, timeout=REQUEST_TIMEOUT)

            # Parse entire API response with Pydantic
            response = RecordsResponse(**response_data)  # fields provided via **kwargs

            # Apply client-side snippet limiting if requested
            if max_snippets_per_record:
                self._limit_snippets(response, max_snippets_per_record)

            self.logger.info(
                "✓ Search completed: %d snippets from %d records (%d total)", response.count_snippets(), len(response.items), response.total_hits
            )

            span.set_attribute("search.total_hits", response.total_hits)
            span.set_attribute("search.result_count", len(response.items))
            return response

        except Exception as error:
            span.set_status(StatusCode.ERROR, str(error))
            span.record_exception(error)
            self.logger.error("✗ Search failed: %s: %s", type(error).__name__, error)
            raise

`search_transcribed_text(transcribed_text, limit=DEFAULT_LIMIT, offset=0, max_snippets_per_record=None)` `async`

Search for keyword in transcribed materials (convenience method).

This is a convenience wrapper around the main search() method.

Parameters:

Name	Type	Description	Default
`transcribed_text`	`str`	Search term or Solr query (API parameter name)	required
`limit`	`int`	Maximum number of records to return	`DEFAULT_LIMIT`
`offset`	`int`	Pagination offset (API parameter name)	`0`
`max_snippets_per_record`	`int | None`	Client-side snippet limiting per record (not sent to API)	`None`

Returns:

Type	Description
`RecordsResponse`	RecordsResponse with all API fields populated

Source code in packages/search-lib/src/ra_mcp_search_lib/search_client.py

async def search_transcribed_text(
    self,
    transcribed_text: str,
    limit: int = DEFAULT_LIMIT,
    offset: int = 0,
    max_snippets_per_record: int | None = None,
) -> RecordsResponse:
    """
    Search for keyword in transcribed materials (convenience method).

    This is a convenience wrapper around the main search() method.

    Args:
        transcribed_text: Search term or Solr query (API parameter name)
        limit: Maximum number of records to return
        offset: Pagination offset (API parameter name)
        max_snippets_per_record: Client-side snippet limiting per record (not sent to API)

    Returns:
        RecordsResponse with all API fields populated
    """
    return await self.search(
        transcribed_text=transcribed_text,
        only_digitised_materials=True,
        limit=limit,
        offset=offset,
        max_snippets_per_record=max_snippets_per_record,
    )

Operations

`SearchOperations(http_client)`

Search operations for Riksarkivet document collections.

Provides keyword search functionality.

Attributes:

Name	Type	Description
`search_api`	`SearchClient`	Client for executing text searches.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_operations.py

def __init__(self, http_client: HTTPClient):
    self.search_api = SearchClient(http_client=http_client)

`search(keyword, transcribed_only=True, only_digitised=True, offset=0, limit=10, max_snippets_per_record=None, sort='relevance', year_min=None, year_max=None, name=None, place=None, research_context=None, session_id=None)` `async`

Search for records in document collections.

Executes a keyword search across Riksarkivet collections. Can search either transcribed text specifically or general metadata (titles, names, places, etc.).

Parameters:

Name	Type	Description	Default
`keyword`	`str`	Search term or phrase to look for.	required
`transcribed_only`	`bool`	If True, search in transcribed text only. If False, search all fields.	`True`
`only_digitised`	`bool`	Limit results to digitised materials (default: True).	`True`
`offset`	`int`	Number of results to skip for pagination.	`0`
`limit`	`int`	Maximum number of documents to return.	`10`
`max_snippets_per_record`	`int | None`	Limit snippets per document (None for unlimited).	`None`
`sort`	`str`	Sort order, one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc.	`'relevance'`
`year_min`	`int | None`	Filter results to this start year or later.	`None`
`year_max`	`int | None`	Filter results to this end year or earlier.	`None`
`name`	`str | None`	Search by person name (can combine with keyword).	`None`
`place`	`str | None`	Search by place name (can combine with keyword).	`None`
`research_context`	`str | None`	User's research goal (recorded as span attribute for telemetry).	`None`
`session_id`	`str | None`	MCP session identifier (recorded as span attribute `mcp.session.id` for telemetry).	`None`

Returns:

Type	Description
`SearchResult`	SearchResult containing documents, total count, and metadata.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_operations.py

async def search(
    self,
    keyword: str,
    transcribed_only: bool = True,
    only_digitised: bool = True,
    offset: int = 0,
    limit: int = 10,
    max_snippets_per_record: int | None = None,
    sort: str = "relevance",
    year_min: int | None = None,
    year_max: int | None = None,
    name: str | None = None,
    place: str | None = None,
    research_context: str | None = None,
    session_id: str | None = None,
) -> SearchResult:
    """Search for records in document collections.

    Executes a keyword search across Riksarkivet collections. Can search either
    transcribed text specifically or general metadata (titles, names, places, etc.).

    Args:
        keyword: Search term or phrase to look for.
        transcribed_only: If True, search in transcribed text only. If False, search all fields.
        only_digitised: Limit results to digitised materials (default: True).
        offset: Number of results to skip for pagination.
        limit: Maximum number of documents to return.
        max_snippets_per_record: Limit snippets per document (None for unlimited).
        sort: Sort order — one of: relevance, timeAsc, timeDesc, alphaAsc, alphaDesc.
        year_min: Filter results to this start year or later.
        year_max: Filter results to this end year or earlier.
        name: Search by person name (can combine with keyword).
        place: Search by place name (can combine with keyword).
        research_context: User's research goal (recorded as span attribute for telemetry).
        session_id: MCP session identifier (recorded as span attribute for telemetry).

    Returns:
        SearchResult containing documents, total count, and metadata.
    """
    search_type = "transcribed" if transcribed_only else "metadata"
    with _tracer.start_as_current_span(
        "SearchOperations.search",
        attributes={
            "search.keyword": keyword,
            "search.transcribed_only": transcribed_only,
            "search.offset": offset,
            "search.limit": limit,
            **({"search.research_context": research_context} if research_context else {}),
            **({"mcp.session.id": session_id} if session_id else {}),
        },
    ) as span:
        try:
            # Execute search using API parameter names
            response = await self.search_api.search(
                transcribed_text=keyword if transcribed_only else None,
                text=keyword if not transcribed_only else None,
                only_digitised_materials=only_digitised,
                limit=limit,
                offset=offset,
                max_snippets_per_record=max_snippets_per_record,
                sort=sort,
                year_min=year_min,
                year_max=year_max,
                name=name,
                place=place,
            )

            span.set_attribute("search.total_hits", response.total_hits)
            _search_counter.add(1, {"search.type": search_type, "search.status": "success"})
            _results_histogram.record(response.total_hits, {"search.type": search_type})
            return SearchResult(response=response, transcribed_text=keyword, limit=limit, offset=offset, max_snippets_per_record=max_snippets_per_record)
        except Exception as e:
            span.set_status(StatusCode.ERROR, str(e))
            span.record_exception(e)
            _search_counter.add(1, {"search.type": search_type, "search.status": "error"})
            raise

`search_transcribed(keyword, offset=0, limit=10, max_snippets_per_record=None)` `async`

Search for transcribed text across document collections (convenience method).

Executes a keyword search across all transcribed documents in the Riksarkivet collections. This is a convenience wrapper around the main search() method.

Parameters:

Name	Type	Description	Default
`keyword`	`str`	Search term or phrase to look for in transcribed text.	required
`offset`	`int`	Number of results to skip for pagination.	`0`
`limit`	`int`	Maximum number of documents to return.	`10`
`max_snippets_per_record`	`int | None`	Limit snippets per document (None for unlimited).	`None`

Returns:

Type	Description
`SearchResult`	SearchResult containing documents, total count, and metadata.

Source code in packages/search-lib/src/ra_mcp_search_lib/search_operations.py

async def search_transcribed(
    self,
    keyword: str,
    offset: int = 0,
    limit: int = 10,
    max_snippets_per_record: int | None = None,
) -> SearchResult:
    """Search for transcribed text across document collections (convenience method).

    Executes a keyword search across all transcribed documents in the Riksarkivet
    collections. This is a convenience wrapper around the main search() method.

    Args:
        keyword: Search term or phrase to look for in transcribed text.
        offset: Number of results to skip for pagination.
        limit: Maximum number of documents to return.
        max_snippets_per_record: Limit snippets per document (None for unlimited).

    Returns:
        SearchResult containing documents, total count, and metadata.
    """
    return await self.search(
        keyword=keyword, transcribed_only=True, only_digitised=True, offset=offset, limit=limit, max_snippets_per_record=max_snippets_per_record
    )
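The `offset` and `limit` parameters lend themselves to a simple paging loop. The sketch below runs against a stub, `fake_search_transcribed`, that returns canned data instead of calling the API; the stub, the `fetch_all` helper and the 23-record corpus are all made up for illustration. It also assumes that `total_hits` counts matching records, which this page does not confirm, so the loop additionally stops on an empty page.

```python
import asyncio


async def fake_search_transcribed(keyword: str, offset: int = 0, limit: int = 10) -> dict:
    """Stub standing in for SearchOperations.search_transcribed(); returns canned data."""
    all_ids = [f"record-{i}" for i in range(23)]  # made-up corpus of 23 matches
    return {"items": all_ids[offset : offset + limit], "total_hits": len(all_ids)}


async def fetch_all(keyword: str, page_size: int = 10) -> list[str]:
    """Collect every page by advancing the offset until the results run out."""
    collected: list[str] = []
    offset = 0
    while True:
        page = await fake_search_transcribed(keyword, offset=offset, limit=page_size)
        collected.extend(page["items"])
        offset += page_size
        # Stop on an empty page or once the offset has passed the reported total.
        if not page["items"] or offset >= page["total_hits"]:
            return collected


records = asyncio.run(fetch_all("example"))
print(len(records))
```

Against the real client, the same loop shape would apply with `await operations.search_transcribed(...)` in place of the stub, reading `items` and `total_hits` from the returned `SearchResult`.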
