Enabling Full Text Search in The National Archives

This project integrates full-text search into The National Archives' online interface so that texts transcribed by Htrflow can be searched directly. Currently, users can find archival documents only by searching metadata. If a search term is not in the metadata, no results are returned, even if the term appears in the document itself.

By utilizing Htrflow to transcribe a large volume of digitized archival documents, The National Archives aims to enable direct searching within document texts via the online interface. This enhancement will significantly improve accessibility and usability of archival information for researchers and the general public.

The project involves indexing the ALTO XML files generated by Htrflow into Solr and exposing queries through a REST API to locate words within archival documents. This indexed data will power the new document search feature in the online interface, giving more comprehensive access to archival content.
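
The indexing step described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the Solr field names (`id`, `document_id`, `text`), the one-document-per-text-line granularity, and the URLs passed in by the caller are all assumptions made for illustration. The sketch reads the `CONTENT` attribute of each ALTO `String` element, groups words by `TextLine`, posts the result to Solr's JSON update handler, and builds a query URL for Solr's select handler.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace so any ALTO schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def alto_to_solr_docs(alto_xml: str, document_id: str) -> list:
    """Turn one ALTO XML page into one Solr document per text line.

    The field names used here are hypothetical, not the project's schema.
    """
    root = ET.fromstring(alto_xml)
    lines = [el for el in root.iter() if local_name(el.tag) == "TextLine"]
    docs = []
    for number, line in enumerate(lines, start=1):
        # Each ALTO String element carries one transcribed word in CONTENT.
        words = [
            child.get("CONTENT", "")
            for child in line
            if local_name(child.tag) == "String"
        ]
        docs.append({
            "id": f"{document_id}_line{number}",
            "document_id": document_id,
            "text": " ".join(word for word in words if word),
        })
    return docs


def index_docs(docs: list, solr_update_url: str) -> int:
    """POST the documents as a JSON array to a Solr update handler.

    solr_update_url is assumed to look like
    http://localhost:8983/solr/<core>/update for some existing core.
    """
    request = urllib.request.Request(
        solr_update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def build_search_url(solr_select_url: str, term: str) -> str:
    """Build a select-handler URL that searches the assumed 'text' field."""
    params = urllib.parse.urlencode({"q": f'text:"{term}"', "wt": "json"})
    return f"{solr_select_url}?{params}"
```

Indexing one document per text line is only one possible design. It lets a hit point to a specific line on a page, at the cost of a larger index than one document per page. Returning word positions for highlighting would additionally require storing the `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes that ALTO provides on each `String` element.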