Models
Supported Models
HTRflow natively supports specific models from the following frameworks: Ultralytics, Hugging Face, and OpenMMLab.
Tip
For a complete list of predefined models compatible with our Pipeline steps, see the Model reference.
Riksarkivet Models
Riksarkivet provides several ready-to-use models available on Hugging Face.
Here are some of the different materials the models were trained on:
OpenMMLab Models
To use OpenMMLab models (e.g. SATRN), specific dependencies need to be installed, including torch, mmcv, mmdet, mmengine, mmocr, and yapf. Follow the instructions below to ensure the correct versions are installed.
Note
OpenMMLab requires a specific PyTorch version. Make sure you have pytorch==2.0.0 installed:
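One way to confirm the environment from Python is sketched below, using only the standard library. It assumes the distributions are published under the names listed above (torch, mmcv, mmdet, mmengine, mmocr, yapf); only the torch pin of 2.0.0 comes from this page, and the helper names `base_version` and `check_package` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as listed on this page (assumed to match the installed
# package names). Only torch has a pinned version here; None means "any version".
REQUIRED = {
    "torch": "2.0.0",
    "mmcv": None,
    "mmdet": None,
    "mmengine": None,
    "mmocr": None,
    "yapf": None,
}


def base_version(raw: str) -> str:
    """Drop a local build suffix such as '+cu117' from a version string."""
    return raw.split("+", 1)[0]


def check_package(name: str, expected: Optional[str] = None) -> str:
    """Return 'missing', 'mismatch', or 'ok' for one installed distribution."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return "missing"
    if expected is not None and base_version(installed) != expected:
        return "mismatch"
    return "ok"


if __name__ == "__main__":
    for name, expected in REQUIRED.items():
        print(f"{name}: {check_package(name, expected)}")
```

Stripping the local suffix matters because CUDA builds of PyTorch typically report versions such as `2.0.0+cu117`, which would otherwise fail an exact string comparison.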
You can install the OpenMMLab dependencies using either mim or pip.
The recommended method, according to OpenMMLab, is to use mim, which is a package and model manager.
First, install mim:
Then, use mim to install the required packages:
Here are links to the documentation for each OpenMMLab package used in HTRflow:
Custom Models
If your model (or framework) is not supported, you can implement a custom model in HTRflow. Below is a basic example:
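# Import paths assume HTRflow's package layout; adjust if your version differs.
from htrflow.models.base_model import BaseModel
from htrflow.results import Result


class Model(BaseModel):
    def __init__(self, *args, **kwargs):
        # Initialize your model here
        pass

    def _predict(self, images, **kwargs) -> list[Result]:
        # Run inference on `images`
        # Should return, for example, Result.text_recognition_result()
        # or Result.segmentation_result()
        raise NotImplementedError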
See the Result reference for the different return formats a model can produce. For instance, use Result.text_recognition_result() for HTR, or Result.segmentation_result() for segmentation or object detection.
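The relationship between a base class and its `_predict` hook can be sketched without HTRflow installed. `StubResult`, `StubBaseModel`, and `ImageSizeModel` below are hypothetical stand-ins, not HTRflow's classes, and the public `predict` wrapper is an assumption made for illustration. The only contract taken from this page is that `_predict` returns one result per input image.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubResult:
    """Hypothetical stand-in for HTRflow's Result (not the real class)."""

    metadata: dict[str, Any]
    data: list[dict[str, Any]] = field(default_factory=list)


class StubBaseModel:
    """Hypothetical stand-in base class: a public method delegating to `_predict`."""

    def predict(self, images: list[Any], **kwargs: Any) -> list[StubResult]:
        results = self._predict(images, **kwargs)
        # Contract illustrated here: exactly one result per input image.
        if len(results) != len(images):
            raise ValueError(f"expected {len(images)} results, got {len(results)}")
        return results

    def _predict(self, images: list[Any], **kwargs: Any) -> list[StubResult]:
        raise NotImplementedError


class ImageSizeModel(StubBaseModel):
    """Toy model that labels each image as 'empty' or 'non-empty'."""

    def _predict(self, images: list[Any], **kwargs: Any) -> list[StubResult]:
        return [
            StubResult(
                metadata={"model": "size stub"},
                data=[{"classification": "non-empty" if len(img) else "empty"}],
            )
            for img in images
        ]


# Two toy "images": a non-empty list and an empty one.
results = ImageSizeModel().predict([[1, 2, 3], []])
labels = [r.data[0]["classification"] for r in results]
print(labels)
```

Keeping the length check in the wrapper rather than in each subclass means a faulty custom model fails early, before its output reaches later processing.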
Examples of Custom Implementations
Text Recognition Model:
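import random

import lorem  # third-party package that provides lorem.sentence()
import numpy as np
from htrflow.models.base_model import BaseModel  # assumed import path
from htrflow.results import Result  # assumed import path


class RecognitionModel(BaseModel):
    """Dummy model that returns random placeholder text for every image."""

    def _predict(self, images: list[np.ndarray]) -> list[Result]:
        metadata = {"model": "Lorem dummy model"}
        n = 2  # number of texts (each with a score) returned per image
        return [
            Result.text_recognition_result(
                metadata,
                texts=[lorem.sentence() for _ in range(n)],
                scores=[random.random() for _ in range(n)],
            )
            for _ in images
        ]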
Document Classification Model:
import random
import numpy as np
from htrflow.models.base_model import BaseModel  # assumed import path
from htrflow.results import Result  # assumed import path


class ClassificationModel(BaseModel):
    """Model that classifies input images into different types of potato dishes."""

    def _predict(self, images: list[np.ndarray]) -> list[Result]:
        classes = ["baked potato", "french fry", "raggmunk"]
        return [
            Result(metadata={"model": "Potato classifier 2000"}, data=[{"classification": random.choice(classes)}])
            for _ in images
        ]