Models
Supported Models
HTRflow natively supports specific models from the following frameworks: Ultralytics, Hugging Face, and OpenMMLab.
Tip
For a complete list of predefined models compatible with our Pipeline steps, see the Model reference.
Riksarkivet Models
Riksarkivet provides several ready-to-use models available on Hugging Face.
OpenMMLab Models
To use OpenMMLab models (e.g., SATRN), you need to install additional dependencies: torch, mmcv, mmdet, mmengine, mmocr, and yapf. Follow the instructions below to make sure the correct versions are installed.
Note
OpenMMLab requires a specific PyTorch version. Make sure you have torch==2.0.0 installed, for example with: pip install torch==2.0.0
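OpenMMLab only works with one exact PyTorch release, so you may want to check your environment after installing the packages. Below is a minimal sketch that uses only the standard library's importlib.metadata. The helper names installed_versions and check_pins are hypothetical and are not part of HTRflow. Only the torch==2.0.0 pin comes from this page, so the other OpenMMLab packages are only checked for presence.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def check_pins(required, installed):
    """Compare installed versions against pins and return a list of problems.

    `required` maps a distribution name to an exact version string, or to
    None when any installed version is acceptable. Local build tags such as
    "+cu118" are ignored in the comparison.
    """
    problems = []
    for name, pin in required.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif pin is not None and found.split("+")[0] != pin:
            problems.append(f"{name}=={found} found, expected {pin}")
    return problems


# Only the torch pin is stated on this page; the other versions are assumptions
# left open (None), so those packages are only checked for presence.
OPENMMLAB_REQUIREMENTS = {
    "torch": "2.0.0",
    "mmcv": None,
    "mmdet": None,
    "mmengine": None,
    "mmocr": None,
    "yapf": None,
}

if __name__ == "__main__":
    versions = installed_versions(OPENMMLAB_REQUIREMENTS)
    for line in check_pins(OPENMMLAB_REQUIREMENTS, versions) or ["all OpenMMLab requirements satisfied"]:
        print(line)
```

Keeping the comparison in a pure function (check_pins) separate from the environment lookup makes it easy to add exact pins for the other packages once you know which versions you need.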
You can install the OpenMMLab dependencies using either mim or pip.
The recommended method, according to OpenMMLab, is to use mim, which is a package and model manager.
First, install mim, which is distributed on PyPI as openmim: pip install -U openmim
Then, use mim to install the required packages:
Here are links to the documentation for each OpenMMLab package used in HTRflow:
Teklia Models
To use models from Teklia (currently only PyLaia), you need to install additional dependencies, including pylaia. Follow the instructions below to make sure the correct versions are installed.
Note
PyLaia requires a specific PyTorch version. Make sure you have torch==1.13.0 installed, for example with: pip install torch==1.13.0
Note
PyLaia requires a specific Python version. Make sure you are using Python 3.10 or earlier (python<=3.10).
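PyLaia adds a Python-version constraint on top of the PyTorch pin, so a quick check before installing pylaia can save a failed build. Below is a minimal sketch that encodes the two constraints from the notes above (Python 3.10 or earlier, torch 1.13.0). The function names pylaia_env_problems and current_torch_version are hypothetical helpers, not part of HTRflow or PyLaia.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MAX_PYTHON = (3, 10)   # "python<=3.10" from the note above
TORCH_PIN = "1.13.0"   # "torch==1.13.0" from the note above


def pylaia_env_problems(python_version, torch_version):
    """Return reasons why an environment does not match PyLaia's pins.

    `python_version` is a (major, minor, ...) tuple such as sys.version_info;
    `torch_version` is the installed torch version string, or None if torch
    is missing. Local build tags such as "+cpu" are ignored.
    """
    problems = []
    if tuple(python_version[:2]) > MAX_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is newer than "
            f"{MAX_PYTHON[0]}.{MAX_PYTHON[1]}"
        )
    if torch_version is None:
        problems.append("torch is not installed")
    elif torch_version.split("+")[0] != TORCH_PIN:
        problems.append(f"torch {torch_version} found, expected {TORCH_PIN}")
    return problems


def current_torch_version():
    """Return the installed torch version, or None if torch is not installed."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    issues = pylaia_env_problems(sys.version_info, current_torch_version())
    print("\n".join(issues) if issues else "environment matches PyLaia's pins")
```

Because the Python limit applies before anything is installed, the helper reports it even when torch is missing.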
Link to the documentation for PyLaia from Teklia:
Custom Models
If your model (or framework) is not supported, you can implement a custom model in HTRflow. Below is a basic example:
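```python
# Import paths assumed from the HTRflow source tree; adjust them if your version differs.
from htrflow.models.base_model import BaseModel
from htrflow.results import Result


class Model(BaseModel):
    def __init__(self, *args, **kwargs):
        # Initialize your model here, e.g. load its weights
        pass

    def _predict(self, images, **kwargs) -> list[Result]:
        # Run inference on `images` and return one Result per image,
        # for example Result.text_recognition_result()
        # or Result.segmentation_result()
        raise NotImplementedError
```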
See the Result reference for the different return formats a model can produce. For example, use Result.text_recognition_result() for HTR, and Result.segmentation_result() for segmentation or object detection.
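The split between a public prediction method and the _predict hook you implement can be illustrated with a small, self-contained sketch. The BaseModel and Result classes below are simplified, hypothetical stand-ins written for this illustration, not HTRflow's actual classes. In particular, the batching in predict and the one-result-per-image check are assumptions modeled on the examples on this page, which return one Result per input image. UppercaseModel is a toy that treats strings as "images".

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical stand-in for HTRflow's Result: metadata plus model output."""
    metadata: dict
    data: list = field(default_factory=list)


class BaseModel:
    """Simplified, hypothetical stand-in for HTRflow's BaseModel.

    Subclasses implement `_predict`; the public `predict` splits the input
    into batches, calls `_predict` on each batch, and checks that exactly one
    Result comes back per input image.
    """

    def predict(self, images, batch_size=2, **kwargs):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        results = []
        for start in range(0, len(images), batch_size):
            batch = images[start:start + batch_size]
            batch_results = self._predict(batch, **kwargs)
            if len(batch_results) != len(batch):
                raise ValueError(
                    f"_predict returned {len(batch_results)} results "
                    f"for {len(batch)} images"
                )
            results.extend(batch_results)
        return results

    def _predict(self, images, **kwargs):
        raise NotImplementedError


class UppercaseModel(BaseModel):
    """Toy model: 'recognizes' each image (here just a string) as its uppercase text."""

    def _predict(self, images, **kwargs):
        return [Result({"model": "uppercase"}, [{"text": img.upper()}]) for img in images]
```

A custom model then only deals with one batch at a time. Batching and the sanity check live in the shared base class, which is why the examples that follow only override _predict.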
Examples of Custom Implementations
Text Recognition Model:
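```python
import random

import lorem  # third-party package that generates placeholder text
import numpy as np

from htrflow.models.base_model import BaseModel  # import paths assumed
from htrflow.results import Result


class RecognitionModel(BaseModel):
    """Dummy model that 'recognizes' random placeholder text."""

    def _predict(self, images: list[np.ndarray], **kwargs) -> list[Result]:
        metadata = {"model": "Lorem dummy model"}
        n = 2  # number of dummy texts (each with a random score) per image
        return [
            Result.text_recognition_result(
                metadata,
                texts=[lorem.sentence() for _ in range(n)],
                scores=[random.random() for _ in range(n)],
            )
            for _ in images
        ]
```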
Document Classification Model:
```python
import random

import numpy as np

from htrflow.models.base_model import BaseModel  # import paths assumed
from htrflow.results import Result


class ClassificationModel(BaseModel):
    """Model that classifies input images into different types of potato dishes."""

    def _predict(self, images: list[np.ndarray], **kwargs) -> list[Result]:
        classes = ["baked potato", "french fry", "raggmunk"]
        return [
            Result(metadata={"model": "Potato classifier 2000"}, data=[{"classification": random.choice(classes)}])
            for _ in images
        ]
```