Models

Supported Models

HTRflow natively supports specific models from the following frameworks: Ultralytics, Hugging Face, and OpenMMLab.

Tip

For a complete list of predefined models compatible with our Pipeline steps, see the Model reference.

Riksarkivet Models

Riksarkivet provides several ready-to-use models available on Hugging Face.

Here are some of the different materials the models were trained on:

OpenMMLab Models

To use OpenMMLab models (e.g., SATRN), you need to install specific dependencies: torch, mmcv, mmdet, mmengine, mmocr, and yapf. Follow the instructions below to ensure the correct versions are installed.

Note

OpenMMLab requires a specific PyTorch version. Make sure you have torch==2.0.0 installed:

pip install -U torch==2.0.0

You can install the OpenMMLab dependencies using either mim or pip.

The recommended method, according to OpenMMLab, is to use mim, which is a package and model manager.

First, install mim:

pip install -U openmim

Then, use mim to install the required packages:

mim install -U mmdet
mim install -U mmengine
mim install -U mmocr
mim install -U mmcv

Alternatively, you can install the dependencies using pip:

pip install -U mmcv==2.0.0
pip install -U mmdet==3.1.0
pip install -U mmengine==0.7.2
pip install -U mmocr==1.0.1
pip install -U yapf==0.40.1
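After installing, it can be useful to confirm that the pinned versions are actually the ones in your environment. The following is a minimal sketch, not part of HTRflow: it uses the standard library's importlib.metadata, and the names PINNED, installed_version, and check_pins are hypothetical helpers introduced for illustration. The pins are copied from the pip commands above, plus torch from the note.

```python
from importlib import metadata

# Pins copied from the pip commands above, plus torch from the note.
PINNED = {
    "torch": "2.0.0",
    "mmcv": "2.0.0",
    "mmdet": "3.1.0",
    "mmengine": "0.7.2",
    "mmocr": "1.0.1",
    "yapf": "0.40.1",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Drop local build tags such as "+cu117" that torch wheels may carry.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins().items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Running the script prints one line per mismatched or missing package and nothing when every pin matches. The lookup parameter exists only so the check can be exercised without touching the real environment.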

Here are links to the documentation for each OpenMMLab package used in HTRflow:

Custom Models

If your model (or framework) is not supported, you can implement a custom model in HTRflow. Below is a basic example:

# Import paths assume HTRflow's package layout; adjust if your version differs.
from htrflow.models.base_model import BaseModel
from htrflow.results import Result


class Model(BaseModel):
    def __init__(self, *args, **kwargs):
        # Initialize your model here
        pass

    def _predict(self, images, **kwargs) -> list[Result]:
        # Run inference on `images` and return a list of Result objects,
        # built with, for example, Result.text_recognition_result()
        # or Result.segmentation_result()
        raise NotImplementedError

See the Result reference for the return formats available to models: for instance, Result.text_recognition_result() for HTR, or Result.segmentation_result() for segmentation or object detection.

Examples of Custom Implementations

Text Recognition Model:

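import random

import lorem  # third-party package that provides lorem.sentence()
import numpy as np

# Import paths assume HTRflow's package layout; adjust if your version differs.
from htrflow.models.base_model import BaseModel
from htrflow.results import Result


class RecognitionModel(BaseModel):
    """Dummy model that returns random lorem ipsum text for each image."""

    def _predict(self, images: list[np.ndarray]) -> list[Result]:
        metadata = {"model": "Lorem dummy model"}
        n = 2  # number of candidate transcriptions per image
        return [
            Result.text_recognition_result(
                metadata,
                texts=[lorem.sentence() for _ in range(n)],
                scores=[random.random() for _ in range(n)],
            )
            for _ in images
        ]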

Document Classification Model:

import random

import numpy as np
from htrflow.models.base_model import BaseModel  # import paths assumed
from htrflow.results import Result


class ClassificationModel(BaseModel):
    """Model that classifies input images into different types of potato dishes."""

    def _predict(self, images: list[np.ndarray]) -> list[Result]:
        classes = ["baked potato", "french fry", "raggmunk"]
        return [
            Result(metadata={"model": "Potato classifier 2000"}, data=[{"classification": random.choice(classes)}])
            for _ in images
        ]