Models
Base model
BaseModel
Bases: ABC
Model base class
This is the abstract base class of HTRflow models. It handles batching of inputs, some shared initialization arguments and generic logging.
Concrete model implementations subclass this class and define their prediction method in _predict().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | str \| None | Model device as a string, recognizable by torch. Defaults to None. | None |
allow_tf32 | bool | Allow running matrix multiplications with TensorFloat-32. This speeds up inference at the cost of reduced numerical precision, which may affect inference quality. Read more here: https://huggingface.co/docs/diffusers/optimization/fp16#tensorfloat-32 | True |
Source code in src/htrflow/models/base_model.py
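The split between a generic predict() and a model-specific _predict() is the template method pattern. A minimal sketch, assuming a much simplified base class: SketchModel and EchoModel are hypothetical names, not htrflow classes, and the real BaseModel also handles batching, image scaling, logging, and the device and TF32 settings.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class SketchModel(ABC):
    """Hypothetical, simplified stand-in for BaseModel (not htrflow's code)."""

    def __init__(self, device: Optional[str] = None) -> None:
        # Shared initialization arguments are handled once, in the base class.
        self.device = device

    def predict(self, images: List[Any], **kwargs: Any) -> List[Any]:
        # Template method: generic work (batching, logging, progress bars)
        # would live here; the model-specific step is delegated to _predict,
        # and any extra keyword arguments are forwarded unchanged.
        return self._predict(images, **kwargs)

    @abstractmethod
    def _predict(self, images: List[Any], **kwargs: Any) -> List[Any]:
        """Model-specific inference; returns one result per input image."""


class EchoModel(SketchModel):
    """Toy concrete model: returns input lengths instead of running a network."""

    def _predict(self, images: List[Any], **kwargs: Any) -> List[Any]:
        offset = kwargs.get("offset", 0)
        return [len(image) + offset for image in images]


try:
    SketchModel()  # fails: the abstract _predict has no implementation
except TypeError as err:
    print("cannot instantiate SketchModel:", type(err).__name__)

model = EchoModel(device="cpu")
print(model.predict(["ab", "cde"], offset=1))  # [3, 4]
```

Because _predict is declared with abstractmethod, Python refuses to instantiate a subclass that forgets to implement it, which is what makes the base class safe to build on.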
_predict (abstractmethod)
predict
Perform inference on images.
Takes an arbitrary number of inputs and runs batched inference. The inputs can be streamed from an iterator and don't need to be read into memory simultaneously. Prints a progress bar using tqdm. This is a template method which delegates the model-specific work to _predict(...).
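Streaming batched inference can be illustrated with a small batching helper. This is a sketch, not htrflow's implementation: batched and image_stream are hypothetical helpers, and the fake file names stand in for decoded images. Each batch is pulled from the iterator only when it is requested, so only one batch needs to be in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, pulling lazily from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


loaded = []  # records which "images" have been read so far


def image_stream(n: int) -> Iterator[str]:
    """Hypothetical loader that reads one image at a time (here: fake names)."""
    for i in range(n):
        loaded.append(i)
        yield f"page_{i}.jpg"


batches = batched(image_stream(5), batch_size=2)
first = next(batches)
print(first, loaded)  # ['page_0.jpg', 'page_1.jpg'] [0, 1]
print(list(batches))  # [['page_2.jpg', 'page_3.jpg'], ['page_4.jpg']]
```

After the first batch only two images have been read. On Python 3.12 and later, itertools.batched offers the same behavior but yields tuples instead of lists.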
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | Collection[NumpyImage] | Input images. | required |
batch_size | int | Inference batch size. Defaults to 1. | 1 |
image_scaling_factor | float | If < 1, all input images will be downscaled by this factor, which can be useful for speeding up inference on higher-resolution images. All geometric data in the result (e.g., bounding boxes) are reported with respect to the original resolution. | 1.0 |
tqdm_kwargs | dict[str, Any] \| None | Optional keyword arguments to control the progress bar. | None |
**kwargs | | Optional keyword arguments that are forwarded to the model-specific prediction method. | {} |
Source code in src/htrflow/models/base_model.py
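Reporting geometry "with respect to the original resolution" when image_scaling_factor < 1 amounts to dividing predicted coordinates by the factor. A minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes in the downscaled image's pixel coordinates; scaled_size and to_original are hypothetical helpers, and htrflow's own handling of masks and polygons is not shown.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def scaled_size(width: int, height: int, factor: float) -> Tuple[int, int]:
    """Size of the downscaled image that would be passed to the model."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    return max(1, round(width * factor)), max(1, round(height * factor))


def to_original(boxes: List[Box], factor: float) -> List[Box]:
    """Map boxes predicted on the downscaled image back to original pixels."""
    return [(x1 / factor, y1 / factor, x2 / factor, y2 / factor) for x1, y1, x2, y2 in boxes]


# A 4000x3000 scan processed at half resolution (image_scaling_factor=0.5).
print(scaled_size(4000, 3000, 0.5))  # (2000, 1500)
print(to_original([(100.0, 50.0, 300.0, 80.0)], 0.5))  # [(200.0, 100.0, 600.0, 160.0)]
```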
Text recognition models
TrOCR
Bases: BaseModel
HTRflow adapter of the transformer-based OCR model TrOCR.
Uses Hugging Face's implementation of TrOCR. For further information, see https://huggingface.co/docs/transformers/model_doc/trocr.
Example usage with the TextRecognition step:
```yaml
- step: TextRecognition
  settings:
    model: TrOCR
    model_settings:
      model: Riksarkivet/trocr-base-handwritten-hist-swe-2
      device: cpu
      model_kwargs:
        revision: 6ecbb5d643430385e1557001ae78682936f8747f
    generation_settings:
      batch_size: 8
      num_beams: 1
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Path or name of a pretrained VisionEncoderDecoderModel. | required |
processor | str \| None | Optional path or name of a pretrained TrOCRProcessor. If not given, the model path or name is used. | None |
model_kwargs | dict[str, Any] \| None | Model initialization kwargs which are forwarded to VisionEncoderDecoderModel.from_pretrained. | None |
processor_kwargs | dict[str, Any] \| None | Processor initialization kwargs which are forwarded to TrOCRProcessor.from_pretrained. | None |
kwargs | | Additional kwargs which are forwarded to BaseModel's init. | {} |
Source code in src/htrflow/models/huggingface/trocr.py
_predict
TrOCR-specific prediction method.
This method is used by predict() and should typically not be called directly. However, predict() forwards additional kwargs to this method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[ndarray] | Input images. | required |
**generation_kwargs | | Optional keyword arguments that are forwarded to the model's .generate() method. | {} |
Source code in src/htrflow/models/huggingface/trocr.py
WordLevelTrOCR
Bases: TrOCR
A version of TrOCR which outputs words instead of lines.
This TrOCR wrapper uses the model's attention weights to estimate word boundaries. See notebook [TODO: link] for more details. It does not support beam search, but can otherwise be used as a drop-in replacement for TrOCR.
Example usage with the TextRecognition step:
```yaml
- step: TextRecognition
  settings:
    model: WordLevelTrOCR
    model_settings:
      model: Riksarkivet/trocr-base-handwritten-hist-swe-2
      device: cpu
      model_kwargs:
        revision: 6ecbb5d643430385e1557001ae78682936f8747f
    generation_settings:
      batch_size: 8
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Path or name of a pretrained VisionEncoderDecoderModel. | required |
processor | str \| None | Optional path or name of a pretrained TrOCRProcessor. If not given, the model path or name is used. | None |
model_kwargs | dict[str, Any] \| None | Model initialization kwargs which are forwarded to VisionEncoderDecoderModel.from_pretrained. | None |
processor_kwargs | dict[str, Any] \| None | Processor initialization kwargs which are forwarded to TrOCRProcessor.from_pretrained. | None |
kwargs | | Additional kwargs which are forwarded to BaseModel's init. | {} |
Source code in src/htrflow/models/huggingface/trocr.py
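The idea of deriving word boundaries from attention can be illustrated with a toy example. This is a hypothetical sketch, not the method from the referenced notebook: it assumes character-level tokens with " " as the word separator and one attention vector per token over horizontal positions of the line image, whereas real TrOCR tokenizers produce subword pieces and the decoder attends over image patches. Each word is placed at the mean attention center of its tokens, and boundaries are drawn halfway between neighbouring words.

```python
from typing import List, Sequence, Tuple


def attention_center(weights: Sequence[float]) -> float:
    """Attention-weighted mean horizontal position of one token."""
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / total


def word_spans(
    tokens: List[str], attention: List[Sequence[float]], width: int
) -> List[Tuple[str, int, int]]:
    """Split a line of `width` positions into (word, x_start, x_end) spans."""
    words: List[Tuple[str, List[float]]] = []
    current, centers = "", []
    for token, weights in zip(tokens, attention):
        if token == " ":  # a space closes the current word
            if current:
                words.append((current, centers))
            current, centers = "", []
        else:
            current += token
            centers.append(attention_center(weights))
    if current:
        words.append((current, centers))

    # Place each word at the mean center of its tokens and cut halfway
    # between neighbouring words, so the spans tile the whole line.
    means = [sum(c) / len(c) for _, c in words]
    cuts = [0] + [round((a + b) / 2) for a, b in zip(means, means[1:])] + [width]
    return [(word, cuts[i], cuts[i + 1]) for i, (word, _) in enumerate(words)]


def peak(position: int, width: int = 10) -> List[float]:
    """Toy attention vector concentrated on a single position."""
    return [1.0 if i == position else 0.0 for i in range(width)]


tokens = ["h", "i", " ", "y", "o"]
attention = [peak(1), peak(2), peak(4), peak(6), peak(8)]
print(word_spans(tokens, attention, width=10))  # [('hi', 0, 4), ('yo', 4, 10)]
```

A sketch like this needs one attention distribution per generated token, which is easy to obtain with greedy decoding; that may be related to why the wrapper does not support beam search, though the source does not state the reason.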
Satrn
Bases: BaseModel
HTRflow adapter of OpenMMLab's Satrn model.
Example usage with the TextRecognition pipeline step:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Path to a local .pth model weights file or to a Hugging Face repo which contains a .pth file, for example 'Riksarkivet/satrn_htr'. | required |
config | str \| None | Path to a local config.py file or to a Hugging Face repo which contains a config.py file, for example 'Riksarkivet/satrn_htr'. | None |
kwargs | | Additional kwargs which are forwarded to BaseModel's init. | {} |
Source code in src/htrflow/models/openmmlab/satrn.py
_predict
Satrn-specific prediction method.
This method is used by predict() and should typically not be called directly.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[NumpyImage] | Input images. | required |
kwargs | | Additional keyword arguments that are forwarded to … | {} |
Source code in src/htrflow/models/openmmlab/satrn.py
Segmentation models
RTMDet
Bases: BaseModel
HTRflow adapter of OpenMMLab's RTMDet model.
This model can be used for region and line segmentation. Riksarkivet provides two pre-trained RTMDet models:
- https://huggingface.co/Riksarkivet/rtmdet_lines
- https://huggingface.co/Riksarkivet/rtmdet_regions
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Path to a local .pth model weights file or to a Hugging Face repo which contains a .pth file, for example 'Riksarkivet/rtmdet_lines'. | required |
config | str \| None | Path to a local config.py file or to a Hugging Face repo which contains a config.py file, for example 'Riksarkivet/rtmdet_lines'. | None |
revision | str \| None | A specific model revision, as a commit hash of the model's Hugging Face repo. If None, the latest available revision is used. | None |
kwargs | | Additional kwargs which are forwarded to BaseModel's init. | {} |
|
Source code in src/htrflow/models/openmmlab/rtmdet.py
_predict
RTMDet-specific prediction method.
This method is used by predict() and should typically not be called directly.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[NumpyImage] | List of input images. | required |
nms_downscale | float | If < 1, all masks will be downscaled by this factor before applying NMS. This leads to faster NMS at the expense of accuracy. | 1.0 |
nms_threshold | float | Score threshold for segments to keep after NMS. | 0.4 |
nms_sigma | float | NMS parameter that affects the score calculation. | 2.0 |
**kwargs | | Additional arguments that are passed to DetInferencer.__call__. | {} |
|
Source code in src/htrflow/models/openmmlab/rtmdet.py
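The nms_sigma and nms_threshold parameters suggest a Gaussian Soft-NMS-style rescoring, where overlapping segments have their scores decayed rather than being discarded outright. The sketch below assumes the Gaussian formula score * exp(-IoU² / sigma) and works on axis-aligned boxes instead of masks, so nms_downscale has no counterpart here; whether htrflow uses exactly this formula is not confirmed by this page, and iou and soft_nms are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: List[Box], scores: List[float], sigma: float = 2.0, threshold: float = 0.4
) -> List[int]:
    """Gaussian Soft-NMS sketch: indices of kept boxes, highest score first."""
    remaining: Dict[int, float] = dict(enumerate(scores))  # index -> decayed score
    kept: List[int] = []
    while remaining:
        best = max(remaining, key=remaining.__getitem__)
        best_score = remaining.pop(best)
        if best_score < threshold:
            break  # every other remaining score is at most this low
        kept.append(best)
        for i in remaining:
            # Decay overlapping boxes instead of discarding them outright.
            remaining[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return kept


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 0.0, 11.0, 10.0), (20.0, 0.0, 30.0, 10.0)]
scores = [0.9, 0.8, 0.5]
print(soft_nms(boxes, scores, sigma=2.0))  # [0, 1, 2]
print(soft_nms(boxes, scores, sigma=0.5))  # [0, 2]
```

Under this formula, a larger sigma means gentler decay: with sigma=2.0 the heavily overlapping second box keeps a score of about 0.57 and survives the 0.4 threshold, while sigma=0.5 pushes it to about 0.21 and removes it.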
YOLO
Bases: BaseModel
HTRflow adapter of Ultralytics' YOLO model.
Example usage with the Segmentation step:
```yaml
- step: Segmentation
  settings:
    model: YOLO
    model_settings:
      model: Riksarkivet/yolov9-regions-1
      revision: 7c44178d85926b4a096c55c89bf224855a201fbf
      device: cpu
    generation_settings:
      batch_size: 8
```
generation_settings accepts the same arguments as YOLO.predict(). See the Ultralytics documentation for a list of supported arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Path to a YOLO model. The path can be a path to a local .pt model file (for example, … | required |
revision | str \| None | Optional revision of the Hugging Face repository. | None |
Source code in src/htrflow/models/ultralytics/yolo.py
Other models
DiT
Bases: BaseModel
HTRflow adapter of DiT for image classification.
Uses Hugging Face's implementation of DiT. For further information about the model, see https://huggingface.co/docs/transformers/model_doc/dit.
Initialize a DiT model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | str | Path or name of a pretrained AutoModelForImageClassification. | required |
processor | str \| None | Optional path or name of a pretrained AutoImageProcessor. If not given, the given … | None |
model_kwargs | dict \| None | Model initialization kwargs that are forwarded to AutoModelForImageClassification.from_pretrained(). | None |
processor_kwargs | dict \| None | Processor initialization kwargs that are forwarded to AutoImageProcessor.from_pretrained(). | None |
kwargs | | Additional kwargs that are forwarded to BaseModel's init. | required |
Source code in src/htrflow/models/huggingface/dit.py
_predict
Perform inference on images
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[ndarray] | List of input images. | required |
return_format | Literal['argmax', 'softmax'] | Decides the format of the output. Options are "softmax" (the default), which returns the confidence scores for each class label and image, and "argmax", which returns the most probable class label for each image. | 'softmax' |
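The two return_format options can be illustrated on raw classifier logits for a single image. A minimal sketch: format_output, the label names, and the dict-of-scores output shape are assumptions for illustration, not htrflow's actual return type. The softmax branch subtracts the largest logit before exponentiating to avoid overflow, and the argmax branch can skip softmax entirely because softmax preserves the ordering of the logits.

```python
import math
from typing import Dict, List, Literal, Union


def format_output(
    logits: List[float],
    labels: List[str],
    return_format: Literal["argmax", "softmax"] = "softmax",
) -> Union[str, Dict[str, float]]:
    """Turn one image's classifier logits into either output format."""
    if return_format == "argmax":
        # Most probable label only; softmax is monotonic, so argmax of logits suffices.
        best = max(range(len(logits)), key=logits.__getitem__)
        return labels[best]
    if return_format == "softmax":
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(labels, exps)}
    raise ValueError(f"unknown return_format: {return_format!r}")


labels = ["letter", "map", "ledger"]  # hypothetical class labels
logits = [2.0, 0.0, 1.0]
print(format_output(logits, labels, "argmax"))  # letter
scores = format_output(logits, labels, "softmax")
print({k: round(v, 3) for k, v in scores.items()})  # {'letter': 0.665, 'map': 0.09, 'ledger': 0.245}
```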