Quickstart

Data

Load the dataset from Hugging Face

from datasets import load_dataset

dataset = load_dataset("Riksarkivet/Trolldomkomission")["train"]

images = dataset["image"]

Volume

from htrflow_core.volume import Volume

vol = Volume([images])

Segment Images

from htrflow_core.models.ultralytics.yolo import YOLO

seg_model = YOLO('ultralyticsplus/yolov8s')
res = seg_model(vol.images())  # vol.segments() also works here, since it still points to the images

Update Volume

vol.update(res)
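
The comment above says that `vol.segments()` points to the images until the volume is updated. The toy model below illustrates that idea as a tree whose leaves change once regions are attached. It is a sketch only, not htrflow_core's implementation: `ToyNode`, `leaves` and `attach_regions` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node in a volume tree (not the htrflow_core class)."""

    label: str
    children: list["ToyNode"] = field(default_factory=list)

    def leaves(self) -> list["ToyNode"]:
        """Return the nodes that have no children, in depth-first order."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def attach_regions(page: ToyNode, region_labels: list[str]) -> None:
    """Mimic an update step: hang detected regions under a page node."""
    page.children = [ToyNode(f"{page.label}/{name}") for name in region_labels]


volume = ToyNode("volume", [ToyNode("page0"), ToyNode("page1")])

# Before segmentation the leaves are the pages themselves.
before = [node.label for node in volume.leaves()]

# After attaching regions to page0, its regions replace it among the leaves.
attach_regions(volume.children[0], ["region0", "region1"])
after = [node.label for node in volume.leaves()]

print(before)
print(after)
```

In this toy model, a later model that runs on the leaves would see the pages first and the attached regions afterwards, which mirrors how the recognition step below is run on `vol.segments()` after the update.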

HTR

from htrflow_core.models.huggingface.trocr import TrOCR

rec_model = TrOCR()
res = rec_model(vol.segments())

vol.update(res)

Note

The final volume

    print(vol)

Serialize

Saves to outputs/.xml. Because the two demo images share the same name, only one output file is written.

vol.save('outputs', 'alto')
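
The name collision described above can be reproduced without the library. The sketch below shows how identical image names map to a single output path, and one possible way to keep them apart by numbering repeated names. `unique_output_paths` is a hypothetical helper written for this example; it is not part of htrflow_core, and `vol.save` is not assumed to accept such paths.

```python
from collections import Counter
from pathlib import Path


def unique_output_paths(image_names, out_dir="outputs", suffix=".xml"):
    """Hypothetical helper: map image names to output paths, numbering repeated stems."""
    seen = Counter()
    paths = []
    for name in image_names:
        stem = Path(name).stem
        seen[stem] += 1
        # The first occurrence keeps its stem; later ones get _2, _3, ...
        tag = stem if seen[stem] == 1 else f"{stem}_{seen[stem]}"
        paths.append(Path(out_dir) / f"{tag}{suffix}")
    return paths


names = ["page.jpg", "page.jpg", "other.jpg"]

# Naive mapping: identical stems collapse into one path, so one file overwrites the other.
naive = {Path("outputs") / (Path(n).stem + ".xml") for n in names}

# Numbered mapping: every input gets its own path.
safe = unique_output_paths(names)

print(len(naive), "distinct paths from", len(names), "images")
print([p.name for p in safe])
```

A numbered suffix can still clash with an image that is genuinely named `page_2`, so renaming the inputs before building the volume may be the simpler fix.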

..

Whenever you have large documents, you typically ...

# Something here