The internals
import random

# Seed so the dummy models below give reproducible results
random.seed(123)
Models / inferencers¶
Models and inferencers accept lists of images and return lists of results (either segmentation or recognition results).
I have made a dummy SegmentationModel and a dummy RecognitionModel in models.py. These do the same thing as the current inferencers.
from __future__ import annotations

from dataclasses import dataclass
import numpy as np

class SegmentationModel:
    def __call__(self, images: list[np.ndarray]) -> list[SegmentationResult]:
        ...

@dataclass
class SegmentationResult:
    boxes: np.ndarray
    masks: np.ndarray
    scores: np.ndarray
    labels: np.ndarray
(It would be nice to wrap all models in a "batching" function, which divides an input list into chunks if it is too long.) This is tracked as a card in DevOps.
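As a sketch, such a wrapper might look like the following. The name `batched`, its signature, and the chunking strategy are assumptions, not an existing htrflow_core helper:

```python
# Hypothetical batching wrapper -- not part of htrflow_core, just a sketch of
# the idea from the note above. It splits `images` into chunks of at most
# `batch_size` and concatenates the per-chunk results.
def batched(model, images, batch_size=8):
    results = []
    for i in range(0, len(images), batch_size):
        results.extend(model(images[i:i + batch_size]))
    return results
```

A model wrapped this way still receives and returns plain lists, so callers would not need to change.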
Using the Volume class¶
To load images, create a Volume. The name of this class is not set in stone... It represents what Catrin called a "batch": a division of an archive volume. I don't want to use "batch" because of potential confusion with a model's batch (the number of inputs it operates on simultaneously).
from htrflow_core.volume import Volume
images = ["../assets/demo_image.jpg"] * 5
volume = Volume(images)
The Volume instance holds a tree. We see the root node and its five children, each representing one input image:
print(volume)
└──<htrflow_core.volume.Node object at 0x7f5aa834c1c0>
    ├──626x1629 image demo_image
    ├──626x1629 image demo_image
    ├──626x1629 image demo_image
    ├──626x1629 image demo_image
    └──626x1629 image demo_image
The images are available through volume.images(). We pass them through a segmentation model:
from htrflow_core.models.dummy_models import SegmentationModel
model = SegmentationModel()
results = model(volume.images())
print(results[0])
SegmentationResult(metadata={'model_name': 'SegmentationModel'}, image=array([[[118, 120, 128], [115, 117, 125], [114, 116, 124], ..., [215, 219, 220], [209, 213, 214], [206, 210, 211]], [[110, 112, 120], [110, 112, 120], [110, 112, 120], ..., [211, 215, 216], [207, 211, 212], [209, 213, 214]], [[109, 112, 120], [109, 112, 120], [104, 107, 115], ..., [207, 211, 212], [205, 209, 210], [209, 213, 214]], ..., [[146, 152, 151], [147, 153, 152], [147, 153, 152], ..., [212, 218, 213], [214, 222, 211], [211, 221, 204]], [[144, 150, 149], [146, 152, 151], [148, 154, 153], ..., [217, 223, 212], [220, 231, 205], [216, 234, 187]], [[147, 153, 152], [149, 155, 154], [151, 157, 156], ..., [214, 221, 208], [214, 228, 194], [208, 231, 169]]], dtype=uint8), segments=[Segment(bbox=(345, 751, 11, 167), mask=array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8), polygon=array([[345, 85], [356, 115], [393, 140], [527, 167], [672, 151], [726, 127], [751, 93], [740, 63], [703, 38], [570, 11], [417, 29], [365, 55]], dtype=int32), score=0.7689563885870707, class_label='region')])
The results are a list of SegmentationResult instances. To apply the results to the input images, we pass them back to the volume with its update method, which returns the new regions as a list of images.
regions = volume.update(results)
The volume tree has now grown:
print(volume)
└──<htrflow_core.volume.Node object at 0x7f5aa834c1c0>
    ├──626x1629 image demo_image
    │   └──156x406 region at (345, 11)
    ├──626x1629 image demo_image
    │   ├──117x406 region at (17, 0)
    │   ├──156x406 region at (948, 262)
    │   └──156x309 region at (0, 85)
    ├──626x1629 image demo_image
    │   ├──156x406 region at (480, 173)
    │   ├──156x406 region at (690, 11)
    │   ├──149x406 region at (570, 0)
    │   ├──156x332 region at (1296, 381)
    │   └──156x292 region at (0, 16)
    ├──626x1629 image demo_image
    │   ├──99x213 region at (1415, 0)
    │   └──116x406 region at (678, 509)
    └──626x1629 image demo_image
        ├──156x278 region at (0, 234)
        ├──156x406 region at (786, 133)
        ├──156x406 region at (1105, 461)
        └──90x406 region at (442, 0)
The new regions can be passed through a segmentation model (such as a line model) again. The update method always updates the leaves of the tree.
results = model(volume.segments())
volume.update(results)
print(volume)
└──<htrflow_core.volume.Node object at 0x7f5aa834c1c0>
    ├──626x1629 image demo_image
    │   └──156x406 region at (345, 11)
    │       ├──37x100 region at (517, 129)
    │       ├──22x100 region at (636, 144)
    │       ├──38x100 region at (543, 125)
    │       ├──38x100 region at (486, 122)
    │       └──38x69 region at (681, 38)
    ├──626x1629 image demo_image
    │   ├──117x406 region at (17, 0)
    │   │   └──28x100 region at (216, 70)
    │   ├──156x406 region at (948, 262)
    │   │   ├──33x100 region at (1070, 384)
    │   │   ├──38x87 region at (948, 359)
    │   │   └──38x57 region at (1296, 329)
    │   └──156x309 region at (0, 85)
    │       ├──38x76 region at (7, 159)
    │       ├──38x76 region at (142, 124)
    │       ├──34x76 region at (218, 85)
    │       ├──38x76 region at (215, 125)
    │       └──38x76 region at (52, 105)
    ├──626x1629 image demo_image
    │   ├──156x406 region at (480, 173)
    │   │   ├──38x100 region at (623, 272)
    │   │   ├──38x100 region at (498, 270)
    │   │   ├──38x100 region at (561, 244)
    │   │   └──38x100 region at (652, 261)
    │   ├──156x406 region at (690, 11)
    │   │   ├──38x82 region at (690, 122)
    │   │   ├──38x95 region at (690, 13)
    │   │   ├──37x54 region at (690, 129)
    │   │   ├──38x100 region at (919, 95)
    │   │   └──38x100 region at (805, 59)
    │   ├──149x406 region at (570, 0)
    │   │   └──23x71 region at (904, 125)
    │   ├──156x332 region at (1296, 381)
    │   │   ├──38x53 region at (1296, 403)
    │   │   ├──35x82 region at (1469, 381)
    │   │   └──38x82 region at (1328, 457)
    │   └──156x292 region at (0, 16)
    │       └──38x65 region at (0, 129)
    ├──626x1629 image demo_image
    │   ├──99x213 region at (1415, 0)
    │   │   ├──24x52 region at (1426, 71)
    │   │   ├──24x52 region at (1463, 37)
    │   │   └──24x52 region at (1525, 31)
    │   └──116x406 region at (678, 509)
    │       ├──28x100 region at (929, 544)
    │       └──28x76 region at (1007, 512)
    └──626x1629 image demo_image
        ├──156x278 region at (0, 234)
        │   └──38x68 region at (144, 330)
        ├──156x406 region at (786, 133)
        │   ├──38x100 region at (891, 223)
        │   ├──38x64 region at (786, 154)
        │   ├──38x100 region at (1000, 245)
        │   └──38x100 region at (911, 242)
        ├──156x406 region at (1105, 461)
        │   ├──29x100 region at (1170, 587)
        │   ├──38x100 region at (1194, 571)
        │   └──38x100 region at (1219, 509)
        └──90x406 region at (442, 0)
            ├──22x91 region at (442, 14)
            ├──13x67 region at (780, 0)
            ├──22x100 region at (681, 18)
            ├──21x100 region at (554, 0)
            └──22x100 region at (667, 6)
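The "update always targets the leaves" behaviour can be sketched with a plain nested-dict tree. The helpers `leaves` and `update` below are illustrative only, not htrflow_core's actual Node API:

```python
# Minimal sketch of leaf-only updates on a nested-dict tree. Each node is
# {"data": ..., "children": [...]}; real Volume nodes are richer objects.
def leaves(node):
    """Yield the leaf nodes of the tree, depth first."""
    if not node["children"]:
        yield node
    else:
        for child in node["children"]:
            yield from leaves(child)

def update(root, results):
    """Attach one result's items as children of each current leaf, in order."""
    for leaf, result in zip(list(leaves(root)), results):
        leaf["children"] = [{"data": item, "children": []} for item in result]
```

Each call to `update` pairs the current leaves with the result list, so a second pass (e.g. a line model after a region model) automatically grows the tree one level deeper.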
When the segmentation is done, the segments can be passed to a text recognition model. The results are passed back to the volume in the same manner as before:
from htrflow_core.models.dummy_models import RecognitionModel
recognition_model = RecognitionModel()
results = recognition_model(volume.segments())
volume.update(results)
print(volume)
└──<htrflow_core.volume.Node object at 0x7f5aa834c1c0>
    ├──626x1629 image demo_image
    │   └──156x406 region at (345, 11)
    │       ├──37x100 region at (517, 129) "Dolor velit non non tempora magnam ut adipisci."
    │       ├──22x100 region at (636, 144) "Dolor quiquia quisquam adipisci velit velit quiquia quiquia."
    │       ├──38x100 region at (543, 125) "Ipsum labore dolorem ut neque ipsum velit."
    │       ├──38x100 region at (486, 122) "Consectetur est numquam voluptatem quiquia ipsum."
    │       └──38x69 region at (681, 38) "Magnam etincidunt consectetur neque quaerat ut sit ipsum."
    ├──626x1629 image demo_image
    │   ├──117x406 region at (17, 0)
    │   │   └──28x100 region at (216, 70) "Modi sed non tempora."
    │   ├──156x406 region at (948, 262)
    │   │   ├──33x100 region at (1070, 384) "Numquam quiquia ut etincidunt sit quaerat adipisci."
    │   │   ├──38x87 region at (948, 359) "Est etincidunt dolore modi."
    │   │   └──38x57 region at (1296, 329) "Dolore ut tempora numquam voluptatem dolorem etincidunt non."
    │   └──156x309 region at (0, 85)
    │       ├──38x76 region at (7, 159) "Numquam amet quisquam magnam modi."
    │       ├──38x76 region at (142, 124) "Dolorem dolorem eius aliquam eius."
    │       ├──34x76 region at (218, 85) "Eius tempora modi sit."
    │       ├──38x76 region at (215, 125) "Tempora labore velit dolor."
    │       └──38x76 region at (52, 105) "Consectetur neque labore porro quiquia."
    ├──626x1629 image demo_image
    │   ├──156x406 region at (480, 173)
    │   │   ├──38x100 region at (623, 272) "Quaerat sed ipsum tempora."
    │   │   ├──38x100 region at (498, 270) "Ipsum aliquam consectetur dolor."
    │   │   ├──38x100 region at (561, 244) "Sed magnam aliquam aliquam dolor."
    │   │   └──38x100 region at (652, 261) "Sed dolor amet sed adipisci etincidunt."
    │   ├──156x406 region at (690, 11)
    │   │   ├──38x82 region at (690, 122) "Voluptatem aliquam aliquam porro amet."
    │   │   ├──38x95 region at (690, 13) "Modi aliquam quiquia etincidunt labore."
    │   │   ├──37x54 region at (690, 129) "Tempora dolore quiquia ipsum neque consectetur tempora."
    │   │   ├──38x100 region at (919, 95) "Tempora labore modi ut non."
    │   │   └──38x100 region at (805, 59) "Ut dolorem labore dolore consectetur."
    │   ├──149x406 region at (570, 0)
    │   │   └──23x71 region at (904, 125) "Est labore dolor est."
    │   ├──156x332 region at (1296, 381)
    │   │   ├──38x53 region at (1296, 403) "Neque eius adipisci amet voluptatem consectetur."
    │   │   ├──35x82 region at (1469, 381) "Voluptatem magnam voluptatem labore sed dolore voluptatem."
    │   │   └──38x82 region at (1328, 457) "Dolore ut magnam voluptatem etincidunt amet adipisci."
    │   └──156x292 region at (0, 16)
    │       └──38x65 region at (0, 129) "Etincidunt etincidunt quiquia porro velit."
    ├──626x1629 image demo_image
    │   ├──99x213 region at (1415, 0)
    │   │   ├──24x52 region at (1426, 71) "Etincidunt etincidunt dolorem modi dolorem."
    │   │   ├──24x52 region at (1463, 37) "Neque quaerat dolorem magnam."
    │   │   └──24x52 region at (1525, 31) "Sed aliquam dolor quisquam numquam."
    │   └──116x406 region at (678, 509)
    │       ├──28x100 region at (929, 544) "Velit tempora non quiquia magnam ipsum sed."
    │       └──28x76 region at (1007, 512) "Dolor sed velit quisquam dolor."
    └──626x1629 image demo_image
        ├──156x278 region at (0, 234)
        │   └──38x68 region at (144, 330) "Amet adipisci quaerat quiquia sit dolor numquam ut."
        ├──156x406 region at (786, 133)
        │   ├──38x100 region at (891, 223) "Etincidunt velit ut neque labore quisquam."
        │   ├──38x64 region at (786, 154) "Aliquam labore aliquam quaerat consectetur."
        │   ├──38x100 region at (1000, 245) "Ut non numquam ut."
        │   └──38x100 region at (911, 242) "Ipsum sed non dolore eius consectetur."
        ├──156x406 region at (1105, 461)
        │   ├──29x100 region at (1170, 587) "Sed sed magnam tempora velit."
        │   ├──38x100 region at (1194, 571) "Numquam quisquam dolore ut non."
        │   └──38x100 region at (1219, 509) "Sit amet ipsum neque neque adipisci consectetur."
        └──90x406 region at (442, 0)
            ├──22x91 region at (442, 14) "Ipsum ut eius sit porro sit."
            ├──13x67 region at (780, 0) "Dolorem voluptatem sed voluptatem non modi quisquam."
            ├──22x100 region at (681, 18) "Sed amet labore dolorem velit aliquam."
            ├──21x100 region at (554, 0) "Sit non amet velit dolorem dolore labore."
            └──22x100 region at (667, 6) "Dolorem amet amet modi voluptatem."
Accessing nodes¶
Specific nodes are accessed by tuple indexing. Here we extract the first line of the first region of the first image:
# Access image 0, region 0, subregion 0
volume[0, 0, 0]
# Access image 0, region 0
volume[0, 0]
<htrflow_core.volume.RegionNode at 0x7f5a496ef1f0>
The image associated with each node is accessed through the image attribute. The image isn't stored directly in the node; instead, the node refers to the parent image and crops it according to its box:
class BaseImageNode:
    @property
    def image(self):
        x1, x2, y1, y2 = self.box
        return self.parent.image[y1:y2, x1:x2]

    ...
volume[0, 0, 0].image
array([[[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255, 255, 255], [255, 255, 255]], [[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255, 255, 255], [255, 255, 255]], [[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255, 255, 255], [255, 255, 255]], ..., [[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255, 255, 255], [255, 255, 255]], [[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255, 255, 255], [255, 255, 255]], [[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255, 255, 255], [255, 255, 255]]], dtype=uint8)
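The box-based cropping in BaseImageNode.image can be illustrated standalone. The 626x1629 page size and the (345, 751, 11, 167) box are taken from the demo output above; the page itself is dummy data:

```python
import numpy as np

# Standalone illustration of the cropping done by BaseImageNode.image.
# The (x1, x2, y1, y2) order matches self.box in the class above.
page = np.zeros((626, 1629, 3), dtype=np.uint8)  # height x width x channels
x1, x2, y1, y2 = 345, 751, 11, 167
crop = page[y1:y2, x1:x2]
print(crop.shape)  # (156, 406, 3) -- the "156x406 region at (345, 11)" above
```

Because the crop is a NumPy view into the parent array, no pixel data is copied or duplicated per node.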
Coordinates¶
All nodes have a coordinate attribute. This is the location of the node's top-left corner relative to the original image. The base image node's coordinate is thus (0, 0):
print(volume[0].coordinate)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[11], line 1
----> 1 print(volume[0].coordinate)

AttributeError: 'PageNode' object has no attribute 'coordinate'
For first-level regions, coordinate is the same as the top-left corner of the segment's bounding box.
print("Coordinate:", volume[0, 0].coordinate)
print("Bounding box:", volume[0, 0].data["segment"].box, "(x1, x2, y1, y2)")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 print('Coordinate:', volume[0, 0].coordinate)
      2 print('Bounding box:', volume[0, 0].data['segment'].box, '(x1, x2, y1, y2)')

AttributeError: 'RegionNode' object has no attribute 'coordinate'
But for nested regions the two differ, because coordinate is relative to the original image, while the segment bounding box is relative to the parent region.
print("Global coordinate:", volume[0, 0, 0].coordinate)
print("Local bounding box:", volume[0, 0, 0].data["segment"].box)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 print('Global coordinate:', volume[0, 0, 0].coordinate)
      2 print('Local bounding box:', volume[0, 0, 0].data['segment'].box)

AttributeError: 'RegionNode' object has no attribute 'coordinate'
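The intended relation between the two can be sketched without the Node classes: a nested node's global coordinate is the sum of the top-left corners of the local (x1, x2, y1, y2) boxes on the path from the root. The helper `global_coordinate` below is illustrative only, not part of htrflow_core:

```python
# Sketch: compose local (x1, x2, y1, y2) boxes into a global top-left (x, y).
# Each box is expressed relative to its parent, as in the trees above.
def global_coordinate(boxes):
    x, y = 0, 0
    for x1, _x2, y1, _y2 in boxes:
        x += x1
        y += y1
    return x, y
```

For example, with the region box (345, 751, 11, 167) from the demo output and a hypothetical line box (172, 272, 118, 155) inside it, the line's global top-left would be (345 + 172, 11 + 118) = (517, 129).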