Managing Persons in Photo Collections – The Basics

Encoding & Recognition

Managing people in photo collections has evolved rapidly with digital photography and widely available pro-grade tools. Most of us now have huge libraries — which makes it surprisingly hard to find all photos of a specific person. This is especially important when you organize by people (movie stars, athletes, supermodels, or simply relatives and friends). To manage people effectively, we need intelligent search that can recognize who’s in a photo.

Modern photo managers like Apple Photos and Google Photos auto-detect faces, making it easier to group and search by person. Even so, challenges remain: accurately identifying people in older or low-quality images, handling duplicates, and keeping metadata consistent. The good news: recent progress in face recognition and person re-identification (ReID) — plus better metadata practices — means much of “people management” can be automated with the right approach.

This short post (the first in a series) shows how to leverage professional-grade methods with approachable tools — so you can keep your library organized and searchable, and take your people management to the next level. We’ll start with the basics: encoding and recognition.

A layered identity model (what we’re building next)

Eventually, we’ll use a layered model:

  • Objective layer, observations from a recognition engine: embeddings that represent a person’s face (and optionally body).
  • Subjective layer, human-readable assertions: a database of people with names and biographical details.
  • Linking layer, mappings that connect the two so your software can “speak both languages.”
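To make the three layers concrete, here is a minimal sketch of how they could be represented in Python. The class and field names (Observation, Person, IdentityIndex) are placeholders invented for illustration, not part of any library, and the model we build later in the series may look different.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Objective layer: one embedding produced by a recognition engine."""
    observation_id: str
    image_path: str
    embedding: List[float]      # e.g. a 128-D face vector
    kind: str = "face"          # "face" or "body"


@dataclass
class Person:
    """Subjective layer: a human-readable record about someone."""
    person_id: str
    name: str
    notes: str = ""


@dataclass
class IdentityIndex:
    """Linking layer: connects observations to people and back."""
    people: Dict[str, Person] = field(default_factory=dict)
    observations: Dict[str, Observation] = field(default_factory=dict)
    links: Dict[str, str] = field(default_factory=dict)  # observation_id -> person_id

    def add_person(self, person: Person) -> None:
        self.people[person.person_id] = person

    def add_observation(self, obs: Observation, person_id: Optional[str] = None) -> None:
        self.observations[obs.observation_id] = obs
        if person_id is not None:
            self.link(obs.observation_id, person_id)

    def link(self, observation_id: str, person_id: str) -> None:
        # Refuse links that point at records we do not know about.
        if observation_id not in self.observations:
            raise KeyError(f"unknown observation: {observation_id}")
        if person_id not in self.people:
            raise KeyError(f"unknown person: {person_id}")
        self.links[observation_id] = person_id

    def person_for(self, observation_id: str) -> Optional[Person]:
        person_id = self.links.get(observation_id)
        return self.people.get(person_id) if person_id is not None else None

    def observations_for(self, person_id: str) -> List[Observation]:
        return [self.observations[o] for o, p in self.links.items() if p == person_id]
```

Keeping the link table separate means an embedding can exist before anyone has named it, and a name can be corrected without touching the embeddings.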

This post focuses on the foundation — creating encodings and recognizing matches.

Building your own Recognition Engine

We’ll start with Python-based face recognition, and later extend to full-body ReID. A practical starting point for body ReID is FastReID, a PyTorch-based toolbox with a “model zoo” of pre-trained models. (ReID answers the question “have I seen this person before in my collection?” using body features; face recognition focuses on the face.)

Key idea: a recognizer needs a knowledge base — previously encoded faces (and bodies) of known people. Here’s a simple, effective dataset layout:

/persons_dataset
    /Person1
        – image1.jpg
        – image2.jpg
    /Person2
        – image1.jpg
        – image2.jpg

Aim for 15–20 varied images per person (angles, lighting, expressions). Each folder name is the person label you want the system to learn.
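Before encoding, it can help to check how many images each person folder actually holds. The sketch below is one way to do that with the standard library only; the function names, the list of extensions, and the use of 15 as the minimum are our own assumptions based on the guideline above.

```python
import pathlib

IMAGE_EXT = {".jpg", ".jpeg", ".png"}   # assumption: the extensions you want to count
MIN_IMAGES = 15                          # lower end of the 15-20 guideline


def count_images_per_person(dataset_dir):
    """Return {folder_name: number_of_image_files} for each person folder."""
    dataset_dir = pathlib.Path(dataset_dir)
    counts = {}
    for person_dir in sorted(dataset_dir.iterdir()):
        if not person_dir.is_dir():
            continue  # ignore stray files next to the person folders
        counts[person_dir.name] = sum(
            1 for p in person_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXT
        )
    return counts


def under_minimum(counts, minimum=MIN_IMAGES):
    """Return the names of people with fewer than `minimum` images."""
    return [name for name, n in counts.items() if n < minimum]


if __name__ == "__main__":
    dataset = pathlib.Path("persons_dataset")
    if dataset.is_dir():
        counts = count_images_per_person(dataset)
        for name, n in counts.items():
            print(f"{name}: {n} images")
        print("Needs more images:", under_minimum(counts) or "nobody")
```

This only counts files; it says nothing about variety, so a quick visual check of angles and lighting is still worthwhile.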

Step 1 — Encode faces

Below is a small script that imports a few libraries for data handling plus face_recognition, which describes itself as “the world’s simplest face recognition library” and is built on dlib’s deep-learning face recognition model. The script scans your dataset, extracts a 128-D face embedding from every image in which exactly one face is detected, and saves the embeddings alongside the corresponding names.

# encode_faces.py
import os, pickle, pathlib
import numpy as np
import face_recognition

DATASET_DIR = pathlib.Path("persons_dataset")
OUT_FILE = "encodings.pkl"

encodings = []
names = []

for person_name in sorted(os.listdir(DATASET_DIR)):
    person_dir = DATASET_DIR / person_name
    if not person_dir.is_dir():
        continue
    for img_name in os.listdir(person_dir):
        img_path = person_dir / img_name
        try:
            image = face_recognition.load_image_file(img_path)
            # Optional: use 'cnn' model if you have a GPU and dlib compiled with CUDA
            boxes = face_recognition.face_locations(image, model="hog")
            face_vecs = face_recognition.face_encodings(image, boxes)
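            if len(face_vecs) == 1:
                encodings.append(face_vecs[0])
                names.append(person_name)
            else:
                # No face or several faces: the folder label would be ambiguous, so skip
                print(f"Skip {img_path}: found {len(face_vecs)} faces")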
        except Exception as e:
            print(f"Skip {img_path}: {e}")

with open(OUT_FILE, "wb") as f:
    pickle.dump({"encodings": np.asarray(encodings), "names": np.asarray(names)}, f)

print(f"Saved {len(encodings)} encodings for {len(set(names))} people to {OUT_FILE}")
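Because the encoder skips images without exactly one detected face, the number of encodings per person can be lower than the number of files in the folder. The following sketch counts encodings per name and checks the vector length; summarize_kb is a hypothetical helper of ours, and the demo data is synthetic so the snippet runs without dlib installed.

```python
from collections import Counter


def summarize_kb(names, encodings, expected_dim=128):
    """Count encodings per person and verify every vector has the expected length."""
    if len(names) != len(encodings):
        raise ValueError("names and encodings must have the same length")
    bad = [i for i, vec in enumerate(encodings) if len(vec) != expected_dim]
    if bad:
        raise ValueError(f"unexpected vector length at indices {bad}")
    return Counter(str(n) for n in names)


# Synthetic stand-in for the contents of encodings.pkl (real vectors come from dlib).
demo_names = ["Person1", "Person1", "Person2"]
demo_encodings = [[0.0] * 128, [0.1] * 128, [0.2] * 128]

for person, n in sorted(summarize_kb(demo_names, demo_encodings).items()):
    print(f"{person}: {n} encodings")
```

With a real knowledge base you would load encodings.pkl with pickle and pass db["names"] and db["encodings"]; the helper only relies on len(), so NumPy arrays work as well as lists.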

That’s it: you just created your first photo knowledge base (KB). Let’s use it.

Step 2 — Recognize faces in an image

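We’ll load the encodings in our KB, open a test image, encode every face found in it, and then find the closest known face for each one by Euclidean distance. A match counts only if that distance is at or below TOLERANCE; otherwise the face is reported as Unknown. You control strictness via TOLERANCE (typical range 0.5–0.6, with 0.6 being the library’s default; lower = stricter).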

# recognize_image.py
import pickle, numpy as np, face_recognition

with open("encodings.pkl", "rb") as f:
    db = pickle.load(f)
KNOWN = db["encodings"]
NAMES = db["names"]

TOLERANCE = 0.55  # adjust to taste

def recognize_persons(image_path):
    image = face_recognition.load_image_file(image_path)
    boxes = face_recognition.face_locations(image, model="hog")
    faces = face_recognition.face_encodings(image, boxes)

    results = []
    for face in faces:
        dists = face_recognition.face_distance(KNOWN, face)
        idx = int(np.argmin(dists))
        if dists[idx] <= TOLERANCE:
            results.append(NAMES[idx])
        else:
            results.append("Unknown")
    return results

if __name__ == "__main__":
    test_image = "test_image.jpg"
    print(", ".join(recognize_persons(test_image)) or "No faces found.")
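The matching rule inside recognize_persons can be tried out without any images. The sketch below reimplements it with the standard library, using math.dist for the Euclidean distance that face_recognition.face_distance computes on NumPy arrays. The function name nearest_label, the toy 3-D vectors, and the guard for an empty knowledge base are our own additions for illustration.

```python
import math

TOLERANCE = 0.55


def nearest_label(known_vectors, known_names, query, tolerance=TOLERANCE):
    """Return (label, distance) for the closest known vector.

    The label is "Unknown" when the closest distance exceeds the tolerance
    or when the knowledge base is empty.
    """
    if not known_vectors:
        return "Unknown", math.inf
    dists = [math.dist(vec, query) for vec in known_vectors]
    idx = min(range(len(dists)), key=dists.__getitem__)  # index of the smallest distance
    label = known_names[idx] if dists[idx] <= tolerance else "Unknown"
    return label, dists[idx]


# Toy 3-D vectors standing in for 128-D face embeddings.
known = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
names = ["Person1", "Person2"]

print(nearest_label(known, names, [0.1, 0.0, 0.0]))   # query close to Person1
print(nearest_label(known, names, [0.5, 0.6, 0.0]))   # query far from both
```

Returning the distance together with the label makes it easier to see how close a borderline match was when you tune TOLERANCE.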

Processing one image at a time is tedious, so let’s make the script more useful.

Step 3 — Batch process a folder

Below is an adapted script that processes a whole directory, copying each image into a subfolder for every person recognized in it (or into Unknown).

# batch_process.py
import shutil, pathlib, time, pickle, numpy as np
import face_recognition

# Example Windows paths; adjust them to your own folders
INPUT_DIR  = pathlib.Path("E:/persons_unknown")
OUTPUT_DIR = pathlib.Path("E:/persons_processed")
# Must point at the file written by encode_faces.py (it saves encodings.pkl in its working directory)
ENC_FILE   = pathlib.Path("E:/persons_dataset/encodings.pkl")
VALID_EXT  = {".jpg", ".jpeg", ".png", ".webp"}
TOLERANCE  = 0.55
MOVE_FILES = False  # set True to move instead of copy

with open(ENC_FILE, "rb") as f:
    db = pickle.load(f)
KNOWN = db["encodings"]
NAMES = db["names"]

def save_to_bucket(img_path, label):
    dest_dir = OUTPUT_DIR / label
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / img_path.name
    if MOVE_FILES:
        shutil.move(str(img_path), dest)
    else:
        shutil.copy2(str(img_path), dest)

def process_image(img_path):
    try:
        image = face_recognition.load_image_file(img_path)
        boxes  = face_recognition.face_locations(image, model="hog")
        faces  = face_recognition.face_encodings(image, boxes)
        if not faces:
            save_to_bucket(img_path, "Unknown")
            return
        # If multiple faces, save one copy per recognized label (dedup with a set)
        labels = set()
        for face in faces:
            dists = face_recognition.face_distance(KNOWN, face)
            idx = int(np.argmin(dists))
            label = NAMES[idx] if dists[idx] <= TOLERANCE else "Unknown"
            labels.add(label)
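        # Copy into every bucket first: moving would remove the source after the first label
        for label in labels:
            dest_dir = OUTPUT_DIR / label
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(img_path, dest_dir / img_path.name)
        if MOVE_FILES:
            img_path.unlink()  # delete the original only after all copies exist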
    except Exception as e:
        print(f"Error {img_path}: {e}")

def main():
    start = time.time()
    images = [p for p in INPUT_DIR.rglob("*") if p.suffix.lower() in VALID_EXT]
    total = len(images)
    for i, p in enumerate(images, 1):
        process_image(p)
        print(f"Progress: {i/total:0.1%} ({i}/{total})")
    print(f"Done in {time.time()-start:0.1f}s")

if __name__ == "__main__":
    main()
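One thing to watch: the batch script collects files recursively, so two images with the same file name from different subfolders end up with the same destination path, and the later copy replaces the earlier one. A small helper like the one below could be used to pick a free name instead; unique_destination is a hypothetical function of ours and the _1, _2 suffix scheme is just one possible convention.

```python
import pathlib


def unique_destination(dest_dir, filename):
    """Return a path inside dest_dir that does not exist yet.

    If dest_dir/filename is taken, try filename_1, filename_2, ... before the extension.
    """
    dest_dir = pathlib.Path(dest_dir)
    candidate = dest_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = dest_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

To use it, the destination in the copy step would become unique_destination(dest_dir, img_path.name) rather than dest_dir / img_path.name.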

In three small steps we created the means to encode and recognize people’s faces in your photo collections. Later, we will show you how to add a small PyQt6 GUI to pick input/output folders, start the run, show progress and inspect results. For now, these small examples are enough to keep the core mechanics clear and minimal.

Where body ReID fits (next posts)

Face recognition works best when faces are visible and reasonably sharp. Person ReID complements this by using full-body appearance (body parts, clothing, silhouette) to cluster or match the same person across photos where the face isn’t clear. Toolkits such as FastReID (mentioned above) or Torchreid provide strong pre-trained models, which you can use to build a store of body embeddings for your known identities. We’ll add this in the next post as an optional layer on top of the face pipeline.
