Quick and dirty migration from Hydrus Client

I have a ton of images that I have hoarded over the years. I have so many that I have a really hard time finding the ones I already know I saved. To solve this problem, around 2020 I started using Hydrus Client, an open-source piece of software for storing and tagging images.

Back then, I would painstakingly go through all pictures that landed on my PC and give them some descriptive tags. I did this every now and then for about 2-3 years and amassed approximately 2900 pictures. If you are a friend of mine, it is likely that you are tagged somewhere in this database with person: <your-name>.

But I have come to realize that I never really used it for anything, so for me it was a waste of time. I am not bashing Hydrus. It is a wonderful piece of software. It is just not for me anymore… Still, I want to archive these tags, such that I may sort them in another way in the future.

[Image: 12 random pictures from my database. Heavily curated, as I had to remove many personal pictures.]

The problem is that Hydrus is a very active project and gets updates weekly. My database is old and has not been migrated even once. There is a migration guide, but I could not for the life of me follow it: no version of Hydrus I could compile can open my database.

So I decided to create a migration script which collects all the image files from the various Hydrus file buckets, and then creates a text file with all the tags. It also retags all the images using TMSU, a very lightweight tagging utility for Linux. I just wanted to share my migration script here, as I won’t be maintaining or tracking it.
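
For context, TMSU keeps its tags in an SQLite database of its own (created by tmsu init) and is queried from the shell. A couple of illustrative queries against the migrated collection, using tag names from my own database:

$ tmsu files screenshot
$ tmsu files person=andreas

The first lists every file tagged screenshot; the second lists every file whose person tag has the value andreas.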

When the migration script is run, the output looks a bit like this:

$ python3 migrate.py
Loaded 16030 hashes
Found 2916 files on disk
Loaded 10655 tag mappings
Loaded 17 unique namespaces
Loaded 2481 unique subtags
Loaded 2436 unique tags
Copied 2916 files to migrated_files
Wrote overview file to migrated_files/hydrus_files.txt
Tagged files in TMSU database located at migrated_files

This takes about 20 seconds on my machine for my 2916 files. It then produces some new files:

$ shuf -n 10 migrated_files/hydrus_files.txt
migrated_files/c2....png     self made|pixel art
migrated_files/46....png     screenshot|sharex
migrated_files/6d....png     screenshot|game=pokemon infinite fusion|sharex
migrated_files/fa....gif     collection=mediaproducts greatest pictures|cd-rom|collection=greatest pictures 3|face
migrated_files/dc....jpg     screenshot|game=lego star wars: the complete saga|game|steam screenshot
migrated_files/30....png     downloaded|person=andreas
migrated_files/95....jpg     uploader=danmarksmad|sharex|danmarksmad|pizza|leverpostej|agurk|instagram saved|source=instagram|screenshot
migrated_files/2c....png     sharex|dansk|bolle|sex|bog|book|screenshot
migrated_files/7b....png     desktop
migrated_files/31....png     øl|bajer memes|boomer meme|meme|beer|screenshot

How the script works

I started by trying to read the documentation, but it didn’t say much about the database schema. So I just started playing around with a database viewer, looking at the SQLite table definitions.
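
If you want to poke at the schema without a GUI viewer, the table definitions can be dumped straight from sqlite_master. A minimal sketch, assuming it is run inside Hydrus’s db directory:

import sqlite3

# Print the CREATE TABLE statement for every table in the master database
conn = sqlite3.connect("client.master.db")
for (sql,) in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'"):
    print(sql)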

Here is the general flow I gathered:

  1. Gather all hashes from the hashes table and attempt to find the corresponding files in the file system.
  2. For each hash, determine its associated tag_ids via the current_mappings_8 and current_mappings_9 tables.
  3. For each tag_id, determine the namespace_id and subtag_id via the tags table.
  4. For each tag_id, build a tag name by concatenating the associated strings from the namespaces and subtags tables.

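Steps 2 through 4 can also be expressed as a single SQL join across the two database files. A minimal sketch, assuming the table and column names described above (current_mappings_8 is one of the per-service mapping tables, so the suffix may differ in your database):

import sqlite3

# Tags, namespaces and subtags live in client.master.db;
# the current_mappings_* tables live in client.mappings.db.
conn = sqlite3.connect("client.master.db")
conn.execute("ATTACH DATABASE 'client.mappings.db' AS mappings")
for hash_blob, namespace, subtag in conn.execute("""
    SELECT h.hash, n.namespace, s.subtag
    FROM mappings.current_mappings_8 AS m
    JOIN hashes     AS h ON h.hash_id      = m.hash_id
    JOIN tags       AS t ON t.tag_id       = m.tag_id
    JOIN namespaces AS n ON n.namespace_id = t.namespace_id
    JOIN subtags    AS s ON s.subtag_id    = t.subtag_id
"""):
    print(hash_blob.hex(), f"{namespace}={subtag}" if namespace else subtag)
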
You now have a list of hashes, their file paths, and tags with names such as cheese and person=andreas.

  1. Move all these files somewhere new and create a file that lists all the new files and their tags.
  2. Loop over the files and tag them with TMSU.

Full script

Don’t judge me on this code. I am only ever running it once and never maintaining it.

import sqlite3
from pathlib import Path
from collections import defaultdict
from subprocess import run
import argparse

parser = argparse.ArgumentParser(description="Migrate Hydrus files and tags.")
parser.add_argument("hydrus_db_path", type=Path, help="Path to Hydrus database directory")
parser.add_argument("--output-dir", type=Path, default=Path("./migrated_files/"), help="Directory to store migrated files")
args = parser.parse_args()

HYDRUS_DB_PATH = args.hydrus_db_path
CLIENT_MASTER_PATH = HYDRUS_DB_PATH / "client.master.db"
CLIENT_MAPPINGS_PATH = HYDRUS_DB_PATH / "client.mappings.db"
CLIENT_FILES_PATH = HYDRUS_DB_PATH / "client_files"
OUTPUT_DIR = args.output_dir

OVERVIEW_FILE = OUTPUT_DIR / "hydrus_files.txt"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Open the databases
client_master_conn = sqlite3.connect(CLIENT_MASTER_PATH)
client_mappings_conn = sqlite3.connect(CLIENT_MAPPINGS_PATH)
client_master_cursor = client_master_conn.cursor()
client_mappings_cursor = client_mappings_conn.cursor()

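# Hydrus splits a tag into a namespace and a subtag; TMSU's tag=value syntax maps onto this directly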
def hydrus_tag_to_string(namespace, subtag):
    if namespace:
        return f"{namespace}={subtag}"
    else:
        return f"{subtag}"

# Maps from hash_id to dictionary of file info
files = defaultdict(dict)

# The 'hashes' table has hash_id (integer) and hash (blob bytes)
client_master_cursor.execute("SELECT hash_id, hash FROM hashes")
for hash_id, hash_blob in client_master_cursor.fetchall():
    hash_hex = hash_blob.hex()
    files[hash_id]['hash'] = hash_hex
print(f"Loaded {len(files)} hashes")

# Try to find each file in the client_files directory (it may have any extension)
for hash_id, info in files.items():
    candidate_folder = CLIENT_FILES_PATH / f"f{info['hash'][:2]}" # Hydrus places files in subdirectories named after the first two hex digits of the hash
    # Use rglob to find any file with this hash prefix
    matched_files = list(candidate_folder.rglob(f"{info['hash']}.*"))
    if matched_files:
        info['path'] = str(matched_files[0])
        info['new_path'] = str(OUTPUT_DIR / matched_files[0].name)
files = {k: v for k, v in files.items() if 'path' in v}  # Keep only files that were found
print(f"Found {len(files)} files on disk")

# The 'current_mappings_8' and 'current_mappings_9' tables map hash_id to tag_id
tag_count = 0
for table_name in ['current_mappings_8', 'current_mappings_9']:
    client_mappings_cursor.execute(f"SELECT hash_id, tag_id FROM {table_name}")
    for hash_id, tag_id in client_mappings_cursor.fetchall():
        if hash_id not in files:
            continue
        if 'tag_ids' not in files[hash_id]:
            files[hash_id]['tag_ids'] = set()
        files[hash_id]['tag_ids'].add(tag_id)
        tag_count += 1
print(f"Loaded {tag_count} tag mappings")

# Tags consist of a namespace and a subtag
client_master_cursor.execute("SELECT namespace_id, namespace FROM namespaces")
namespaces = {namespace_id: namespace for namespace_id, namespace in client_master_cursor.fetchall()}
print(f"Loaded {len(namespaces)} unique namespaces")
client_master_cursor.execute("SELECT subtag_id, subtag FROM subtags")
subtags = {subtag_id: subtag for subtag_id, subtag in client_master_cursor.fetchall()}
print(f"Loaded {len(subtags)} unique subtags")

tags = defaultdict(dict)
client_master_cursor.execute("SELECT tag_id, namespace_id, subtag_id FROM tags")
for tag_id, namespace_id, subtag_id in client_master_cursor.fetchall():
    tags[tag_id]['namespace_id'] = namespace_id
    tags[tag_id]['subtag_id'] = subtag_id
    tags[tag_id]['namespace'] = namespaces[namespace_id]
    tags[tag_id]['subtag'] = subtags[subtag_id]
    tags[tag_id]['name'] = hydrus_tag_to_string(namespaces[namespace_id], subtags[subtag_id])
print(f"Loaded {len(tags)} unique tags")

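# Resolve each file's tag_ids into human-readable tag names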
for hash_id, info in files.items():
    info['tags'] = [tags[tag_id]['name'] for tag_id in info.get('tag_ids', [])]

# Copy the files to the output directory
for hash_id, info in files.items():
    src_path = Path(info['path'])
    dest_path = Path(info['new_path'])
    if not dest_path.exists():
        dest_path.write_bytes(src_path.read_bytes())
print(f"Copied {len(files)} files to {OUTPUT_DIR}")


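# Write a tab-separated overview file: new path, then tags joined by '|'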
with OVERVIEW_FILE.open('w') as f:
    for hash_id, info in files.items():
        f.write(f"{info['new_path']}\t{'|'.join(info['tags'])}\n")
print(f"Wrote overview file to {OVERVIEW_FILE}")

# Migrate to TMSU
run(["tmsu", "init", str(OUTPUT_DIR)], capture_output=True)
for hash_id, info in files.items():
    dest_path = Path(info['new_path'])
    if info['tags']:
        # Pass the file and tags as an argument list to avoid shell-quoting issues
        run(["tmsu", "tag", str(dest_path), *info['tags']], capture_output=True)
print(f"Tagged files in TMSU database located at {OUTPUT_DIR}")

Published 21. November 2025

Last modified 21. November 2025