Guild runs command is slow

Hi there,

In contrast to my other topics, this one is about the CLI command guild runs.

The problem I frequently run into is that any command related to guild runs, e.g. guild view or guild compare, is often slow, although sometimes it is fast again.

For example, running guild runs stop <hash> takes several minutes, even though I have only ~2000 experiments in total.

Running strace shows a wall of calls like these:

openat(AT_FDCWD, "/path/to/.guild/runs/2f98245057a97fa68f714334a5927c32/.guild/attrs/started", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=17, ...}, AT_EMPTY_PATH) = 0
ioctl(3, TCGETS, 0x7ffc47fdc640)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "1704721789638391\n", 8192)     = 17
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
close(3)

Tracing guild runs with strace shows thousands of similar openat syscalls.
Going through the source code, I assume this is caused by the call to _all_runs in var.py.
This function iterates over all experiments and, for each one, determines its status by checking files in .guild/attrs (see run.py).
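To illustrate the access pattern I mean, here is a simplified, hypothetical model of such a scan. It is not Guild's code: make_fake_runs, scan_runs and the attribute names in ATTRS are made up for this sketch (only "started" appears in the strace output above). The point is just that reading every attribute file of every run costs one open/read/close per file, so the number of opens grows as runs * files.

```python
import os
import tempfile

# Hypothetical attribute names; Guild's real set of attrs files may differ.
ATTRS = ("started", "stopped", "exit_status")


def make_fake_runs(root, n_runs):
    """Create n_runs fake run dirs, each with a .guild/attrs/ folder."""
    for i in range(n_runs):
        attrs_dir = os.path.join(root, f"{i:032x}", ".guild", "attrs")
        os.makedirs(attrs_dir)
        for name in ATTRS:
            with open(os.path.join(attrs_dir, name), "w") as f:
                f.write(f"{1704721789638391 + i}\n")


def scan_runs(root):
    """Read every attrs file of every run, as a naive status check would.

    Returns ({run_id: {attr: value}}, number_of_files_opened).
    """
    runs, opens = {}, 0
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if not entry.is_dir():
            continue
        attrs_dir = os.path.join(entry.path, ".guild", "attrs")
        attrs = {}
        for name in os.listdir(attrs_dir):
            # One openat()/read()/close() sequence per file, like in the strace output.
            with open(os.path.join(attrs_dir, name)) as f:
                attrs[name] = f.read().strip()
            opens += 1
        runs[entry.name] = attrs
    return runs, opens


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_fake_runs(root, 2000)
        runs, opens = scan_runs(root)
        print(len(runs), opens)  # 2000 6000
```

With ~2000 runs and only three attribute files each, that is already 6000 file opens for a single listing, before any other per-run work.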

If my understanding is correct, this leads to O(runs * files) runtime that is dominated by I/O time. So always reading the files instead of using the existing .guild/cache database makes this quite slow.
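As a rough idea of what using a cache could look like, here is a minimal sketch, assuming that a run's status never changes once it is final. It uses a plain sqlite3 table and is not how Guild's .guild/cache actually works; read_status, cached_statuses, FINAL and the "exit_status"-based status rule are all hypothetical simplifications.

```python
import os
import sqlite3
import tempfile

# Hypothetical: statuses treated as final. Once cached, these runs are never re-read.
FINAL = {"completed", "error"}


def read_status(attrs_dir):
    """Derive a status from the attrs files (simplified rule, not Guild's logic)."""
    path = os.path.join(attrs_dir, "exit_status")
    if not os.path.exists(path):
        return "running"  # no exit status yet; a real check would also inspect the pid
    with open(path) as f:
        return "completed" if f.read().strip() == "0" else "error"


def cached_statuses(root, db_path):
    """Return ({run_id: status}, attrs_reads), reading attrs only for uncached/unfinished runs."""
    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS status (run_id TEXT PRIMARY KEY, status TEXT)"
        )
        cached = dict(con.execute("SELECT run_id, status FROM status"))
        result, reads = {}, 0
        with os.scandir(root) as entries:
            for entry in entries:
                if not entry.is_dir():
                    continue
                status = cached.get(entry.name)
                if status not in FINAL:
                    # Cache miss or unfinished run: fall back to reading the files.
                    status = read_status(os.path.join(entry.path, ".guild", "attrs"))
                    reads += 1
                    con.execute(
                        "INSERT OR REPLACE INTO status VALUES (?, ?)",
                        (entry.name, status),
                    )
                result[entry.name] = status
        con.commit()
        return result, reads
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as cache:
        for i, code in enumerate(["0", "1", None]):
            attrs = os.path.join(root, f"run{i}", ".guild", "attrs")
            os.makedirs(attrs)
            if code is not None:
                with open(os.path.join(attrs, "exit_status"), "w") as f:
                    f.write(code + "\n")
        db = os.path.join(cache, "status.db")
        print(cached_statuses(root, db)[1])  # 3: every run is read the first time
        print(cached_statuses(root, db)[1])  # 1: only the still-running run is re-read
```

The listing still costs one directory scan, but finished runs (the vast majority of my ~2000) would no longer trigger any attrs reads. The trade-off is that the cache must be invalidated if a "final" status can ever change, e.g. when a run is restarted.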

Is this correct? Are there any plans to improve the performance?

Cheers
Alessandro