Hi there,
in contrast to the other topics, I'm talking about the CLI command guild runs.
The problem I'm frequently encountering is that running anything related to guild runs, e.g. guild view or guild compare, is often slow, then sometimes fast again.
For example, running guild runs stop <hash> takes multiple minutes, even though I have only ~2000 experiments in total.
Running strace reveals a wall of calls like these:
openat(AT_FDCWD, "/path/to/.guild/runs/2f98245057a97fa68f714334a5927c32/.guild/attrs/started", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=17, ...}, AT_EMPTY_PATH) = 0
ioctl(3, TCGETS, 0x7ffc47fdc640) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, "1704721789638391\n", 8192) = 17
read(3, "", 8192) = 0
read(3, "", 8192) = 0
close(3)
Stracing guild runs reveals thousands of similar openat syscalls.
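To put a number on "thousands", the strace output can be tallied per attribute file. The following is only a rough sketch: the function name count_attr_opens and the regular expression are my own, and it assumes the log lines look like the openat line quoted above.

```python
import re
from collections import Counter

# Matches openat calls on files under <run>/.guild/attrs/ and captures
# the attribute file name (e.g. "started"). The pattern is an assumption
# based on the strace line quoted in this post.
ATTR_OPEN = re.compile(r'openat\(.*"[^"]*/\.guild/attrs/([^"/]+)"')


def count_attr_opens(strace_lines):
    """Count openat calls per attribute file name in strace output."""
    counts = Counter()
    for line in strace_lines:
        match = ATTR_OPEN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the trace above; a real log would be read from a file.
sample = [
    'openat(AT_FDCWD, "/path/to/.guild/runs/2f98245057a97fa68f714334a5927c32'
    '/.guild/attrs/started", O_RDONLY|O_CLOEXEC) = 3',
    'read(3, "1704721789638391\\n", 8192) = 17',
    "close(3)",
]
attr_counts = count_attr_opens(sample)
print(attr_counts)
```

For a full run, the log could be captured with something like strace -f -o trace.log guild runs (trace.log is just a placeholder name) and passed in via open("trace.log").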
Going through the source code, I assume this is due to calling _all_runs in var.py.
This function goes through all experiments and checks their status by searching for files in .guild/attrs (in run.py) for each experiment.
If my understanding is correct, then this leads to O(runs * files) runtime and is heavily influenced by I/O time. So, always reading the files instead of using the existing .guild/cache database makes this quite slow.
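To illustrate the access pattern I mean, here is a minimal sketch that is not Guild's actual code. It builds a throwaway runs directory and reads every attribute file of every run, so the number of file opens grows as runs * files. The helper names and the attribute names other than started are made up for the example.

```python
import os
import tempfile


def make_fake_runs(root, n_runs, attr_names):
    """Create <root>/<run>/.guild/attrs/<name> files (hypothetical layout)."""
    for i in range(n_runs):
        attrs_dir = os.path.join(root, f"run{i:04d}", ".guild", "attrs")
        os.makedirs(attrs_dir)
        for name in attr_names:
            with open(os.path.join(attrs_dir, name), "w") as f:
                f.write("0\n")


def scan_runs(root):
    """Read every attr file of every run and return the number of opens."""
    opens = 0
    for run_id in sorted(os.listdir(root)):
        attrs_dir = os.path.join(root, run_id, ".guild", "attrs")
        for name in os.listdir(attrs_dir):
            with open(os.path.join(attrs_dir, name)) as f:
                f.read()
            opens += 1  # one open/read/close cycle per file, as in the trace
    return opens


with tempfile.TemporaryDirectory() as root:
    # "stopped" and "exit_status" are illustrative names, not taken from the trace.
    make_fake_runs(root, n_runs=50, attr_names=["started", "stopped", "exit_status"])
    total_opens = scan_runs(root)

print(total_opens)  # 50 runs * 3 files = 150
```

With ~2000 runs and a handful of attribute files each, the same loop already means many thousands of small reads per command, which is cheap on a warm local disk but can get slow on a cold cache or a network filesystem.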
Is this correct? Are there any plans to improve the performance?
Cheers
Alessandro