sqlite3.OperationalError: disk I/O error when using the scratch drive on a Linux cluster to store Guild runs

Hi,

I am trying to use Guild for hyperparameter optimization. I am running with max-trials set to 50 and want to store the temporary models on the /scratch drive of the Linux cluster. I checked that the drive is mounted correctly and that I can read from and write to it. However, when I submit my guild run, I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/guild/plugins/skopt_gp_main.py", line 77, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/guild/plugins/skopt_gp_main.py", line 30, in main
    skopt_util.handle_seq_trials(batch_run, _suggest_x)
  File "/usr/local/lib/python3.6/dist-packages/guild/plugins/skopt_util.py", line 210, in handle_seq_trials
    _run_seq_trials(batch_run, suggest_x_cb)
  File "/usr/local/lib/python3.6/dist-packages/guild/plugins/skopt_util.py", line 234, in _run_seq_trials
    batch_flag_vals,
  File "/usr/local/lib/python3.6/dist-packages/guild/plugins/skopt_util.py", line 266, in _iter_seq_trials
    prev_trials = prev_trials_cb()
  File "/usr/local/lib/python3.6/dist-packages/guild/plugins/skopt_util.py", line 224, in <lambda>
    prev_trials_cb = lambda: batch_util.trial_results(batch_run, [objective_scalar])
  File "/usr/local/lib/python3.6/dist-packages/guild/batch_util.py", line 404, in trial_results
    return trial_results_for_runs(trial_runs(batch_run), scalars)
  File "/usr/local/lib/python3.6/dist-packages/guild/batch_util.py", line 408, in trial_results_for_runs
    index = _run_index_for_scalars(runs)
  File "/usr/local/lib/python3.6/dist-packages/guild/batch_util.py", line 423, in _run_index_for_scalars
    index = indexlib.RunIndex()
  File "/usr/local/lib/python3.6/dist-packages/guild/index.py", line 314, in __init__
    self._db = self._init_db()
  File "/usr/local/lib/python3.6/dist-packages/guild/index.py", line 323, in _init_db
    self._init_tables(db)
  File "/usr/local/lib/python3.6/dist-packages/guild/index.py", line 349, in _init_tables
    """
sqlite3.OperationalError: disk I/O error

I checked my /scratch drive and found that the runs and cache folders had been created. One folder had also been created inside runs, but it was empty.

The same command works perfectly when GUILD_HOME points to a different drive. I am not sure whether I am missing something here.

I would appreciate any help from your side.

Thanks,
Vishal

What kind of drive is it? Is it a network drive (e.g. NFS)?
Am I correct that you specified GUILD_HOME=/scratch and got this error?
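
To narrow this down, it may help to test whether SQLite itself can create and write a database on that drive, independent of Guild. SQLite relies on file locking, which is known to be unreliable on some network filesystems, so a plain read/write check on the drive does not necessarily cover it. Below is a minimal sketch using only the Python standard library; the function name `sqlite_works_in` and the probe file name are made up for illustration and are not part of Guild.

```python
import os
import sqlite3


def sqlite_works_in(directory):
    """Try to create, write, and commit a small SQLite database in `directory`.

    Returns (True, None) on success, or (False, error_message) on failure.
    The probe file name is hypothetical and is removed afterwards.
    """
    path = os.path.join(directory, "sqlite_probe.db")
    try:
        conn = sqlite3.connect(path)
        try:
            # Creating a table and committing forces real writes and locking.
            conn.execute("CREATE TABLE IF NOT EXISTS probe (x INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
        return True, None
    except sqlite3.Error as e:
        # sqlite3.OperationalError (e.g. "disk I/O error") is a subclass.
        return False, str(e)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Calling `sqlite_works_in("/scratch")` and then the same function on the drive where the command works would show whether the error can be reproduced outside of Guild.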

It seems that you are running from a system Python. If you didn’t specify GUILD_HOME, your default GUILD_HOME is likely located in a directory where you don’t have write permission (possibly somewhere other than /scratch).
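
A quick way to check this from the same Python that runs Guild is to see whether GUILD_HOME is set in the environment and whether the directory it points to is writable. This is only a rough sketch: the helper name `describe_guild_home` is hypothetical, and it reports only what the environment variable says, not where Guild places its default home when the variable is unset.

```python
import os


def describe_guild_home(env=None):
    """Report whether GUILD_HOME is set and whether that directory is writable.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    home = env.get("GUILD_HOME")
    if home is None:
        # Not set: Guild falls back to its default location (not checked here).
        return {"set": False, "path": None, "writable": None}
    # W_OK | X_OK: permission to create entries in the directory and to enter it.
    writable = os.access(home, os.W_OK | os.X_OK)
    return {"set": True, "path": home, "writable": writable}
```

Running `print(describe_guild_home())` inside the job that fails would show whether GUILD_HOME is actually reaching the process as /scratch.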