Wrong conversion of default args leads to errors in PyTorch Lightning

Hi!
I wanted to test guild.ai in combination with PyTorch Lightning.
However, I am running into a problem: default arguments are not omitted from the generated command but are passed with an empty string ''.
A small example:

main.py

from argparse import ArgumentParser

from pytorch_lightning import LightningModule, Trainer


def main(args):
    model = LightningModule()
    trainer = Trainer.from_argparse_args(args)
    trainer.fit(model)


if __name__ == "__main__":
    parser = ArgumentParser()
    parser = Trainer.add_argparse_args(parser)
    args = parser.parse_args()

    main(args)

Running guild run main.py deterministic=true doesn't call python main.py --deterministic true but rather python main.py --deterministic 1 --auto_select_gpus '' --benchmark '' ....
PyTorch Lightning then fails with main.py: error: argument --auto_select_gpus: invalid str_to_bool value: ''
How can I either suppress the default flags from being passed or change how they are converted?
The relevant pieces are PyTorch Lightning's str_to_bool function, its add_argparse_args, and the Trainer class.
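
For context, str_to_bool behaves roughly like this (a paraphrased sketch mirroring distutils.util.strtobool, not the exact PyTorch Lightning source), which is why an empty string raises:

def str_to_bool(val):
    # strings accepted as truthy / falsy by the parser
    val = val.lower()
    if val in ('y', 'yes', 't', 'true', 'on', '1'):
        return True
    if val in ('n', 'no', 'f', 'false', 'off', '0'):
        return False
    # '' matches neither set, so argparse reports "invalid str_to_bool value"
    raise ValueError(f"invalid truth value {val!r}")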

Thanks!

Can you provide your guild.yaml file?

I don’t have one.
Instead of running python main.py --deterministic true, I ran guild run main.py deterministic=true

You can reproduce this with:

from argparse import ArgumentParser

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
import pytorch_lightning as pl
from pl_bolts.datasets import DummyDataset

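# Synthetic MNIST-shaped datasets from pl_bolts so the example runs
# without downloading any real data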
train = DummyDataset((1, 28, 28), (1,))
train = DataLoader(train, batch_size=32)
val = DummyDataset((1, 28, 28), (1,))
val = DataLoader(val, batch_size=32)
test = DummyDataset((1, 28, 28), (1,))
test = DataLoader(test, batch_size=32)

class LitAutoEncoder(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def training_step(self, batch, batch_idx):
        # --------------------------
        # REPLACE WITH YOUR OWN
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('train_loss', loss)
        return loss
        # --------------------------

    def validation_step(self, batch, batch_idx):
        # --------------------------
        # REPLACE WITH YOUR OWN
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('val_loss', loss)
        # --------------------------

    def test_step(self, batch, batch_idx):
        # --------------------------
        # REPLACE WITH YOUR OWN
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('test_loss', loss)
        # --------------------------

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# init model
ae = LitAutoEncoder()

# Initialize a trainer
parser = ArgumentParser()
parser = pl.Trainer.add_argparse_args(parser)
args = parser.parse_args()
trainer = pl.Trainer.from_argparse_args(args)

# Train the model ⚡
trainer.fit(ae, train, val)

Install the dependencies with:

pip install pytorch-lightning
pip install pytorch-lightning-bolts
pip install guildai

Now it works with:
python main.py --deterministic=true --max_steps=4
But not with:
guild run main.py deterministic=true max_steps=4

main.py: error: argument --auto_select_gpus: invalid str_to_bool value: ''
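
As a side note, guild run supports a --print-cmd option (assuming a reasonably recent Guild version; see guild run --help) that prints the generated command without running it, which makes the empty-string defaults visible:

guild run main.py deterministic=true max_steps=4 --print-cmd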

I looked into this: you need to use Guild's arg-switch feature.

Create a guild.yaml file like this:

- model: auto_encoder
  source-code: "*.py"
  operations:
    train:
      main: main
      flags:
        deterministic:
          arg-switch: yes

You can now run:

guild run auto_encoder:train deterministic=yes
guild run auto_encoder:train deterministic=no

And it should be parsed correctly by PyTorch Lightning.
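
To illustrate what this does to the command line (a hedged sketch based on Guild's documented arg-switch behavior, not captured output):

guild run auto_encoder:train deterministic=yes
# -> roughly: python -m main --deterministic   (passed as a bare switch)

guild run auto_encoder:train deterministic=no
# -> roughly: python -m main                   (the option is omitted)

PyTorch Lightning's boolean arguments accept a bare switch, so --deterministic is read as true.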

Note that you don't have to specify every hyperparameter in your Guild file, since you can use --force-flags:

guild run auto_encoder:train deterministic=yes gradient_clip_val=0.1 --force-flags

I am not sure if you can do arg-switch from the CLI without specifying a guild.yaml file. Maybe @garrett knows.


Wow! Thanks a lot. It works perfectly.
