Run the first section of `tell-me-a-story_gpt2.ipynb` to generate the model, then download the complete model directory to `models/`. Note that rather than attempting to download the model directly from Colab, it is much faster to save the model to your Google Drive and download from there (steps in the notebook). You can try the model directly with `test_inference.py`, or by running `api.py` and navigating to `localhost:8008`.
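If you want a quick sanity check outside those scripts, the snippet below is a minimal sketch of local inference, assuming the model was exported in Hugging Face `transformers` format; the `models/` path and the generation parameters are illustrative, not the actual contents of `test_inference.py`:

```python
# Minimal local-inference sketch (assumes a Hugging Face transformers export).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_DIR = "models/"  # assumption: the directory downloaded from Colab/Drive

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_DIR)
model = GPT2LMHeadModel.from_pretrained(MODEL_DIR)
model.eval()

prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a short continuation; decoding parameters are illustrative.
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```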
Note first of all that Google Cloud Run is not an ideal platform for a large deep network model: sampling is many times faster on a GPU (Cloud Run is CPU only), and there is a significant startup cost to load the model (Cloud Run is serverless, so workers are created and destroyed dynamically).
That said, for limited use, a serverless CPU deployment is by far the cheapest option for a toy project like this one, since it is pay-per-use only. No known free service will accept a Docker image of the size required to hold the model (~500MB).
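Because each new worker pays the model-loading cost, it helps to load the model once at process startup rather than per request. The sketch below shows the general shape such an API could take; the Flask framework and the `/generate` route are assumptions about how `api.py` might be structured, not its actual code:

```python
# Serverless-friendly API sketch: load the model once per worker so each
# request pays only the inference cost. Framework and route are assumptions.
from flask import Flask, jsonify, request
from transformers import GPT2LMHeadModel, GPT2Tokenizer

app = Flask(__name__)

# Loaded once at startup; on Cloud Run this is the slow cold-start step.
tokenizer = GPT2Tokenizer.from_pretrained("models/")
model = GPT2LMHeadModel.from_pretrained("models/")
model.eval()

@app.route("/generate", methods=["POST"])  # hypothetical endpoint name
def generate():
    prompt = request.get_json().get("prompt", "Once upon a time")
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=100, do_sample=True)
    return jsonify({"text": tokenizer.decode(output[0], skip_special_tokens=True)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8008)
```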
- Begin by following the "Before you begin" section of the Cloud Run quickstart guide
- Build using `gcloud builds submit --tag gcr.io/PROJECT-ID/helloworld`, where `PROJECT-ID` is the result of `gcloud config get-value project`
- Deploy using `gcloud run deploy --image gcr.io/PROJECT-ID/helloworld --platform managed`
Et voilà! Make sure to use your new API URL in the frontend (see those docs).
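Once deployed, a quick smoke test from Python might look like the following; the service URL is a placeholder for whatever `gcloud run deploy` prints, and the `/generate` endpoint is the same hypothetical route assumed in the API sketch above:

```python
# Smoke test against the deployed service. URL and endpoint are placeholders;
# substitute the URL printed by `gcloud run deploy` and the real route.
import requests

SERVICE_URL = "https://YOUR-SERVICE-URL.a.run.app"  # placeholder

resp = requests.post(
    f"{SERVICE_URL}/generate",            # hypothetical endpoint (see api.py)
    json={"prompt": "Once upon a time"},
    timeout=120,                          # allow for the serverless cold start
)
resp.raise_for_status()
print(resp.json()["text"])
```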