A technical question - how does your system work in the IRIS or other examples y...

deliahu · on Nov 22, 2019

Hi, I'm one of the maintainers of Cortex.

>> how does your system work in the IRIS or other examples you have? It looks like it hides under the hood a few things compared to this implementation - type of the model (tensorflow) and the logic that fetches it from S3, right?

That's exactly right. Cortex has three runtimes:

The Predictor runtime, which is used in this post, can run arbitrary Python. There is an optional key in `cortex.yaml` for Predictors called `model`, which is an S3 path to an exported model (file or directory). If provided, Cortex will download the file/directory at that path and pass its local location as the `model_path` argument to the `init(model_path, metadata)` function in your Predictor implementation (see the Predictor docs: https://www.cortex.dev/deployments/predictor).
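As a rough sketch of what that looks like on the Predictor side (illustrative only, not copied from the docs): `init(model_path, metadata)` is the signature mentioned above, while the `predict(sample, metadata)` signature, the pickle format, and the `labels` / `class_index` names are assumptions made just for this example.

```python
import pickle

# Module-level handle so the loaded model is reused across requests.
model = None


def init(model_path, metadata):
    """Load the model that was downloaded from the S3 path given in `model`.

    Assumes `model_path` points at a single pickled file; a directory
    export would need its own loading logic.
    """
    global model
    with open(model_path, "rb") as f:
        model = pickle.load(f)


def predict(sample, metadata):
    """Hypothetical prediction hook (signature is an assumption).

    The stand-in "model" here is just a dict with a list of labels.
    """
    return model["labels"][sample["class_index"]]
```

The point is only that the download happens before `init` runs, so the Predictor code just opens a local path.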

The TensorFlow and ONNX runtimes behave a little differently (and similarly to each other): `model` is a required field in the API config, and Cortex handles downloading the model and running inference against it. You may define a `request_handler`, which can contain pre- and post-request handling (here are the TensorFlow docs: https://www.cortex.dev/deployments/tensorflow, and here are the ONNX docs: https://www.cortex.dev/deployments/onnx).
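To give a feel for the request handler idea, here is a minimal sketch. The hook names `pre_inference` / `post_inference`, their arguments, and the payload shapes (`sepal_length`, `scores`, and so on) are assumptions for illustration; the linked docs have the actual interface.

```python
# Hypothetical label set for an iris-style classifier.
LABELS = ["setosa", "versicolor", "virginica"]


def pre_inference(sample, signature, metadata):
    """Reshape the incoming JSON payload into the model's input (assumed shape)."""
    return {
        "input": [
            sample["sepal_length"],
            sample["sepal_width"],
            sample["petal_length"],
            sample["petal_width"],
        ]
    }


def post_inference(prediction, signature, metadata):
    """Turn raw class scores (assumed key: "scores") into a readable label."""
    scores = prediction["scores"]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"label": LABELS[best]}
```

Model download and inference are not in this file at all, since those runtimes take care of both.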

>> Let's imagine we would want to make a DVC repo (just to store model versions to start) out of one of your examples instead of the DVC get started, how would we do that with the current implementation (through metadata + custom init)?

Yes, that is also exactly right - you'll have to use the Predictor runtime, since that allows you to define how to download your model. You would specify metadata and leave out the `model` config field, as is done in this post. In `init(model_path, metadata)`, you would use the metadata to download and load the model.

shcheklein · on Nov 22, 2019

Thank you! It all makes sense.