Starting an inference server for a model
Once a model has been downloaded, you can start a model service for it. A model service is an inference server that runs in a container and exposes the model through the well-known chat API common to many providers.
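As a rough illustration of what "exposing the model through a chat API" means, the sketch below builds an OpenAI-style chat-completion request that such a server typically accepts. The base URL, port, and model name are placeholders; the actual values are shown in the service details once the server is running.

```python
import json
from urllib import request

# Hypothetical endpoint: the real host/port are displayed by
# Podman AI Lab in the service details for the running server.
BASE_URL = "http://localhost:10434"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send_chat(payload: dict) -> dict:
    """POST the payload to the inference server (requires a running service)."""
    req = request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("my-model", "Hello!")
print(payload["messages"][0]["role"])  # → user
```

This is a sketch under the assumption that the service follows the common `/v1/chat/completions` convention; the snippets generated in the service details (see Verification below) are the authoritative way to call your particular server.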
Prerequisites
- The model you want to serve has been downloaded.
Procedure
- Click the Podman AI Lab icon in the navigation bar.
- In the Podman AI Lab navigation bar, click Models > Services.
- Click the New Model Service button on the top right.
- Select the model you want to start an inference server for in the Model list and click the Create Service button.
- The inference server for the model starts. Once it is running, click the Open service details button.
Verification
- Once the inference server has started, the service details page lets you generate code snippets in various languages to access the model through the inference server.
- You can change the target language of the generated snippet, for example to Java with Quarkus.
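Whatever target language you pick, the generated snippets follow the same pattern: send a chat request and read the assistant's reply out of the response. A minimal sketch of parsing such a response, assuming the common OpenAI-style response schema (the exact fields your server returns may differ):

```python
import json

# Hedged example of an OpenAI-style chat-completion response body;
# in practice this JSON comes back from the inference server.
raw = json.dumps({
    "choices": [
        {"message": {"role": "assistant", "content": "Hello there!"}}
    ]
})

response = json.loads(raw)
# The assistant's reply lives under the first choice's message.
answer = response["choices"][0]["message"]["content"]
print(answer)  # → Hello there!
```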