`LlamaEdgeChatService` provides developers with an OpenAI API-compatible service to chat with LLMs via HTTP requests.
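Because the service speaks the OpenAI chat-completions protocol, you can exercise it with plain HTTP before wiring it into any framework. The sketch below is a minimal example under that assumption; the service URL and model name are placeholders for your own `llama-api-server` instance and whatever model it is serving.

```python
# Minimal sketch: calling an OpenAI-compatible chat endpoint over HTTP.
# SERVICE_URL and the model name are placeholders -- point them at your own
# llama-api-server instance and the model it hosts.
import requests

SERVICE_URL = "http://localhost:8080"  # hypothetical local llama-api-server

payload = {
    "model": "llama-2-7b-chat",  # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

# llama-api-server follows the OpenAI-style /v1/chat/completions route
resp = requests.post(f"{SERVICE_URL}/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```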
`LlamaEdgeChatLocal` enables developers to chat with LLMs locally (coming soon).
Both `LlamaEdgeChatService` and `LlamaEdgeChatLocal` run on infrastructure driven by the WasmEdge Runtime, which provides a lightweight, portable WebAssembly container environment for LLM inference tasks.
`LlamaEdgeChatService` works on top of the `llama-api-server`. By following the steps in the llama-api-server quick-start, you can host your own API service and chat with any model you like, on any device, anywhere, as long as an internet connection is available.
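Once your service is up, you can talk to it through the LangChain community integration. The following is a minimal sketch assuming the `langchain_community` package is installed; the `service_url` is a placeholder to be replaced with the address of the `llama-api-server` you hosted via the quick-start.

```python
# Minimal sketch of chatting through LlamaEdgeChatService.
# The service_url below is a placeholder for your own llama-api-server endpoint.
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage, SystemMessage

chat = LlamaEdgeChatService(service_url="http://localhost:8080")  # placeholder URL

messages = [
    SystemMessage(content="You are a friendly AI assistant."),
    HumanMessage(content="What is the capital of France?"),
]

# invoke() sends the conversation to the service and returns the model's reply
response = chat.invoke(messages)
print(response.content)
```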