PharoInfer is a production-ready inference engine for Pharo Smalltalk that brings Large Language Models (LLMs) directly into the Pharo environment. It supports local inference, multiple model formats, streaming text generation, embeddings, and an OpenAI-compatible chat API, with interchangeable backends for native Pharo, Ollama, and llama.cpp.
To install a stable release, replace 'X.X.X' with the desired version tag:

Metacello new
	githubUser: 'pharo-llm' project: 'pharo-infer' commitish: 'X.X.X' path: 'src';
	baseline: 'AIPharoInfer';
	load.
To install the latest development version from the main branch:

Metacello new
	githubUser: 'pharo-llm' project: 'pharo-infer' commitish: 'main' path: 'src';
	baseline: 'AIPharoInfer';
	load.
Basic usage with the native Pharo backend: load a model from a GGUF file, register it with the model manager, and ask the default inference engine for a completion.

"Load a model from disk and attach the native Pharo backend."
model := AIModel fromFile: '/path/to/model.gguf' asFileReference.
model backend: AILocalBackend new.

"Register the model so the engine can resolve it by name."
AIModelManager default registerModel: model.

"Run a text completion."
engine := AIInferenceEngine default.
engine complete: 'Tell me a story about' model: 'your-model-name'.
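The introduction mentions streaming text generation, but this README does not show the selector for it. A minimal sketch, assuming a hypothetical complete:model:onToken: variant that evaluates a block for each generated token:

"Hypothetical streaming variant; the onToken: keyword is an assumption,
not a confirmed PharoInfer selector. Each token is printed as it arrives."
engine
	complete: 'Tell me a story about'
	model: 'your-model-name'
	onToken: [ :token | Transcript show: token ].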
The same flow works with the llama.cpp backend. Here the backend is set on the model manager and shared with the engine:

| manager engine model |
"Select llama.cpp as the active backend."
manager := AIModelManager default.
manager currentBackend: AILlamaCppBackend new.

"Load a GGUF model from the user's home directory."
model := manager loadModel: (FileLocator home / 'path' / 'model.gguf') fullName.

"Point the engine at the same backend and run a completion."
engine := AIInferenceEngine default.
engine backend: manager currentBackend.
engine complete: 'Hello, world!' model: model name.
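Embeddings are also listed among the supported features. The selector below is an assumption for illustration; check the AIInferenceEngine protocol for the actual message:

"Hypothetical embedding call; embed:model: is assumed, not confirmed.
The result would be a numeric vector for the input text."
embedding := engine embed: 'Pharo is a pure object-oriented language' model: model name.
embedding size. "dimensionality of the embedding vector"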
For conversations, build an OpenAI-style chat completion request from role-tagged messages and send it through the chat API:

request := AIChatCompletionRequest
	model: 'your-model-name'
	messages: {
		AIChatMessage system: 'You are a helpful AI assistant'.
		AIChatMessage user: 'What is Smalltalk?' }.
AIChatAPI default complete: request.
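Since the chat API is OpenAI-compatible, the response presumably mirrors the OpenAI chat completion shape. The accessors below (choices, message, content) are assumptions based on that format; verify them against the response class:

"Assumed OpenAI-style response accessors, shown for illustration only."
response := AIChatAPI default complete: request.
response choices first message content. "the assistant's reply text"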