With the rapid advancements in artificial intelligence (AI), running sophisticated models like Meta's Llama 3.1 locally on personal computers is becoming increasingly popular. Running an LLM on your local PC or Mac provides a sandbox for experimentation and development without compromising data privacy, and it allows for more flexibility in how you use the models.
Also: Why the future must be BYO AI: Model lock-in deters users and stifles innovation
Here is a quick guide to help you set up and run Llama 3.1 -- as well as many other models, such as Google's Gemma 2 -- on Mac, Linux, and Windows. I'll also discuss the benefits of privately hosted models.
Llama 3.1 8B running on Ollama/Open WebUI
Jason Perlow

Developing and testing against various open-source models you privately host and run offers several advantages over relying solely on publicly hosted large language models (LLMs) from providers like OpenAI, Microsoft Copilot, Meta AI, and Google Gemini.
- **Data privacy:** Publicly hosted LLMs require sending data over the internet, which can raise privacy and security concerns. Running models locally ensures that sensitive data remains on your own hardware.
- **Customization:** Open-source models allow for greater customization. Developers can fine-tune models, adjust hyperparameters, and modify the architecture to suit specific use cases better.
- **Cost control:** Cloud-based AI services can be costly, especially for large-scale applications. Hosting models locally can significantly reduce ongoing API usage and data transfer expenses.
- **Offline capability:** Local models can be used without an internet connection, which is essential for applications requiring high availability or in areas with unreliable internet access.
- **Flexibility and experimentation:** Hosting your own models enables you to experiment with different algorithms and configurations, leading to innovative solutions and a deeper understanding of AI technologies.
- **Freedom from usage policies:** Running LLMs locally means the usage policies of companies like OpenAI, Microsoft, Meta, and Google do not restrict you. You can use whatever prompts you want and employ modified LLMs with lifted restrictions, trained on data that these services might restrict.
Also: The best AI chatbots: ChatGPT, Copilot, and worthy alternatives
Ollama is a versatile and MIT-licensed open-source platform designed to help developers and researchers easily run and manage machine learning models locally on their own hardware. It was developed by a team of AI enthusiasts and engineers who aim to provide tools that ensure data privacy, flexibility, and control over AI applications. Ollama supports various AI models, making it a valuable resource for those looking to explore and utilize AI technologies without relying on third-party cloud services.
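Before pulling any models, you need Ollama itself. As a rough sketch of the typical install paths (check ollama.com for the current installers; the Homebrew formula is one convenient option on a Mac):

```bash
# macOS: install via Homebrew, or download the app from ollama.com
brew install ollama

# Linux: Ollama's official one-line install script
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download and run the installer from ollama.com

# Verify the installation
ollama --version
```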
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 70B | 40GB | `ollama run llama3.1:70b` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Per Ollama's GitHub page, you should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
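As a concrete example, here is what a typical first session looks like from the terminal (the model downloads automatically on first run):

```bash
# Pull Llama 3.1 8B and start an interactive chat session
ollama run llama3.1

# List the models installed locally
ollama list

# Remove a model you no longer need to reclaim disk space
ollama rm llama3.1
```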
I tested Ollama using M1 Pro and M1 Ultra Macs with 32GB and 64GB of RAM, which are a few generations behind current MacBook Pro models. Despite this, and running inference on the CPU alone, I successfully ran 8B-10B parameter models of Meta's Llama 3.1 and Google's Gemma 2, as well as various specially trained variants from Ollama's website, with better-than-acceptable performance.
Also: I broke Meta's Llama 3.1 405B with one question (which GPT-4o gets right)
However, I experienced significant performance issues with the 70B parameter variant on these systems. I'm confident that more recent hardware can handle these models much more efficiently, especially Linux PCs equipped with Nvidia and AMD GPUs.
The Ollama command line interface with chat functionality.
Screenshot by Jason Perlow
Also: 3 ways Meta's Llama 3.1 is an advance for Gen AI
With Docker (or Docker Desktop) installed, open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows) and run the following command to install Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
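Once the container is up, the interface is served on port 3000 (per the -p 3000:8080 mapping above). A quick sanity check:

```bash
# Confirm the Open WebUI container is running
docker ps --filter "name=open-webui"

# Then browse to http://localhost:3000 and create a local admin account
```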
Open WebUI running on Docker Desktop
Screenshot by Jason Perlow
Selecting a model in Open WebUI
Screenshot by Jason Perlow

Ollama can be integrated into various integrated development environments (IDEs) through its API, which enhances the development workflow by providing seamless interaction with AI models. One powerful tool for this integration is Continue, an open-source code assistant that leverages the Ollama API.
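Under the hood, tools like Continue talk to Ollama's local REST API, which listens on port 11434 by default. You can exercise the same API directly with curl -- this example uses the documented /api/generate endpoint and assumes you have already pulled llama3.1:

```bash
# Query the locally hosted Llama 3.1 model via Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain what a context window is in one sentence.",
  "stream": false
}'
```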
Also: If you want a career in AI, start with these 5 steps
For more demanding applications, especially those requiring larger models like the 70B and 405B parameter Llama 3.1 models, running Ollama on a Linux-based system equipped with powerful GPUs is recommended. This setup can handle the computational load and provide faster response times, making it suitable for enterprise-level AI applications.
To use GPUs for running Ollama, follow these steps:
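On an Nvidia-based Linux system, that boils down to confirming a working GPU driver and then installing Ollama, whose install script detects supported GPUs. A minimal sketch:

```bash
# Confirm the Nvidia driver is installed and the GPU is visible
nvidia-smi

# Install (or reinstall) Ollama; the script detects Nvidia and AMD GPUs
curl -fsSL https://ollama.com/install.sh | sh

# Run a model, and watch nvidia-smi in another terminal to confirm GPU use
ollama run llama3.1
```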
These commands ensure that Ollama can utilize the available GPUs on your system, providing the necessary computational power for running large models. For more detailed instructions, refer to the Ollama GPU documentation.
You can still leverage GPU support if you prefer running Ollama in a container. Please note that these instructions apply only to Linux for now.
Also: How can business leaders ready their organizations for AI? 4 keys to success
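The container route hinges on the Nvidia Container Toolkit plus Docker's --gpus flag. A minimal sketch for an Ubuntu/Debian host with Nvidia hardware (the toolkit package assumes you have added Nvidia's apt repository per its documentation):

```bash
# Install the Nvidia Container Toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the Nvidia runtime, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Run the Ollama container with access to all GPUs
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Run a model inside the container
docker exec -it ollama ollama run llama3.1
```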
These commands ensure the Docker container can access all available GPUs on your system, providing the necessary computational power to run large models. For more information on using GPUs with Docker and Ollama, refer to the Docker page on using GPUs with Ollama.
Also: Will OpenAI's new AI detection tool put an end to student cheating?
Running AI models such as Meta's Llama 3.1 locally on your Mac or PC provides numerous benefits, including improved data privacy, greater customization, and cost savings. By following the steps in this guide, you can take advantage of advanced AI models and test different configurations to meet your requirements. Whether you are a developer, researcher, or AI enthusiast, the ability to run complex models locally unlocks many opportunities.