Setting Up a Local Chat Server: DeepSeek and Ollama

On this page, we will walk through the following steps. Please feel free to jump to the section of your choice.

System Info

The setup is being performed on an Apple MacBook Pro (M3 Pro) with 36 GB of memory. At the time of commencing this installation, I still had about 320 GB of storage left.

Installing Ollama

Ollama installation is a very straightforward process.
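On macOS, the simplest route is to download the app from ollama.com and move it to Applications. If you prefer Homebrew (assuming it is already set up), the CLI can be installed and the server started like this:

% brew install ollama
% ollama serve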

Verify Installation
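A quick sanity check is to ask the binary for its version; if the command resolves, the installation is on your PATH:

% ollama --version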

Model Download

A list of models supported by Ollama can be found at this link.

Since DeepSeek R1 is among the most capable of these, and is available to download for free, we will attempt to download R1. Considering the hardware limits of a personal computer, we will play around with the 14B-parameter model.

The model can be pulled onto the local machine for use with Ollama as follows:

% ollama pull deepseek-r1:14b
pulling manifest
pulling 6e9f90f02bb3... 100% ▕████████████████▏ 9.0 GB
pulling 369ca498f347... 100% ▕████████████████▏  387 B
pulling 6e4c38e1172f... 100% ▕████████████████▏ 1.1 KB
pulling f4d24e9138dd... 100% ▕████████████████▏  148 B
pulling 3c24b0c80794... 100% ▕████████████████▏  488 B
verifying sha256 digest
writing manifest
success

The downloaded model is saved locally at ~/.ollama/models.
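You can confirm where the files landed; the directory typically contains a blob store and per-model manifests (the exact layout may differ by Ollama version):

% ls ~/.ollama/models
blobs     manifests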

If you have multiple models downloaded, you can list them using the following command:

% ollama list
NAME               ID              SIZE      MODIFIED
deepseek-r1:14b    ea35dfe18182    9.0 GB    7 minutes ago

Asking Questions

Using Terminal Access: Chat Server

Of the many ways that we can use Ollama to talk to an LLM, we will first look at the most straightforward method. Here, we will use Ollama to load the model and talk to it through the terminal.

You can start this process by simply running the following in your terminal.

prawarpoudel@Mac ~ % ollama run deepseek-r1:14b
>>> Send a message (/? for help)

You can then send messages and talk to it just like any LLM service available.

The following is an example of me talking to the DeepSeek R1 model that I just instantiated.

>>> Can you interpret a dream? Why do we dream when we sleep? Please limit your answer to within 150 words.
<think>
Alright, I need to answer the user's query about whether I can interpret dreams and why people dream during sleep, keeping it under 150 words.

First, from our previous conversation, I know that REM sleep is linked with vivid dreaming. The user seems interested in both the interpretation of dreams and their purpose.

I should confirm that while I can provide interpretations based on common symbols and theories, there's no definitive method for interpreting dreams scientifically. Also, I should briefly explain why we dream, touching on REM sleep's role in processing emotions, memories, and possibly aiding problem-solving.

I need to make it concise, so I'll focus on the key points: no scientific consensus on interpretation methods and the purposes of REM sleep.
</think>

While there is no definitive method for scientifically interpreting all dreams, many people use symbolic meanings or personal associations to gain insights. Dreams often reflect emotions, experiences, or thoughts from waking life. Why we dream during sleep remains unclear, but REM (Rapid Eye Movement) sleep, when most vivid dreaming occurs, may help process emotions, consolidate memories, and aid problem-solving. However, the exact purpose of dreams is still debated.

To quit:

Use Ctrl + d or /bye to exit.

>>> /bye

Using CURL

curl requests can be useful when designing an agent that completes other tasks with the help of LLM models. To send a curl request, we first need to construct the request body, which we will see below.
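The body is a small JSON document: at minimum it names the model, carries the prompt, and optionally sets stream. A sketch of the payload we will send:

{
  "model": "deepseek-r1:14b",
  "prompt": "Can you interpret a dream? Why do we dream when we sleep? Please limit your answer to within 150 words",
  "stream": false
}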

Finally, putting it all together.

% curl -X POST http://localhost:11434/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model":"deepseek-r1:14b","prompt":"Can you interpret a dream? Why do we dream when we sleep? Please limit your answer to within 150 words","stream":false}'

The response to this is the following:

{"model":"deepseek-r1:14b","created_at":"2025-02-02T22:13:31.237735Z", "response":"\u003cthink\u003e\nOkay, xxxx \n\u003c/think\u003e\n\nDreams occur during xxx significance.","done":true,"done_reason":"stop","context":[...], "total_duration":27204042750,"load_duration":571147333,"prompt_eval_count":28, "prompt_eval_duration":465000000,"eval_count":368,"eval_duration":26166000000}%

The response field in the above output is the answer we are looking for. The <think> ... </think> delimiters are part of the response as well, so during AI agent design these can be stripped or used appropriately.
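As a minimal sketch (the helper name is my own, not part of any library), the reasoning block can be separated from the final answer with a regular expression:

import re

def strip_think(text: str) -> str:
    # Drop the <think>...</think> reasoning block and keep only the final answer.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

answer = strip_think("<think>\nreasoning...\n</think>\n\nFinal answer.")
print(answer)  # -> Final answer.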

The reason for supplying "stream": false in the curl request is to disable streaming output. Otherwise, the response streams: each piece of the output is sent as a separate JSON object, each with its own done field, which flips to true only on the final object.
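If you do want streaming, the endpoint emits newline-delimited JSON that can be consumed line by line. A minimal sketch in Python, using the third-party requests package (installed separately):

import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:14b", "prompt": "Why do we dream?", "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Each chunk carries a fragment of the answer; done becomes true at the end.
    print(chunk["response"], end="", flush=True)
    if chunk["done"]:
        break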

Using a UI Similar to the OpenAI ChatGPT UI

The Open WebUI project provides a UI that is similar to the state-of-the-art chat LLM offerings. Installing Open WebUI allows us to have a local setup that resembles the commercial products.

To install Open WebUI, I am following the instructions at the official link.

The official instructions offer more than one route; the following is a sketch of the pip-based steps.
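Assuming a Python 3.11 environment (the version the pip package currently targets; treat this as a sketch rather than a verbatim record of my session), the install and launch look like:

% pip install open-webui
% open-webui serve

Once the server starts, the UI is available in the browser at http://localhost:8080, where you can select the deepseek-r1:14b model and start chatting.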

Chat with UI

Now we can start to chat with the setup that we just made. The following shows a snippet of the chat I had in my local setup, where I asked: can you write a python code to find the sum of all the numbers in a csv file.
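For reference, a solution along the lines of what the model returns for that prompt (an illustrative sketch, not the model's verbatim reply; numbers.csv is a hypothetical file name) is:

import csv

def sum_csv_numbers(path: str) -> float:
    # Sum every cell that parses as a number, skipping headers and blanks.
    total = 0.0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            for cell in row:
                try:
                    total += float(cell)
                except ValueError:
                    pass
    return total

print(sum_csv_numbers("numbers.csv"))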

Using Python Library

We can use the Python library ollama (installable with pip install ollama) to talk to the local Ollama installation and the model that we downloaded. The following is an example.

import ollama

prompt = "Can you interpret a dream? Why do we dream when we sleep? Please limit your answer to within 150 words."
model = "deepseek-r1:14b"

response = ollama.generate(model=model, prompt=prompt)
print(response['response'])
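The same library also exposes a chat-style API that keeps the conversation as a list of messages; a minimal sketch:

import ollama

messages = [
    {"role": "user", "content": "Why do we dream when we sleep? Keep it under 150 words."},
]
response = ollama.chat(model="deepseek-r1:14b", messages=messages)
print(response['message']['content'])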

Hardware Usage

While a question is being answered, the following is displayed in the GPU History window of the Activity Monitor on the MacBook.