AI Models that Run Locally


weekly digest deeplearning

The rise of ChatGPT and other large language models over the past few months has been pretty fascinating.

New models keep being published with larger sizes and more training data, each claiming an improvement over its predecessors.

Apart from the serious debate about the potential dangers of the technology, it certainly opens up new ways to do the things we used to do.

Since some models are too expensive to run just for fun, I looked for and tried several models that can run locally on my computer, so I could build something interesting.

Alpaca

The first one is a large language model similar to ChatGPT. It is based on Facebook’s LLM called LLaMA, whose largest variant has 65 billion parameters and was trained on 1.4 trillion tokens, combined with Stanford’s Alpaca fine-tuning to make it a more instruction-following model.

A model this size surely can’t be run cheaply. Fortunately, people have quantized the weights used for inference, shrinking the model enough to run on a local computer, even without a GPU. As a rough example, the 7-billion-parameter model stored at 16-bit precision needs about 14 GB, while the 4-bit quantized version fits in roughly 4 GB.

The version I tried is Alpaca.cpp, a C/C++ port of the model. The installation is quite straightforward, as described in the documentation:

  1. First, clone the repo and build the executable:
git clone https://github.com/antimatter15/alpaca.cpp
cd alpaca.cpp
make chat
  2. Download the model weights from here and put them in the same folder as the chat program (e.g. using wget):
wget -c https://huggingface.co/Sosaka/Alpaca-native-4bit-ggml/resolve/main/ggml-alpaca-7b-q4.bin
  3. Run the chat program and try asking it something:
./chat

[Image: Example of output from Alpaca.]
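
Since the chat program is built on top of llama.cpp, it should accept a few of the same runtime options. The -m (model path) and -t (thread count) flags below are an assumption carried over from llama.cpp, so check ./chat --help on your build before relying on them:

# assumed llama.cpp-style flags: -m points at the weights file, -t sets CPU threads
./chat -m ./ggml-alpaca-7b-q4.bin -t 4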

Whisper

Whisper is a speech recognition model from OpenAI, trained on 680,000 hours of audio. OpenAI claims it approaches human-level robustness and accuracy on English speech. The model is open-sourced on GitHub, but I haven’t tested the original release yet.

What I tried is Whisper.cpp, which again is a C++ port of the model, quantized so it can run cheaply on a local computer.

The installation is also easy:

  1. Clone the repository
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
  2. Download the base model (or any other supported model). Fortunately, the project provides a script to download it:
bash ./models/download-ggml-model.sh base.en
  3. Build the program:
make
  4. Then test it with one of the sample audio files:
./main -f samples/jfk.wav
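
One caveat from the whisper.cpp documentation: the main tool only reads 16-bit WAV files sampled at 16 kHz, so any other recording has to be converted first, for example with ffmpeg:

# convert an arbitrary audio file to 16 kHz mono 16-bit WAV, then transcribe it
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
./main -f output.wav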

I’ve tried it with several audio files, and so far the results are fascinating. The next thing I want to try is running it with input directly from the microphone, so we could send voice commands to the computer.
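
Helpfully, the repository already ships a stream example that captures audio from the microphone in real time (it needs SDL2 installed). Based on the project’s README, building and running it should look roughly like this, though the exact flags may differ between versions:

# requires SDL2 (e.g. libsdl2-dev on Debian/Ubuntu)
make stream
./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000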

TTS: Text To Speech

Lastly, what I tried this week is a text-to-speech model. I first tried the espeak-ng tool, but the result isn’t that great. After looking further, I came across Mozilla TTS (now continued as Coqui TTS), which I found quite fascinating. Installation and usage are easy, and the result (even with the default voice) is extraordinary.

To install it, we could use pip:

pip install TTS

And then test it by running:

tts --text "Hello World"

The first time it runs, it will download the default voice model. We can list the available voices with the --list_models parameter. To change the voice, use --model_name "<type>/<language>/<dataset>/<model_name>"; the model will be downloaded the first time it is used.
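
As a concrete example, picking one of the English models from that list and writing the result to a file looks something like this; the model name and the --out_path flag below are assumptions, so treat it as a sketch and check tts --help on your install:

# list the available voices, then synthesize with a specific model
tts --list_models
tts --text "Hello World" --model_name "tts_models/en/ljspeech/glow-tts" --out_path hello.wav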

I’m currently looking for an image generation model that can run locally, like Stable Diffusion, to complete my set of tools. With these, I think we could try to create a smart voice assistant of our own.
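
As a very rough sketch of how these pieces could already be glued together (leaving out the Alpaca step, since chat is interactive), one loop could record a few seconds from the microphone, transcribe it with whisper.cpp, and speak the transcript back with TTS. The arecord settings and the whisper.cpp -otxt/-nt flags are assumptions to verify on your own setup:

# record 5 seconds of 16 kHz mono audio from the default microphone
arecord -f S16_LE -r 16000 -c 1 -d 5 command.wav
# transcribe it; -otxt should write command.wav.txt and -nt drops timestamps (check ./main --help)
./main -m models/ggml-base.en.bin -f command.wav -otxt -nt
# speak the transcript back
tts --text "$(cat command.wav.txt)" --out_path reply.wav && aplay reply.wav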