Build your Private ChatGPT with DevZero
LLMs revolutionize natural language processing, but they also raise privacy concerns. This blog explores locally deployed LLMs as a way to leverage LLM capabilities while maintaining control over your data, with a focus on the open-source project PrivateGPT. We'll guide you through setting up PrivateGPT using DevZero's Cloud Development Environments, so you can use an LLM without compromising privacy. By the end, you'll have your own PrivateGPT ready for use.
If you would like to dive right into creating your own privateGPT, proceed to the Creating your own privateGPT with DevZero section.
What are LLMs? #
Large Language Models (LLMs) are a type of artificial intelligence (AI) algorithm that employs deep learning techniques and substantial data sets to comprehend, summarize, generate, and predict new content. These sophisticated language models comprise artificial neural networks with parameter counts ranging from tens of millions to billions, and are trained on large quantities of unlabeled text via self-supervised or semi-supervised learning. LLMs first emerged around 2018 and have since profoundly impacted the field of natural language processing (NLP).
To grasp the intricacies and structures of language, LLMs are trained on extensive textual data, such as entire books, articles, and web pages. Subsequently, they utilize this acquired knowledge to execute tasks such as language translation, question-answering, text summarization, and even creative writing. Some well-known LLMs are OpenAI's GPT, Google's PaLM 2, and Facebook's LLaMA. They can do many cool things, such as:
- Create chatbots that can have conversations.
- Write text for product descriptions, blog posts, and articles.
- Answer questions in a friendly way.
- Translate text between languages.
- Turn speech into text, understand feelings in text, summarize text, check spelling, and sort words.
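All of these capabilities boil down to one trained skill: predicting the next token given the tokens so far. As a toy illustration only (a bigram word-frequency model, with none of the neural machinery of a real LLM), next-word prediction can be sketched like this:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count, for each word, which words follow it in the training text.

    A toy stand-in for what LLMs learn at vastly larger scale with
    neural networks rather than raw counts.
    """
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows: dict, word: str) -> str:
    # Most frequent follower of `word` in the training text.
    return follows[word.lower()].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat ran to the door"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # cat
```

Real LLMs replace these raw counts with billions of learned parameters and operate on subword tokens rather than whole words, but the prediction loop is conceptually the same.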
Protecting Privacy in LLMs #
LLMs are designed to learn from massive amounts of data, which often encompass sensitive or private information. This inherent characteristic of LLMs inevitably raises concerns about privacy and data protection. Moreover, when LLMs are deployed using cloud-based services, the potential for data exposure and security breaches increases. As a secure alternative, locally deployed LLMs can help mitigate these risks while maintaining data privacy. There are already some preliminary solutions publicly available for deploying LLMs locally, such as privateGPT, GPT4All and h2oGPT. These methods can be employed to safeguard sensitive data during the fine-tuning process and response generation, ensuring a more secure and privacy-conscious approach to utilizing LLMs. We will be exploring privateGPT for demonstration in this blog.
However, running Large Language Models (LLMs) on your local PC is limited by the available computing power and resources. As a result, privateGPT may operate quite slowly on your own machine. Additionally, setting up the technical stack locally is your responsibility. To simplify this process, DevZero offers cloud development environments where you have access to a remote machine equipped with the resources and configuration required to run privateGPT. Keep in mind that only the machine lives in the cloud: the data you work with remains under your control, as you are not transferring it to any third-party LLM service during this process.
Here are two variations of the PrivateGPT setup for a user: one without DevZero and the other with DevZero.
Creating your own privateGPT with DevZero #
DevZero makes it simple to create your own privateGPT.
1. If you haven’t yet, create a DevZero account with our simple Get Started flow.
2. Check here for more information on how to create and set up your DevZero account and customize it further.
3. Once signed in to the DevZero Console, create a copy of your own privateGPT template using this link.
4. Click Launch; it will take you to your console, where your workspace starts being created.
5. After a few minutes, your development environment with privateGPT is ready to go.
6. Select “Open in Web Browser”.
7. Your environment is now set up and you are ready to start your privateGPT adventure.
8. Open the terminal from the UI or with the keyboard shortcut (Ctrl + `).
9. Make sure you are already in the privateGPT folder; if not, navigate to the privateGPT project folder.
10. If you don't see a .env file already, copy the contents of example.env to a new .env file, or rename the existing file with mv example.env .env, and update its contents as you see fit.
Here are the values you could use:
- PERSIST_DIRECTORY: the folder you want your vectorstore in
- MODEL_TYPE: supports LlamaCpp or GPT4All
- MODEL_PATH: path to your GPT4All or LlamaCpp supported LLM
- EMBEDDINGS_MODEL_NAME: SentenceTransformers embeddings model name (see https://www.sbert.net/docs/pretrained_models.html)
- MODEL_N_CTX: maximum token limit for the LLM model
- MODEL_N_BATCH: number of tokens in the prompt that are fed into the model at a time; the optimal value differs a lot depending on the model (8 works well for GPT4All, while 1024 is better for LlamaCpp)
- TARGET_SOURCE_CHUNKS: the number of chunks (sources) that will be used to answer a question
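To make these settings concrete, here is a minimal sketch of an example .env and a tiny parser for it, in plain Python. The values shown (model path, context size, and so on) are illustrative placeholders modeled on privateGPT's example.env, not recommendations:

```python
# Minimal .env parser: each non-empty, non-comment line is KEY=VALUE.
# The values below are illustrative placeholders, not recommendations.
EXAMPLE_ENV = """\
PERSIST_DIRECTORY=db
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
MODEL_N_BATCH=8
TARGET_SOURCE_CHUNKS=4
"""

def parse_env(text: str) -> dict:
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

settings = parse_env(EXAMPLE_ENV)
print(settings["MODEL_TYPE"])        # GPT4All
print(int(settings["MODEL_N_CTX"]))  # 1000
```

In the real project these variables are read by the ingestion and query scripts (e.g. via a dotenv loader), so the parser here is purely illustrative.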
11. Now, before querying privateGPT with your questions, we need to ingest a dataset. In this template, the provided test data has already been downloaded and ingested, so you can proceed to the next step if you wish to play with that dataset.
But if you would like to ingest your own dataset, put any and all of your files into the source_documents directory by following the instructions here.
As an example, I am adding the StarCoder README file, since privateGPT supports .md files, with the command:
wget https://raw.githubusercontent.com/bigcode-project/starcoder/main/README.md -O source_documents/starcoder_README.md
Replace the URL in the above command with wherever your data source is. Note that you should use the raw file URL; a github.com /blob/ page URL would download an HTML page rather than the Markdown file itself.
When you ingest the data for the first time, it might take as long as 20-30 seconds per document.
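Conceptually, ingestion splits each document into overlapping chunks, embeds each chunk, and stores the embeddings in the vectorstore, which is why the first run takes a while. Here is a toy sketch of just the chunking step (fixed-size character windows with overlap); this is an assumption for illustration, not privateGPT's actual splitter:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with overlap.

    Toy illustration of the chunking step of ingestion; the real pipeline
    splits more carefully and embeds each chunk for the vectorstore.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "word " * 300  # stand-in for a document (1500 characters)
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0]))  # 4 500
```

The overlap between consecutive windows helps a sentence that straddles a chunk boundary remain retrievable from at least one chunk.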
12. Once the ingestion is complete, you should receive a message that you are now ready to run queries. To ask a question, run the command python privateGPT.py from the project folder. After the model is ready, it will prompt you to enter a query.
13. Since we have the default source document, I asked the sample question from the README docs. Here is the result that I got:
You can keep asking more questions one after the other. Once you are done, type exit to finish. If you would like to add more data sources, go back to the data ingestion step, add your documents to the source_documents folder, and then run python ingest.py before you can query the new data. If you would rather start from an empty database, delete the db folder.
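Under the hood, answering a query starts with retrieving the TARGET_SOURCE_CHUNKS most similar chunks from the vectorstore and handing them to the LLM as context. In this rough sketch, simple word overlap stands in for the embedding similarity a real vectorstore computes; it is an illustration, not privateGPT's code:

```python
import re

def words(text: str) -> set:
    # Lowercased alphanumeric tokens, ignoring punctuation.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, chunk: str) -> int:
    """Naive relevance: count of shared words.

    Stands in for the embedding-similarity search a real vectorstore does.
    """
    return len(words(query) & words(chunk))

def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Keep the k best-scoring chunks (the TARGET_SOURCE_CHUNKS idea).
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "StarCoder is a code generation model trained on source code.",
    "The weather dataset contains daily temperature readings.",
    "Fine-tuning StarCoder requires a GPU with enough memory.",
]
print(top_chunks("What is StarCoder?", chunks))  # the two StarCoder chunks
```

The retrieved chunks are then placed in the LLM's prompt, so answers stay grounded in your ingested documents rather than the model's generic training data.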
If you are looking to explore other open-source language models for text generation, code generation, transcription, image generation, and more, you should definitely check out our other blog, Open Source LLMs, here.
In conclusion, locally deployed LLMs, such as privateGPT, allow users to harness the power of advanced language models while maintaining data privacy and control. By leveraging DevZero, setting up and using privateGPT becomes a seamless and efficient process, providing a more accessible way to work with LLMs without compromising privacy.
If you would like any other templates in DevZero or want to have a custom demo with your specific use case, feel free to schedule a demo at your convenience here.
Developer Advocate, DevZero