In the most recent ACE Ventures Workshop, Malte Pietsch, co-founder and CTO of deepset, shared his insights and experiences on deploying large language models (LLMs). Deepset is at the forefront of natural language processing (NLP) and on its way to becoming a standard for building LLM applications in the enterprise segment!
One key takeaway was the importance of retrieval augmentation in language model pipelines. Pietsch emphasized this technique, which incorporates additional information into the prompts given to the language model and thereby reduces the generation of incorrect information, also known as hallucinations. For instance, when using a language model to answer questions about a specific company, retrieval augmentation allows relevant internal data from that company to be included in the prompt. This added context grounds the model in the company's internal workings, leading to more accurate and contextually relevant responses.
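The idea can be sketched in a few lines of Python. This is a minimal illustration, not deepset's implementation: the keyword-overlap retriever and the function names are assumptions chosen for brevity (a production system would use a vector store and embedding-based retrieval).

```python
import re

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many question terms they share; return the top hits."""
    q_terms = set(re.findall(r"\w+", question.lower()))
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(re.findall(r"\w+", d.lower()))),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Prepend retrieved context to the question before calling the LLM."""
    context = retrieve(question, documents)
    ctx_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{ctx_block}\n"
        f"Question: {question}\nAnswer:"
    )

# Example: internal company documents as the retrieval corpus.
docs = [
    "Our refund policy allows returns within 30 days.",
    "The Berlin office opened in 2018.",
    "Support tickets are answered within 24 hours.",
]
prompt = build_prompt("What is the refund policy?", docs)
```

The prompt sent to the model now contains the refund-policy document, so the answer can be grounded in company data the model never saw during training.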
The cost and performance considerations of deploying language models were also highlighted. Deployment costs can vary significantly, from approximately $1,000 for self-hosted models to as much as $60,000 for models hosted by third-party providers. Several strategies were suggested to mitigate these costs. First, opting for smaller, specialized models and optimizing the retrieval and prompting processes can reduce expenses; this includes using a cache or persistent storage to exploit similarities with previous queries and minimize the number of model calls. Another point was the use of small dev sets, in which expected answers and relevant documents are provided. Incorporating this information, such as key terms or indicative phrases, into the retrieval process allows for more targeted and efficient retrieval of relevant information, enhancing the system's overall performance.
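The caching strategy mentioned above can be sketched as follows. This is an illustrative assumption of how such a cache might look (the class name, normalization scheme, and `fake_llm` stand-in are all hypothetical), not a specific tool recommended in the workshop.

```python
import hashlib

class PromptCache:
    """Reuse earlier completions for repeated queries, cutting paid LLM calls."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def _key(self, prompt: str) -> str:
        # Normalize case and whitespace so trivially rephrased prompts still hit.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_call(self, prompt: str, llm_call) -> str:
        key = self._key(prompt)
        if key not in self._store:          # cache miss: pay for one model call
            self._store[key] = llm_call(prompt)
        return self._store[key]             # cache hit: free

# Stand-in for a billed API call, counting how often it is actually invoked.
calls = 0
def fake_llm(prompt: str) -> str:
    global calls
    calls += 1
    return f"answer to: {prompt}"

cache = PromptCache()
a1 = cache.get_or_call("What is retrieval augmentation?", fake_llm)
a2 = cache.get_or_call("what is  retrieval augmentation?", fake_llm)  # hit
```

Only the first request triggers a model call; the normalized repeat is served from the cache. A persistent store (e.g. a database keyed by the same hash) extends this across restarts, and semantic caches go further by matching on embedding similarity rather than exact text.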