
UMK

1 About

Transforming Public Access to Information with Semantic Search

Profil Software partnered with the City of Krakow to enhance its public information bulletin (BIP), which contains over 1.6 million documents in various formats (HTML, Word, PDF). The existing search engine supported keyword search only: users could not ask natural-language questions or search by meaning. The goal of the project was to implement a semantic search engine based on Retrieval-Augmented Generation (RAG). With this solution, users can ask questions and receive AI-generated answers together with the documents used to formulate them, which gives them a more intuitive and efficient way to access public information.
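The retrieve-then-generate flow behind RAG can be illustrated with a minimal sketch. This is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, `generate` is a placeholder for a call to an LLM, and all names here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc_id: cosine(query_vector, embed(documents[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def answer(
    question: str,
    documents: dict[str, str],
    generate: Callable[[str], str],
    k: int = 2,
) -> dict:
    """Retrieve supporting documents, then ask the generator to answer from them."""
    source_ids = retrieve(question, documents, k)
    context = "\n".join(documents[doc_id] for doc_id in source_ids)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # Returning the source ids alongside the answer lets the UI show the
    # documents the answer was based on.
    return {"answer": generate(prompt), "sources": source_ids}
```

In a real system the toy `embed` would be replaced by a neural embedding model and `generate` by an LLM call. The shape of the result (an answer plus its source documents) is the part that matches the behaviour described above.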

Country: Poland
Partnership: 4 months

2 Challenge

The project presented two significant challenges. The first was scale: with over 1.6 million documents, efficient search required a vector database backed by robust, scalable infrastructure. The team had also committed to using open-source models running on custom-configured hardware (GPUs) rather than relying on ready-made solutions from providers like OpenAI. The second challenge was language: many AI models are optimized for languages like English, French, or Spanish, but less so for Polish. The team therefore needed to identify or adapt models that could handle the nuances of Polish, and used a benchmarking process to confirm that the selected models performed accurately in that language.
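One simple way to compare candidate models on Polish queries is to measure how often the expected document appears among the top results. The sketch below uses a hit-rate-at-k metric as an illustrative choice; the case study does not say which metrics or tooling the team used, and every name here is hypothetical.

```python
from typing import Callable, Sequence

# A retriever takes a question and k, and returns ranked document ids.
Retriever = Callable[[str, int], Sequence[str]]


def hit_rate_at_k(
    retriever: Retriever,
    labelled_queries: Sequence[tuple[str, str]],
    k: int = 5,
) -> float:
    """Fraction of queries whose expected document id appears in the top k."""
    if not labelled_queries:
        return 0.0
    hits = sum(
        1
        for question, expected_id in labelled_queries
        if expected_id in list(retriever(question, k))[:k]
    )
    return hits / len(labelled_queries)


def rank_candidates(
    candidates: dict[str, Retriever],
    labelled_queries: Sequence[tuple[str, str]],
    k: int = 5,
) -> list[tuple[str, float]]:
    """Score every candidate on the same labelled set, best first."""
    scores = {
        name: hit_rate_at_k(retriever, labelled_queries, k)
        for name, retriever in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Each candidate would wrap one embedding model behind the same `Retriever` interface, and `labelled_queries` would hold Polish questions paired with the id of the document that should answer them. That keeps the comparison between models like-for-like.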


3 Solution

To address these challenges, Profil Software implemented a multi-faceted approach. The team developed a framework for continuous benchmarking, in which the components of the RAG application (such as the LLM or the data embedding models) were tested for accuracy, context relevance, and document alignment. This iterative process kept the system continuously optimized against the project goals. The solution was built using a microservice architecture with three core services: Web, NLP, and ETL (Extract, Transform, Load), all hosted on Kubernetes. This approach enabled deployment both on cloud infrastructure and on the City of Krakow's on-premise servers. The vector database was built on top of PostgreSQL using the pgvector extension, hosted in the Kubernetes cluster. This decision significantly reduced operational costs while ensuring flexibility.

With this comprehensive approach, we delivered a highly efficient, scalable, and cost-effective AI-powered search solution tailored to the unique needs of the City of Krakow’s public information system.
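As an illustration of what a pgvector-backed store can look like, the sketch below defines a table with a vector column, an HNSW index for cosine distance, and a nearest-neighbour query. The table and column names and the 3-dimensional vectors are assumptions made for the example, not the project's actual schema. The `%s` placeholders assume a psycopg-style PostgreSQL driver.

```python
from typing import Sequence

# Hypothetical schema: table and column names are illustrative only.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS document_chunks (
    id bigserial PRIMARY KEY,
    source_url text NOT NULL,
    content text NOT NULL,
    embedding vector(3)  -- toy dimension; real embedding models are far wider
);
CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
    ON document_chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator (smaller means more similar).
SEARCH_SQL = """
SELECT id, source_url, content, embedding <=> %s::vector AS cosine_distance
FROM document_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.2,1.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def build_search(query_embedding: Sequence[float], limit: int = 5) -> tuple[str, tuple]:
    """Return the SQL and parameters for a top-`limit` similarity search."""
    literal = to_vector_literal(query_embedding)
    return SEARCH_SQL, (literal, literal, limit)
```

With a driver such as psycopg, the result of `build_search` could be passed to `cursor.execute(sql, params)`. Keeping the vectors in PostgreSQL means similarity search, metadata filters, and ordinary relational queries all run against one database.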


