Mykolas Katkus

The co-founder and CEO of Repsense talks about deploying Havel and using LLMs to capture the semantic meaning of texts at scale

What is Havel?

Havel is a Foreign Information Manipulation and Interference (FIMI) dashboard that tracks traditional media, social media, and short video content across platforms. It identifies stories, detects disinformation and propaganda, and helps users understand and plan strategic responses to narrative battlegrounds.

Currently deployed in three countries, Havel enables Strategic Communications (Stratcom) and FIMI teams to instantly assess the media landscape and develop effective response strategies. The platform is also used by law enforcement institutions, regulators, and financial institutions to identify fraud and prevent cyberattacks, as well as by political parties and other large stakeholders.

Where do you want to see Repsense 5-8 years from now?

We have a clear goal: to become the reference product for multinational public and private stakeholders seeking to track, assess, and simulate the impact of public information on people's behavior and decisions.

To achieve this, we are developing models that will help us identify relationships between the supply side—media landscape evaluation—and the demand side—opinion polls and real-life decisions. We're also working on agent-based simulations to better understand these dynamics.

What tech stack and LLMs allow you to research at this scale?

We rely primarily on text embedding technology (a core component of LLMs), which allows us to encapsulate the semantic meaning of a text in a measurable vector. This lets us manipulate texts mathematically: calculate distances, cluster communities, and even compute angles, projections, and products over the content we consume daily. This creates a "black box moment" for Repsense users, where suddenly a massive volume of media begins to make sense again.
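The distance calculations mentioned above can be illustrated with a toy example. This is a minimal sketch, not Repsense's pipeline: the 4-dimensional vectors below stand in for real embeddings, which have hundreds of dimensions and come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (illustrative values only).
article_a = [0.9, 0.1, 0.0, 0.2]   # e.g. a story about energy policy
article_b = [0.8, 0.2, 0.1, 0.3]   # a near-duplicate retelling of it
article_c = [0.0, 0.9, 0.8, 0.1]   # an unrelated story

print(cosine_similarity(article_a, article_b))  # close to 1 -> same narrative
print(cosine_similarity(article_a, article_c))  # much lower -> different topic
```

Clustering stories into narratives then reduces to grouping vectors whose pairwise similarity exceeds a threshold.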

Making full use of text embeddings requires extracting the maximum information from each piece of content. Therefore, we continuously deploy technologies such as speech-to-text, optical character recognition, voice recognition, and image analysis, enhanced with an additional layer of LLMs (GPT, Claude, and especially the newest Google Gemini versions for qualitative object recognition and description) across our documents. Orchestrating these components makes the Repsense toolkit a pioneer in deeper media understanding, especially where metadata is lacking (for example, half of TikTok videos have no title or description).
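The orchestration described above can be sketched as a merge of extractor outputs: each modality contributes whatever text it can recover, and the merged result is what gets embedded. The function names and fields here are illustrative stand-ins, not Repsense's actual API.

```python
# Hypothetical stand-ins for the real extraction services.
def speech_to_text(video):      # STT on the audio track
    return video.get("audio_transcript", "")

def ocr_frames(video):          # on-screen text recognition
    return video.get("on_screen_text", "")

def llm_describe(video):        # vision-LLM object/scene description
    return video.get("visual_description", "")

def extract_text(video):
    """Merge every recoverable signal into one document for embedding."""
    parts = [video.get("title", ""), speech_to_text(video),
             ocr_frames(video), llm_describe(video)]
    return " ".join(p for p in parts if p)

# A short video with no title or description still yields usable text:
clip = {"audio_transcript": "breaking news about the election",
        "on_screen_text": "VOTE TODAY"}
print(extract_text(clip))  # -> "breaking news about the election VOTE TODAY"
```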

Finally, our data analysis philosophy is based on high-fidelity contributor data. Here are a couple of examples:

Context Weighting: There's a significant difference between your entity being mentioned in the title of a highly negative article versus being randomly mentioned on the last page of a lengthy newsletter. It's crucial for us to measure this difference numerically, so we actively use sentiment analysis, prominence scoring, text quality assessment, named-entity recognition, and other models.
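The title-versus-last-page difference can be captured numerically along these lines. The weights and formula below are hypothetical, chosen only to illustrate the idea of prominence-weighted sentiment, not Repsense's production model.

```python
# Assumed prominence weights: how much a mention's placement amplifies it.
PROMINENCE = {"title": 1.0, "lead": 0.6, "body": 0.3, "footer": 0.05}

def mention_impact(position, sentiment):
    """Signed impact of one mention: sentiment in [-1, 1], scaled by placement."""
    return PROMINENCE[position] * sentiment

# A negative mention in a title far outweighs one buried at the end:
print(mention_impact("title", -0.8))   # -0.8
print(mention_impact("footer", -0.8))  # -0.04
```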

Data Enrichment: All collected datapoints (author, domain, interactions) are initially just strings, with no information about the source's origin or features. By connecting to secondary external databases (such as Ahrefs), we link Repsense data with other analysis networks to provide richer insights.
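Conceptually, this enrichment is a lookup join: a raw datapoint's bare domain string is matched against a secondary index and the source's features are attached. The dictionary below is a hypothetical stand-in for an Ahrefs-style database, and the field names are illustrative.

```python
# Hypothetical snapshot of a third-party source index (stand-in for Ahrefs).
external_db = {
    "example-news.com": {"domain_rating": 72, "country": "LT"},
}

def enrich(datapoint):
    """Attach external source features to a raw collected datapoint."""
    features = external_db.get(datapoint["domain"], {})
    return {**datapoint, **features}

raw = {"author": "J. Doe", "domain": "example-news.com", "interactions": 130}
print(enrich(raw))  # raw fields plus domain_rating and country
```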