Overview

CrowdPrisma was an AI-native survey analysis platform I co-founded and built in 2024. Survey tools had long solved the collection problem, but the analysis problem remained broken: hundreds of pages of free-text responses, often spanning multiple languages, were still being processed with Excel and manual tagging. CrowdPrisma automated that analysis end to end.

The platform’s core insight: treat text responses as a first-class data type, not an afterthought. By combining state-of-the-art LLMs with purpose-built topic modelling pipelines, CrowdPrisma could ingest a survey with thousands of open-ended responses and return structured, actionable insights in minutes.

Key features

Prisma Dashboard: an auto-generated, interactive dashboard that detects each question type (text, categorical, multiple-choice, numeric, date) and visualises it appropriately. Built for comparative analysis: any subgroup comparison is a few clicks away, backed by statistical tests.
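As an illustration of the kind of statistical test such a subgroup comparison could be backed by (a hypothetical sketch, not the platform's actual implementation), here is a chi-squared test of independence over a contingency table of answer counts:

```python
# Hypothetical example: compare a categorical question's answer distribution
# across two subgroups with a chi-squared test of independence (scipy).
# The counts below are made up for illustration.
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: rows = subgroups, columns = answer options
counts = np.array([
    [30, 50, 20],   # subgroup A
    [45, 30, 25],   # subgroup B
])

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
```

A small p-value would indicate the two subgroups answer the question differently; the dashboard would surface exactly this kind of signal when a user compares groups.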

AI TextEngine: the core of the platform. A multi-stage LLM pipeline that (1) extracts coherent recurring topics from free-text columns, (2) deduplicates and names them into human-readable themes, and (3) assigns each response to the relevant topics, extracting supporting quotes. This converted qualitative text into quantitative variables that could be analysed alongside all other survey dimensions. The engine operated in any language, auto-detected sentiment, and translated discovered topics into English, so mixed-language datasets were handled uniformly.
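The three-stage shape described above can be sketched as follows. This is a minimal, hypothetical illustration: `call_llm` is a stand-in for a real LLM client and is stubbed here with a keyword rule so the example runs without network access.

```python
# Minimal sketch of an extract -> consolidate -> assign pipeline.
# call_llm is a deterministic stub standing in for a real LLM API call.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would prompt an LLM here.
    return "pricing" if "price" in prompt.lower() else "support"

@dataclass
class Topic:
    name: str
    quotes: list = field(default_factory=list)

def extract_topics(responses):
    # Stage 1: discover candidate topics across the corpus
    # (stage 2, naming/deduplication, is omitted in this toy version).
    return {call_llm(r) for r in responses}

def assign(responses, topics):
    # Stage 3: assign each response to a topic, keeping the response text
    # itself as the supporting quote.
    assignments = {name: Topic(name) for name in topics}
    for r in responses:
        label = call_llm(r)
        if label in assignments:
            assignments[label].quotes.append(r)
    return assignments

responses = ["The price is too high", "Support answered quickly"]
topics = extract_topics(responses)
result = assign(responses, topics)
print({name: len(t.quotes) for name, t in result.items()})
```

The point of the structure is the last step: because every assignment carries a quote, each topic becomes a countable variable that is still auditable back to the raw text.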

GroupFinder & Autopilot: compressed all survey dimensions into a 2D scatter plot (via UMAP) where proximity reflected overall similarity across all answers. Users could colour participants by any variable, or draw freehand selections to define ad-hoc subgroups and instantly compare their responses. The Autopilot mode automatically discovered statistically distinct subgroups across both structured and text data.
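The projection step above reduces to a dimensionality reduction plus a boolean mask over participants. A hypothetical sketch: the product used UMAP, but PCA (scikit-learn) stands in here so the example runs without the umap-learn package.

```python
# Sketch of the GroupFinder projection idea. PCA is a stand-in for
# UMAP(n_components=2); the data is random and purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Fake encoded survey answers: 100 participants x 12 numeric features
X = rng.normal(size=(100, 12))

coords = PCA(n_components=2).fit_transform(X)
print(coords.shape)  # (100, 2)

# A freehand lasso selection ultimately reduces to a boolean mask over
# participants, e.g. everyone in the upper-right quadrant of the plot:
mask = (coords[:, 0] > 0) & (coords[:, 1] > 0)
print(int(mask.sum()), "participants selected")
```

Once a selection is a mask, comparing the selected subgroup against the rest is the same statistics problem as any other dashboard comparison.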

Technical evaluation

The accompanying technical evaluation benchmarks the TextEngine against classical and modern topic modelling baselines - including LDA, NMF, and BERTopic - across topic discovery quality, assignment accuracy (precision, recall, F1), and interpretability. The evaluation used a customer-support dataset with ground-truth labels and assessed both single-label and multi-label assignment scenarios. CrowdPrisma’s pipeline outperformed all baselines on assignment accuracy while producing topics that were significantly more interpretable - a result of using LLMs for theme creation rather than relying on raw word co-occurrence statistics.
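For concreteness, multi-label assignment accuracy of the kind the evaluation measures can be computed with scikit-learn. The labels and predictions below are invented for illustration, not taken from the evaluation dataset:

```python
# Illustrative only: micro-averaged precision/recall/F1 for multi-label
# topic assignment, using made-up ground truth and predictions.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import precision_recall_fscore_support

true = [{"billing"}, {"billing", "delivery"}, {"support"}]
pred = [{"billing"}, {"delivery"},            {"support", "billing"}]

mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(true)
y_pred = mlb.transform(pred)

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="micro")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Micro-averaging pools true/false positives across all topics, which is the natural choice when responses can carry several labels at once.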

The key architectural choices that drove this: topic extraction runs multiple times over sub-samples of the corpus to surface robust recurring themes before a consolidation step merges near-duplicates; assignment is done per-response in isolated batches (not in bulk) to avoid context bleeding; and multi-label assignment with extracted quotes makes every decision auditable on the dashboard.
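The consolidation step in particular has a simple core. The sketch below is hypothetical: it merges near-duplicate topic names from repeated sub-sample runs using string similarity (stdlib difflib) as a stand-in for the embedding- or LLM-based comparison a real pipeline would use.

```python
# Hypothetical consolidation step: collapse near-duplicate theme names
# produced by repeated topic-extraction runs over corpus sub-samples.
# SequenceMatcher similarity stands in for a semantic comparison.
from difflib import SequenceMatcher

def consolidate(candidates, threshold=0.8):
    merged = []
    for name in candidates:
        for existing in merged:
            if SequenceMatcher(None, name.lower(), existing.lower()).ratio() >= threshold:
                break  # near-duplicate of an existing theme: drop it
        else:
            merged.append(name)  # genuinely new theme: keep it
    return merged

runs = ["Pricing concerns", "pricing concern", "Delivery delays",
        "Pricing concerns", "delivery delay"]
print(consolidate(runs))  # ['Pricing concerns', 'Delivery delays']
```

Running extraction over multiple sub-samples and then consolidating trades extra LLM calls for robustness: a theme must recur across samples to survive, which filters out one-off noise.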

What we built it for - and what happened

The original target was policy research: public consultations, regulatory feedback rounds, and large-scale qualitative studies. We demonstrated this concretely by processing 10 EU public consultations - over 1,300 pages spanning 20+ languages - in under two hours, producing interactive dashboards for each. The same work would have taken a team of consultants weeks using Google Translate and manual Excel tagging.

The European Commission’s own Director for Strategy, Better Regulation & Corporate Governance acknowledged the broken state of this process in writing. Despite that validation, the regulatory and procurement environment made selling into EU institutions extremely slow. We couldn’t find a repeatable go-to-market path fast enough, and the company wound down in 2024.

The platform worked. The market timing and distribution were the hard problems.

Demo Videos
