In our LLM prompting experiment series, we test how well large language models can build real product features inside Ruoom’s open source CRM code.
But building a feature is more than simply writing code, and each experiment makes design choices that shape how people use the product.
After every experiment, we’ve been asking our UI/UX designer and front-end developer, Jasmina, to evaluate what the model created. Her reviews focus on usability, consistency, and trust.
For consistency, each review will follow the same structure so we can compare results over time. She’ll give feedback on these criteria:
- Functionality: Does the design help someone complete the task it is meant to do?
- Hierarchy: Does it guide attention to the right place?
- Visual and system design: Does it align with Ruoom’s style and components?
- Copy and interaction language: Are the words clear and action oriented?
- Pattern analysis: Would this design work in other parts of the product?
- Expert insight: What would we teach a junior designer from this example?
- Meta observation: What does this reveal about how the LLM approaches design?
Why We Think This Step is Critical
When we (or anyone else) use an LLM to build product features, the generated code also produces visual and interactive elements. These design choices often reflect how the model “thinks” about usability and consistency, but they aren’t always grounded in good design.
This review helps us study those choices from a human perspective. By asking Jasmina to evaluate each asset, we can see where AI aligns with good design logic and where it misses the subtleties that make an interface feel intuitive and trustworthy.
Keeping humans in the loop is central to how we work at Ruoom. AI can accelerate production, but judgment, taste, and empathy still come from people. These reviews help us preserve that balance of using AI to build faster while keeping human oversight at the core of quality, trust, and experience.
Over time, this process will guide how we write prompts, set design standards, and define collaboration between human and AI contributors inside real product development (and hopefully encourage readers learning from this to do the same!).
What We Reviewed
The Disconnect Google Calendar button + Populated Calendar created during LLM Experiment #1, which focused on integrating Google Calendar into Ruoom’s open source CRM:

Functional Evaluation
How well does this design help someone do what it’s meant to do?
It works well for basic calendar integration. It shows Google Calendar events clearly within the Ruoom system, eliminating the need to switch between apps. However, it lacks clarity around event interactivity: users can’t tell which events are imported from Google and which are editable.
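One way to close that gap is to carry an event’s origin in the data model itself, so the UI can style events and gate interactions accordingly. Here’s a minimal TypeScript sketch of that idea (the type and field names are ours, not Ruoom’s actual schema):

```typescript
// Hypothetical event model; Ruoom's real schema will differ.
type EventSource = "ruoom" | "google";

interface CalendarEvent {
  id: string;
  title: string;
  start: Date;
  end: Date;
  source: EventSource; // where the event came from
  readOnly: boolean;   // imported events can't be edited in place
}

// The UI can then style (and gate interactions) by source:
function eventClassName(event: CalendarEvent): string {
  return event.source === "google"
    ? "event event--imported" // e.g. muted color + a small Google glyph
    : "event event--native";
}
```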
Hierarchy & Information Design
Does it guide your eyes to the right place and make what’s important clear?
The addition of the “Google Calendar Connected” status badge clearly communicates that synchronization was successful. This is important for user trust, even though the design itself could be improved. The hierarchy is fine if the focus is on “Do I have anything scheduled?” rather than on event details.
Visual & System Design
Does it look consistent with the rest of Ruoom’s design and feel intentional?
Visually, it aligns well with the rest of Ruoom’s interface — using the same colors and typography. “Disconnect Google” might be a bit too visually prominent, but its placement is appropriate. Still, I think it would be helpful to add a subtle visual distinction between native Ruoom events and those coming from Google Calendar.
Copy & Interaction Language
Are the words clear about what happens when you click or tap?
I believe the event names are clear to the calendar owner. 🙂
As for the “Disconnect Google” button, its label is clear, but a destructive action like this should require confirmation, which it currently doesn’t.
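For illustration, here’s a minimal sketch of that confirmation gate in TypeScript (the endpoint and handler names are hypothetical, not what the experiment produced):

```typescript
// Hypothetical backend call, stubbed so the sketch is self-contained;
// the endpoint is an assumption, not Ruoom's actual API.
async function disconnectGoogleCalendar(): Promise<void> {
  await fetch("/api/integrations/google-calendar", { method: "DELETE" });
}

// Gate the destructive action behind an explicit confirmation.
// window.confirm is the bare-minimum version; a styled modal is nicer in practice.
async function onDisconnectGoogle(): Promise<void> {
  const confirmed = window.confirm(
    "Disconnect Google Calendar? Imported events will no longer appear in Ruoom."
  );
  if (!confirmed) return; // user backed out; nothing changes

  await disconnectGoogleCalendar();
}
```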
Comparative & Pattern-Based Analysis
Would this design still work well if we used it in other parts of the app?
This type of calendar could easily be used in other contexts as well (e.g., task overview, internal team schedules), especially if the Google Calendar integration remains consistent.
Expert Insight & Corrective Design
If a junior designer made this, what would you teach them from it?
Key improvements for calendar integrations (a code sketch of the last two items follows the list):
- Visual difference between imported events and those created directly in Ruoom
- Interaction cues: Hover states showing “View in Google” vs “Edit booking” depending on event source
- Conflict detection: Warn users when booking a room during an existing calendar event
- Sync transparency: Show “Last synced: 10 mins ago” and clear error states if sync fails
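To make those last two items concrete, here’s a small TypeScript sketch of sync transparency and a naive conflict check (all names are hypothetical, since the experiment’s output didn’t include anything like this):

```typescript
// Hypothetical sync state; the experiment's output had no equivalent.
interface SyncState {
  lastSyncedAt: Date | null;
  error: string | null;
}

// Renders "Last synced: 10 mins ago", or a clear error state if sync failed.
function syncStatusLabel(state: SyncState, now: Date = new Date()): string {
  if (state.error) return `Sync failed: ${state.error}`;
  if (!state.lastSyncedAt) return "Not synced yet";
  const mins = Math.round((now.getTime() - state.lastSyncedAt.getTime()) / 60000);
  return `Last synced: ${mins} min${mins === 1 ? "" : "s"} ago`;
}

// Naive conflict detection: does a proposed booking overlap any existing event?
function hasConflict(
  proposedStart: Date,
  proposedEnd: Date,
  events: { start: Date; end: Date }[]
): boolean {
  return events.some(
    e =>
      proposedStart.getTime() < e.end.getTime() &&
      proposedEnd.getTime() > e.start.getTime()
  );
}
```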
Meta Observation
What does this tell you about how the LLM “thinks” about design?
This design reveals an integration-first but UX-second approach. The focus is on successfully pulling and displaying external data, with less attention paid to the nuances of multi-source calendar management: data ownership, edit permissions, sync conflicts, and so on.
Final Question for Jasmina
How would you prompt the LLM to improve the design?
VISUAL REQUIREMENTS:
- Distinguish Google (read-only) vs Ruoom (editable) events
- Handle 10+ events per day gracefully
- Show sync status
USER ACTIONS:
- Click Google event → open in new tab
- Click Ruoom event → inline edit modal
- Toggle to hide/show Google events
EDGE CASES:
- Scheduling conflicts between sources
- Sync failures
- All-day multi-day events
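If a model followed that prompt, the resulting interaction routing might look something like this sketch (Google’s Calendar API does expose an htmlLink per event, but the rest of these names are our assumptions, not the experiment’s output):

```typescript
// Hypothetical event shape; only htmlLink mirrors a real Google Calendar field.
interface SourcedEvent {
  id: string;
  source: "ruoom" | "google";
  htmlLink?: string; // web link back to the event in Google Calendar
}

// Stubbed modal opener so the sketch stands alone; Ruoom's real UI differs.
function openEditModal(eventId: string): void {
  console.log(`open edit modal for ${eventId}`);
}

// Route clicks by event source: view externally vs. edit inline.
function onEventClick(event: SourcedEvent): void {
  if (event.source === "google" && event.htmlLink) {
    window.open(event.htmlLink, "_blank", "noopener"); // read-only: view in Google
  } else {
    openEditModal(event.id); // editable: Ruoom's inline edit modal
  }
}

// Toggle support: hide or show imported events without losing them.
function visibleEvents(events: SourcedEvent[], showGoogle: boolean): SourcedEvent[] {
  return showGoogle ? events : events.filter(e => e.source !== "google");
}
```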
Conclusions
The LLM produced a connected-state calendar that functions well at the surface level: events load, the integration works, and the UI stays visually consistent with Ruoom’s system. But once again, the gaps show up in the places where real UX judgment lives: interaction cues, data ownership clarity, conflict handling, and the subtle distinctions that help users understand what they can do versus what they can only see.
In other words, the model can replicate the frame of a calendar integration, but it doesn’t yet understand the mental models users rely on when dealing with multi-source data. It displayed events, but didn’t differentiate editable vs. read-only. It showed a status badge, but didn’t show sync transparency. It connected accounts, but didn’t anticipate errors, conflicts, or the micro-interactions that build trust.
This is the recurring pattern across our Training Ground reviews: LLMs can assemble working components, but they don’t reason about intent, context, or user expectations the way humans do, especially in systems with permissions, states, or ownership rules.
We’ll be back soon with an analysis of our LLM Prompting Experiment 2. Until then, remember to continue to use your very smart, discerning brain to build and design products. ✌️
