Survey of User Interface Design and Interaction Techniques
in Generative AI Applications
Overview
This case study summarizes the research paper I published as a researcher at Adobe. I was the first author and worked with a mix of SDEs, senior researchers, and executives over a five-month timeline. The graphics were made entirely in Figma, and the paper was written in LaTeX.
Abstract & Goals
This survey presents a taxonomy of user-guided interaction patterns in generative AI, focusing on explicit user-initiated interactions rather than implicit signals. It aims to provide designers and developers with a comprehensive reference to enhance the usability and accessibility of generative AI applications.
Why did we write this paper?
• Bridging the Research Gap in HCI: While existing studies broadly explore human-AI interactions, few focus on the specific UI designs and patterns that make these interactions intuitive and effective. We aimed to fill this gap.
• Creating a Practical Reference: By categorizing various user-guided interaction types, we provide a valuable resource for designers and developers looking to understand and implement effective UI patterns for generative AI applications.
• Focusing on User-Guided Interactions: Our survey emphasizes interactions initiated directly by users, avoiding implicit cues and instead showcasing intentional, user-driven engagement techniques.
• Lowering the Barrier to Entry: We aspire to make generative AI design more accessible by offering clear guidelines and examples, helping newcomers and seasoned professionals alike better navigate this evolving field.

Conversational User Interface
A conversational UI is visually designed to mimic a conversation, with a turn-based exchange between the user and AI system.
Focus on GUIs: This discussion centers on graphical conversational interfaces (GUIs), excluding voice-based UIs (VUIs) like Amazon Alexa and Apple Siri.
Primary Interaction Space: Most user interactions happen in the input or prompt box, where users can ask questions or direct tasks.
Secondary Interaction Space: The output or chat history section stores responses, past inputs, or even galleries of outputs, allowing users to review previous exchanges.
Use Cases: Conversational UIs support diverse applications, from generating text to creating 3D models, adapting well to incremental changes during generative processes.
Memory and Context: These UIs excel at recalling previous interactions, helping to inform new outputs based on earlier conversation history, as seen in systems like Flamingo for one-shot tasks.
Versatility: Overall, conversational UIs are adaptable and can perform a wide range of tasks efficiently.
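To make the pattern concrete, here is a minimal TypeScript sketch of the turn-based structure behind a conversational UI. The type names and the `generate` callback are illustrative assumptions, not any specific product's API.

```typescript
type Role = "user" | "assistant";

interface Turn {
  role: Role;
  content: string;   // text, or a reference to a generated asset (image, 3D model, ...)
  timestamp: number;
}

interface Conversation {
  turns: Turn[];     // the chat history: the secondary interaction space
}

// Primary interaction space: the prompt box hands its text to this function.
async function submitPrompt(
  conversation: Conversation,
  promptText: string,
  generate: (history: Turn[]) => Promise<string>, // stand-in for the model call
): Promise<Conversation> {
  const userTurn: Turn = { role: "user", content: promptText, timestamp: Date.now() };
  // The full history is passed along so earlier turns can inform the new output.
  const reply = await generate([...conversation.turns, userTurn]);
  const assistantTurn: Turn = { role: "assistant", content: reply, timestamp: Date.now() };
  return { turns: [...conversation.turns, userTurn, assistantTurn] };
}
```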
Canvas User Interface
Canvas-focused user interfaces center interactions on a primary canvas, with generative tools placed around the periphery.
Content Canvases: These UIs display content such as images, text, code, or data visualizations on a central canvas, with secondary interactions occurring on the periphery, as exemplified by DeepThInk, an AI art therapy tool.
Information Visualization Canvases: These canvases allow users to manipulate visual elements that represent input-output interactions, aiding in data comprehension, such as in Graphologue, which visualizes travel itineraries as hierarchical treemaps.
Cognitive Load Reduction: Information visualization canvases help simplify complex data, breaking it into digestible segments to reduce user cognitive load.
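As a rough illustration of this layout, the sketch below models a central content canvas plus generative tools registered on the periphery; the class and field names are hypothetical.

```typescript
interface CanvasElement {
  id: string;
  kind: "image" | "text" | "code" | "chart";
  data: unknown;
}

interface PeripheralTool {
  label: string;   // e.g. "Stylize" or "Summarize", shown in a side panel
  apply: (selection: CanvasElement[]) => Promise<CanvasElement[]>;
}

class CanvasUI {
  private elements: CanvasElement[] = [];
  private tools: PeripheralTool[] = [];

  // Generative tools live on the periphery of the canvas.
  addTool(tool: PeripheralTool): void {
    this.tools.push(tool);
  }

  // Running a tool sends the current selection out and places results back on the canvas.
  async runTool(label: string, selection: CanvasElement[]): Promise<void> {
    const tool = this.tools.find((t) => t.label === label);
    if (!tool) return;
    const generated = await tool.apply(selection);
    this.elements.push(...generated);
  }
}
```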
Contextual User Interface
Contextual user interfaces integrate generative interactions directly within the main content area, where the user is most likely focused.
In-Line Generative Actions: Unlike canvas UIs, contextual UIs provide generative actions directly in-line rather than in the periphery, often appearing unprompted as a response to user actions.
Adaptive Interaction: Contextual UIs often adapt to user behavior, displaying relevant prompts or actions in response to specific user interactions within the subject area.
Seamless User Experience: By embedding interactions within the context, these UIs create a seamless experience that keeps users engaged without needing to navigate away from their focus.
Cognitive Load Reduction: This UI type lowers cognitive load by displaying interactions within the user’s immediate focus area, making it easier to access relevant information efficiently.
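The sketch below illustrates this adaptive, in-line behavior: hypothetical user actions (selecting text, pausing while typing) trigger generative actions that appear at the point of focus. The names and the rule set are assumptions made for illustration.

```typescript
type UserAction =
  | { kind: "select-text"; text: string }
  | { kind: "pause-typing"; lastSentence: string };

interface InlineAction {
  label: string;                  // rendered next to the user's point of focus
  run: () => Promise<string>;
}

// Hypothetical rules mapping what the user just did to relevant in-line actions.
function suggestActions(
  action: UserAction,
  generate: (prompt: string) => Promise<string>,
): InlineAction[] {
  switch (action.kind) {
    case "select-text":
      return [
        { label: "Rewrite", run: () => generate(`Rewrite this: ${action.text}`) },
        { label: "Summarize", run: () => generate(`Summarize this: ${action.text}`) },
      ];
    case "pause-typing":
      return [
        { label: "Continue writing", run: () => generate(`Continue: ${action.lastSentence}`) },
      ];
  }
}
```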
Modular User Interface
Modular user interfaces are structured with multiple main interaction areas, each serving a distinct function within the generative process.
Multi-Level Generation: These UIs are especially effective in systems with multiple levels of generation, as each module can handle a different aspect of the interaction.
Function-Specific Modules: Each module is tailored to a specific task, allowing for refinement at different stages, which enhances the overall precision and relevance of the generative output.
Versatility: Modular UIs are highly versatile, capable of supporting complex interactions and multi-functional generative systems.
Design Recommendation: Modular UIs are well-suited for designing interfaces for multi-level language models and complex generative AI applications, offering structured yet flexible interaction spaces.
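As a sketch of how function-specific modules can chain together, the example below models a hypothetical outline-draft-polish text pipeline, with each stage owned by its own module. The module names and signatures are invented for illustration.

```typescript
interface Module<In, Out> {
  name: string;
  run: (input: In) => Promise<Out>;
}

// Each module owns one level of generation and can be refined independently
// before its result is handed to the next module.
async function runPipeline(
  brief: string,
  outlineModule: Module<string, string[]>,  // brief   -> outline
  draftModule: Module<string[], string>,    // outline -> draft
  polishModule: Module<string, string>,     // draft   -> polished text
): Promise<string> {
  const outline = await outlineModule.run(brief);
  const draft = await draftModule.run(outline);
  return polishModule.run(draft);
}
```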
Simulated User Interface
Simulated user interfaces enable interactions with generative systems in virtual or augmented reality environments.
Alternative to Traditional GUIs: These interfaces are ideal when traditional GUIs are insufficient for task completion, offering a more immersive solution.
Training and Simulation: Simulated UIs can be used to teach users specific tasks in a controlled, virtual setting, enhancing skill development.
Tangible Data Interaction: They allow users to engage with data in a physical or spatial way, adding a hands-on component that enhances understanding.
Enhanced Learning and Visualization: Although complex to create, simulated UIs are valuable for training and visualizing concepts in real-time, especially in fields like urban planning or interactive learning.

Input Modalities
Users can interact with generative AI systems through three primary input modalities: text-based, visual, and sound inputs. We define an input as the data or information the AI processes to generate outputs.
Text-Based Inputs: Text inputs in generative AI encompass natural language, data, and code, allowing users to interact with the system in diverse ways, from creating stories to synthesizing information and debugging code.
Visual Inputs: Visual inputs, such as images, videos, and virtual gestures, serve as versatile input types, used for generating new visuals, captioning, editing, and creating accessibility elements. Visual interactions also include gestures in virtual or augmented reality, enhancing immersive experiences.
Sound Inputs: Sound-based inputs, including speech and audio recordings, allow users to interact with generative systems in novel ways, such as completing speeches or creating videos that match audio cues.
Input Flexibility: Generative AI systems support various input modalities, making them adaptable for different user needs and applications, from text-based storytelling to real-time visual and auditory interactions.
Enhanced User Experience: By leveraging multiple input types, generative AI systems provide a richer, more personalized interaction experience, catering to both professional and creative use cases.
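A simple way to picture these modalities is as one input type with three variants. The sketch below does this in TypeScript; the field names are chosen purely for illustration, and `Blob` is the standard browser type.

```typescript
// One input type with three variants, one per primary modality.
type GenerativeInput =
  | { modality: "text"; text: string }                                  // natural language, data, or code
  | { modality: "visual"; blob: Blob; mime: "image/png" | "video/mp4" }
  | { modality: "sound"; blob: Blob; mime: "audio/wav" };

function describeInput(input: GenerativeInput): string {
  switch (input.modality) {
    case "text":
      return `text input (${input.text.length} characters)`;
    case "visual":
      return `visual input (${input.mime})`;
    case "sound":
      return `sound input (${input.mime})`;
  }
}
```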
Prompting
Prompting in generative AI refers to users explicitly directing the system to perform a specific task; a prompt is distinct from an input, which is content such as an image or video. For example, in video editing, the video is the input, while a prompt like “edit the video to be 10 seconds” guides the system’s action on that input.
Text-Based Prompts: Text-based prompts consist of using written text, often in the form of natural language, to prompt the system to complete a certain task.
Visual Prompts: Visual prompting consists of using visual communication, like gestures, to prompt the system to complete a certain task.
Audio Prompts: Audio prompting consists of using speech or any other type of audio to prompt the system to complete a certain task.
Multimodal Prompts: Multimodal prompting consists of using a mix of the previous methods to prompt the system to complete a certain task.
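The sketch below separates the prompt (the explicit instruction) from the input it operates on, and shows how a multimodal prompt could fill more than one field; the shapes and example values are assumptions, not a real API.

```typescript
// The prompt is the explicit instruction; the input is the content it acts on.
interface Prompt {
  text?: string;    // e.g. "edit the video to be 10 seconds"
  gesture?: string; // e.g. a named gesture recognized in AR/VR
  audio?: Blob;     // a spoken instruction
}

interface GenerationRequest {
  input?: Blob | string; // the video, image, or text the prompt operates on
  prompt: Prompt;        // a multimodal prompt fills more than one field
}

// Example: trimming a video with a text prompt (values are placeholders).
const request: GenerationRequest = {
  input: "source-video.mp4",
  prompt: { text: "edit the video to be 10 seconds" },
};
```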
Selection Techniques
In generative AI, selecting refers to choosing or highlighting specific UI elements or content for further interaction, a key method in user interactions with such systems. Selection tools like boxes, lassos, and dropdown menus enable users to interact with and control AI-generated content with precision and accuracy.
Single Selection: A single-selection interaction consists of clicking or choosing a single GUI element for further interaction. An example would be a user choosing one of three outputs to iterate on.
Multi-Selection: A multi-selection interaction consists of clicking or choosing multiple UI elements that will be interacted with further.
Lasso and Brush Selection: Lasso and brush selections are techniques in which a lasso or brush stroke defines the region where a specific prompt is applied.
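To illustrate how these techniques scope a generation request, the sketch below models single, multi, and lasso/brush selections as one type and routes them to a hypothetical `generate` call; nothing here is tied to a real product.

```typescript
type Selection =
  | { kind: "single"; elementId: string }
  | { kind: "multi"; elementIds: string[] }
  | { kind: "region"; points: { x: number; y: number }[] }; // lasso or brush stroke

// Routes the prompt to a hypothetical generate call, scoped by the selection.
function applyPrompt(
  selection: Selection,
  prompt: string,
  generate: (prompt: string, scope: string) => Promise<string>,
): Promise<string> {
  switch (selection.kind) {
    case "single":
      return generate(prompt, `element ${selection.elementId}`);
    case "multi":
      return generate(prompt, `elements ${selection.elementIds.join(", ")}`);
    case "region":
      // Only the area bounded by the lasso or brush stroke is affected.
      return generate(prompt, `region outlined by ${selection.points.length} points`);
  }
}
```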
System and Parameter Manipulation
The system and parameter manipulation category includes user interactions that let users adjust settings, parameters, or functions within a generative AI system to personalize outputs. Examples of these interactions include menus, sliders, and explicit feedback, which help users tailor the system’s responses to their needs.
Menus: Menu interactions let a user either input their own parameters or choose from preset options to change the parameters of the generative process. An example would be a user selecting dropdown menu options to alter the output parameters.
Sliders: UI elements that can be "slid" to adjust the parameters of the generative AI system.
Explicit Feedback: Explicit feedback (i.e., thumbs up/down, written critiques, etc.) is a user interaction that is used to expressly personalize the system to the user’s preferences.
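A minimal sketch of this category is shown below: menus and sliders edit a settings object that travels with each generation request, and explicit feedback is logged for personalization. Every parameter name is an example I made up, not a real API.

```typescript
interface GenerationSettings {
  style: "photo" | "sketch" | "watercolor"; // chosen from a dropdown menu
  guidanceStrength: number;                 // adjusted with a slider, 0..1
  outputCount: number;                      // adjusted with a slider or stepper
}

type Feedback = { outputId: string; rating: "up" | "down"; note?: string };

// A slider edits one numeric parameter; clamping keeps the value valid.
function onSliderChange(settings: GenerationSettings, value: number): GenerationSettings {
  return { ...settings, guidanceStrength: Math.min(1, Math.max(0, value)) };
}

// Explicit feedback is stored so future generations can be personalized.
function recordFeedback(history: Feedback[], feedback: Feedback): Feedback[] {
  return [...history, feedback];
}
```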
Object Manipulation & Transformation
Object manipulation and transformation interactions allow users to directly modify, adjust, or transform specific UI elements, providing greater control over the system. Examples include dragging and dropping UI elements and resizing elements within the interface, enabling unique and flexible interactions.
Drag and Drop: Moving an element to a specific location, or repositioning it in a way that influences the generative process.
Connecting: In connecting interactions, a user stacks and connects two UI elements in a way that affects the overall generative process.
Resizing: Altering the size of a UI element in a way that changes its function in the generative process.
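The sketch below shows how drag-and-drop, connecting, and resizing events might be translated into hints for the generative process; the event shapes and the translation function are hypothetical.

```typescript
type ManipulationEvent =
  | { kind: "drag-drop"; elementId: string; x: number; y: number }
  | { kind: "connect"; sourceId: string; targetId: string }
  | { kind: "resize"; elementId: string; width: number; height: number };

// Translates a direct manipulation into a hint the generative process can use.
function toGenerationHint(event: ManipulationEvent): string {
  switch (event.kind) {
    case "drag-drop":
      return `place ${event.elementId} at (${event.x}, ${event.y})`;
    case "connect":
      return `combine ${event.sourceId} with ${event.targetId}`;
    case "resize":
      return `render ${event.elementId} at ${event.width}x${event.height}`;
  }
}
```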

Human-AI Engagement Taxonomy
In this section, we present a taxonomy of human-GenAI interaction levels, progressing from minimal to more active user engagement.
Engagement is defined as the process by which participants initiate, maintain, and conclude their perceived connections during an interaction.
We propose a spectrum of engagement levels ranging from passive to fully collaborative, with main categories including passive engagement, deterministic engagement, assistive engagement, sequential collaborative engagement, and simultaneous collaborative engagement. The level of engagement influences the application scenarios that the generative AI system can support and the interaction techniques it offers.
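The spectrum can be written down as an ordered list of levels, as in the sketch below; the ordering from least to most active involvement is the only structure added beyond the category names.

```typescript
// Engagement levels ordered from least to most active user involvement.
const ENGAGEMENT_LEVELS = [
  "passive",
  "deterministic",
  "assistive",
  "sequential collaborative",
  "simultaneous collaborative",
] as const;

type EngagementLevel = (typeof ENGAGEMENT_LEVELS)[number];

// Higher rank means more active user engagement.
function engagementRank(level: EngagementLevel): number {
  return ENGAGEMENT_LEVELS.indexOf(level);
}
```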
Passive Engagement
• Passive engagement systems generate content autonomously based on implicit user data, such as behavior patterns and preferences, without direct user input.
• Success Factors: Integrates seamlessly into user routines, with minimal interaction required, and effectively personalizes content based on passive cues.
• Examples: Social media engagement analytics, predictive healthcare models, personalized news curation, and tailored design recommendations.
My major takeaways
This project was formative in my growth as a researcher because I had the opportunity to be first author on an influential research paper in a growing area of artificial intelligence.
Deepening Understanding: Through this research, I explored the diverse and complex ways users interact with generative AI, learning how thoughtful design can transform user experiences with advanced technology.
Impacting Future Design: My goal is to help designers create AI systems that feel accessible and empowering for all users, and this paper serves as a foundational guide for those aiming to design with inclusivity and usability in mind.
Inspiring Further Research: I hope this work sparks further exploration and refinement in generative AI design, contributing to a future where AI applications are both powerful and easy for anyone to use.