Survey of User Interface Design and Interaction Techniques
in Generative AI Applications
Overview
This case study summarizes the research paper I published as a researcher at Adobe. I was the first author and worked with a mix of SDEs, senior researchers, and executives over a five-month timeline. The graphics were made entirely in Figma, and the paper was written in LaTeX.
Abstract & Goals
This survey presents a taxonomy of user-guided interaction patterns in generative AI, focusing on explicit user-initiated interactions rather than implicit signals. It aims to provide designers and developers with a comprehensive reference to enhance the usability and accessibility of generative AI applications.
Why did we write this paper?
• Bridging the Research Gap in HCI: While existing studies broadly explore human-AI interactions, few focus on the specific UI designs and patterns that make these interactions intuitive and effective. We aimed to fill this gap.
• Creating a Practical Reference: By categorizing various user-guided interaction types, we provide a valuable resource for designers and developers looking to understand and implement effective UI patterns for generative AI applications.
• Focusing on User-Guided Interactions: Our survey emphasizes interactions initiated directly by users, avoiding implicit cues and instead showcasing intentional, user-driven engagement techniques.
• Lowering the Barrier to Entry: We aspire to make generative AI design more accessible by offering clear guidelines and examples, helping newcomers and seasoned professionals alike better navigate this evolving field.

Conversational User Interface
A conversational UI is visually designed to mimic a conversation, with a turn-based exchange between the user and AI system.
Focus on GUIs: This discussion centers on graphical conversational interfaces (GUIs), excluding voice-based UIs (VUIs) like Amazon Alexa and Apple Siri.
Primary Interaction Space: Most user interactions happen in the input or prompt box, where users can ask questions or direct tasks.
Secondary Interaction Space: The output or chat history section stores responses, past inputs, or even galleries of outputs, allowing users to review previous exchanges.
Use Cases: Conversational UIs support diverse applications, from generating text to creating 3D models, adapting well to incremental changes during generative processes.
Memory and Context: These UIs excel at recalling previous interactions, helping to inform new outputs based on earlier conversation history, as seen in systems like Flamingo for one-shot tasks.
Versatility: Overall, conversational UIs are adaptable and can perform a wide range of tasks efficiently.
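To make the pattern concrete, here is a minimal TypeScript sketch of the turn-based structure behind a conversational UI. The type names and the `generate` callback are illustrative assumptions, not any specific product's API.

```typescript
type Role = "user" | "assistant";

interface Turn {
  role: Role;
  content: string;   // text, or a reference to a generated asset (image, 3D model, ...)
  timestamp: number;
}

interface Conversation {
  turns: Turn[];     // the chat history: the secondary interaction space
}

// Primary interaction space: the prompt box hands its text to this function.
async function submitPrompt(
  conversation: Conversation,
  promptText: string,
  generate: (history: Turn[]) => Promise<string>, // stand-in for the model call
): Promise<Conversation> {
  const userTurn: Turn = { role: "user", content: promptText, timestamp: Date.now() };
  // The full history is passed along so earlier turns can inform the new output.
  const reply = await generate([...conversation.turns, userTurn]);
  const assistantTurn: Turn = { role: "assistant", content: reply, timestamp: Date.now() };
  return { turns: [...conversation.turns, userTurn, assistantTurn] };
}
```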
Canvas User Interface
Canvas-focused user interfaces center interactions on a primary canvas, with generative tools placed around the periphery.
Content Canvases: These UIs display content such as images, text, code, or data visualizations on a central canvas, with secondary interactions occurring on the periphery, as exemplified by DeepThInk, an AI art therapy tool.
Information Visualization Canvases: These canvases allow users to manipulate visual elements that represent input-output interactions, aiding in data comprehension, such as in Graphologue, which visualizes travel itineraries as hierarchical treemaps.
Cognitive Load Reduction: Information visualization canvases help simplify complex data, breaking it into digestible segments to reduce user cognitive load.
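As a rough illustration of this layout, the sketch below models a central content canvas plus generative tools registered on the periphery; the class and field names are hypothetical.

```typescript
interface CanvasElement {
  id: string;
  kind: "image" | "text" | "code" | "chart";
  data: unknown;
}

interface PeripheralTool {
  label: string;   // e.g. "Stylize" or "Summarize", shown in a side panel
  apply: (selection: CanvasElement[]) => Promise<CanvasElement[]>;
}

class CanvasUI {
  private elements: CanvasElement[] = [];
  private tools: PeripheralTool[] = [];

  // Generative tools live on the periphery of the canvas.
  addTool(tool: PeripheralTool): void {
    this.tools.push(tool);
  }

  // Running a tool sends the current selection out and places results back on the canvas.
  async runTool(label: string, selection: CanvasElement[]): Promise<void> {
    const tool = this.tools.find((t) => t.label === label);
    if (!tool) return;
    const generated = await tool.apply(selection);
    this.elements.push(...generated);
  }
}
```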
Contextual User Interface
Contextual user interfaces integrate generative interactions directly within the main content area, where the user is most likely focused.
In-Line Generative Actions: Unlike canvas UIs, contextual UIs provide generative actions directly in-line rather than in the periphery, often appearing unprompted as a response to user actions.
Adaptive Interaction: Contextual UIs often adapt to user behavior, displaying relevant prompts or actions in response to specific user interactions within the subject area.
Seamless User Experience: By embedding interactions within the context, these UIs create a seamless experience that keeps users engaged without needing to navigate away from their focus.
Cognitive Load Reduction: This UI type lowers cognitive load by displaying interactions within the user’s immediate focus area, making it easier to access relevant information efficiently.
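The sketch below illustrates this adaptive, in-line behavior: hypothetical user actions (selecting text, pausing while typing) trigger generative actions that appear at the point of focus. The names and the rule set are assumptions made for illustration.

```typescript
type UserAction =
  | { kind: "select-text"; text: string }
  | { kind: "pause-typing"; lastSentence: string };

interface InlineAction {
  label: string;                  // rendered next to the user's point of focus
  run: () => Promise<string>;
}

// Hypothetical rules mapping what the user just did to relevant in-line actions.
function suggestActions(
  action: UserAction,
  generate: (prompt: string) => Promise<string>,
): InlineAction[] {
  switch (action.kind) {
    case "select-text":
      return [
        { label: "Rewrite", run: () => generate(`Rewrite this: ${action.text}`) },
        { label: "Summarize", run: () => generate(`Summarize this: ${action.text}`) },
      ];
    case "pause-typing":
      return [
        { label: "Continue writing", run: () => generate(`Continue: ${action.lastSentence}`) },
      ];
  }
}
```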
Modular User Interface
Modular user interfaces are structured with multiple main interaction areas, each serving a distinct function within the generative process.
Multi-Level Generation: These UIs are especially effective in systems with multiple levels of generation, as each module can handle a different aspect of the interaction.
Function-Specific Modules: Each module is tailored to a specific task, allowing for refinement at different stages, which enhances the overall precision and relevance of the generative output.
Versatility: Modular UIs are highly versatile, capable of supporting complex interactions and multi-functional generative systems.
Design Recommendation: Modular UIs are well-suited for designing interfaces for multi-level language models and complex generative AI applications, offering structured yet flexible interaction spaces.
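As a sketch of how function-specific modules can chain together, the example below models a hypothetical outline-draft-polish text pipeline, with each stage owned by its own module. The module names and signatures are invented for illustration.

```typescript
interface Module<In, Out> {
  name: string;
  run: (input: In) => Promise<Out>;
}

// Each module owns one level of generation and can be refined independently
// before its result is handed to the next module.
async function runPipeline(
  brief: string,
  outlineModule: Module<string, string[]>,  // brief   -> outline
  draftModule: Module<string[], string>,    // outline -> draft
  polishModule: Module<string, string>,     // draft   -> polished text
): Promise<string> {
  const outline = await outlineModule.run(brief);
  const draft = await draftModule.run(outline);
  return polishModule.run(draft);
}
```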
Simulated User Interface
Simulated user interfaces enable interactions with generative systems in virtual or augmented reality environments.
Alternative to Traditional GUIs: These interfaces are ideal when traditional GUIs are insufficient for task completion, offering a more immersive solution.
Training and Simulation: Simulated UIs can be used to teach users specific tasks in a controlled, virtual setting, enhancing skill development.
Tangible Data Interaction: They allow users to engage with data in a physical or spatial way, adding a hands-on component that enhances understanding.
Enhanced Learning and Visualization: Although complex to create, simulated UIs are valuable for training and visualizing concepts in real-time, especially in fields like urban planning or interactive learning.

Input Modalities
Users can interact with generative AI systems through three primary input modalities: text-based, visual, and sound inputs. We define an input as the data or information the AI processes to generate outputs.
Text-Based Inputs: Text inputs in generative AI encompass natural language, data, and code, allowing users to interact with the system in diverse ways, from creating stories to synthesizing information and debugging code.
Visual Inputs: Visual inputs, such as images, videos, and virtual gestures, serve as versatile input types, used for generating new visuals, captioning, editing, and creating accessibility elements. Visual interactions also include gestures in virtual or augmented reality, enhancing immersive experiences.
Sound Inputs: Sound-based inputs, including speech and audio recordings, allow users to interact with generative systems in novel ways, such as completing speeches or creating videos that match audio cues.
Input Flexibility: Generative AI systems support various input modalities, making them adaptable for different user needs and applications, from text-based storytelling to real-time visual and auditory interactions.
Enhanced User Experience: By leveraging multiple input types, generative AI systems provide a richer, more personalized interaction experience, catering to both professional and creative use cases.
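A simple way to picture these modalities is as one input type with three variants. The sketch below does this in TypeScript; the field names are chosen purely for illustration, and `Blob` is the standard browser type.

```typescript
// One input type with three variants, one per primary modality.
type GenerativeInput =
  | { modality: "text"; text: string }                                  // natural language, data, or code
  | { modality: "visual"; blob: Blob; mime: "image/png" | "video/mp4" }
  | { modality: "sound"; blob: Blob; mime: "audio/wav" };

function describeInput(input: GenerativeInput): string {
  switch (input.modality) {
    case "text":
      return `text input (${input.text.length} characters)`;
    case "visual":
      return `visual input (${input.mime})`;
    case "sound":
      return `sound input (${input.mime})`;
  }
}
```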
Prompting
Prompting in generative AI refers to users explicitly directing the system to perform a specific task; a prompt is distinct from an input, which is content such as an image or video. For example, in video editing, the video is the input, while a prompt like “edit the video to be 10 seconds” guides the system’s action on that input.
Text-Based Prompts: Text-based prompts consist of using written text, often in the form of natural language, to prompt the system to complete a certain task.
Visual Prompts: Visual prompting consists of using visual communication, like gestures, to prompt the system to complete a certain task.
Audio Prompts: Audio prompting consists of using speech or any other type of audio to prompt the system to complete a certain task.
Multimodal Prompts: Multimodal prompting consists of using a mix of the previous methods to prompt the system to complete a certain task.
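The sketch below separates the prompt (the explicit instruction) from the input it operates on, and shows how a multimodal prompt could fill more than one field; the shapes and example values are assumptions, not a real API.

```typescript
// The prompt is the explicit instruction; the input is the content it acts on.
interface Prompt {
  text?: string;    // e.g. "edit the video to be 10 seconds"
  gesture?: string; // e.g. a named gesture recognized in AR/VR
  audio?: Blob;     // a spoken instruction
}

interface GenerationRequest {
  input?: Blob | string; // the video, image, or text the prompt operates on
  prompt: Prompt;        // a multimodal prompt fills more than one field
}

// Example: trimming a video with a text prompt (values are placeholders).
const request: GenerationRequest = {
  input: "source-video.mp4",
  prompt: { text: "edit the video to be 10 seconds" },
};
```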
Selection Techniques
In generative AI, selecting refers to choosing or highlighting specific UI elements or content for further interaction, a key method in user interactions with such systems. Selection tools like boxes, lassos, and dropdown menus enable users to interact with and control AI-generated content with precision and accuracy.
Single Selection: A single-selection interaction consists of clicking or choosing a single GUI element for further interaction. An example would be a user choosing one of three outputs to iterate on.
Multi-Selection: A multi-selection interaction consists of clicking or choosing multiple UI elements that will be interacted with further.
Lasso and Brush Selection: Lasso and brush selections are techniques in which a lasso or brush stroke defines the region where a specific prompt is applied.
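To illustrate how these techniques scope a generation request, the sketch below models single, multi, and lasso/brush selections as one type and routes them to a hypothetical `generate` call; nothing here is tied to a real product.

```typescript
type Selection =
  | { kind: "single"; elementId: string }
  | { kind: "multi"; elementIds: string[] }
  | { kind: "region"; points: { x: number; y: number }[] }; // lasso or brush stroke

// Routes the prompt to a hypothetical generate call, scoped by the selection.
function applyPrompt(
  selection: Selection,
  prompt: string,
  generate: (prompt: string, scope: string) => Promise<string>,
): Promise<string> {
  switch (selection.kind) {
    case "single":
      return generate(prompt, `element ${selection.elementId}`);
    case "multi":
      return generate(prompt, `elements ${selection.elementIds.join(", ")}`);
    case "region":
      // Only the area bounded by the lasso or brush stroke is affected.
      return generate(prompt, `region outlined by ${selection.points.length} points`);
  }
}
```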
System and Parameter Manipulation
The system and parameter manipulation category includes user interactions that let users adjust settings, parameters, or functions within a generative AI system to personalize outputs. Examples of these interactions include menus, sliders, and explicit feedback, which help users tailor the system’s responses to their needs.
Menus: Menu interactions let a user either input their own parameters or choose from preset options to change the parameters of the generative process. An example would be a user selecting dropdown menu options to alter the output parameters.
Sliders: UI elements that can be "slid" to adjust the parameters of the generative AI system.
Explicit Feedback: Explicit feedback (i.e., thumbs up/down, written critiques, etc.) is a user interaction that is used to expressly personalize the system to the user’s preferences.
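A minimal sketch of this category is shown below: menus and sliders edit a settings object that travels with each generation request, and explicit feedback is logged for personalization. Every parameter name is an example I made up, not a real API.

```typescript
interface GenerationSettings {
  style: "photo" | "sketch" | "watercolor"; // chosen from a dropdown menu
  guidanceStrength: number;                 // adjusted with a slider, 0..1
  outputCount: number;                      // adjusted with a slider or stepper
}

type Feedback = { outputId: string; rating: "up" | "down"; note?: string };

// A slider edits one numeric parameter; clamping keeps the value valid.
function onSliderChange(settings: GenerationSettings, value: number): GenerationSettings {
  return { ...settings, guidanceStrength: Math.min(1, Math.max(0, value)) };
}

// Explicit feedback is stored so future generations can be personalized.
function recordFeedback(history: Feedback[], feedback: Feedback): Feedback[] {
  return [...history, feedback];
}
```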
Object Manipulation & Transformation
Object manipulation and transformation interactions allow users to directly modify, adjust, or transform specific UI elements, providing greater control over the system. Examples include dragging and dropping UI elements and resizing elements within the interface, enabling unique and flexible interactions.
Drag and Drop: Moving an element to a specific location, or repositioning it in a way that influences the generative process.
Connecting: In connecting interactions, a user stacks and connects two UI elements in a way that affects the overall generative process.
Resizing: Altering the size of a UI element in a way that changes its function in the generative process.
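The sketch below shows how drag-and-drop, connecting, and resizing events might be translated into hints for the generative process; the event shapes and the translation function are hypothetical.

```typescript
type ManipulationEvent =
  | { kind: "drag-drop"; elementId: string; x: number; y: number }
  | { kind: "connect"; sourceId: string; targetId: string }
  | { kind: "resize"; elementId: string; width: number; height: number };

// Translates a direct manipulation into a hint the generative process can use.
function toGenerationHint(event: ManipulationEvent): string {
  switch (event.kind) {
    case "drag-drop":
      return `place ${event.elementId} at (${event.x}, ${event.y})`;
    case "connect":
      return `combine ${event.sourceId} with ${event.targetId}`;
    case "resize":
      return `render ${event.elementId} at ${event.width}x${event.height}`;
  }
}
```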

Human-AI Engagement Taxonomy
In this section, we present a taxonomy of human-GenAI interaction levels, progressing from minimal to more active user engagement.
Engagement is defined as the process by which participants initiate, maintain, and conclude their perceived connections during an interaction.
We propose a spectrum of engagement levels ranging from passive to fully collaborative, with main categories including passive engagement, deterministic engagement, assistive engagement, sequential collaborative engagement, and simultaneous collaborative engagement. The level of engagement influences the application scenarios that the generative AI system can support and the interaction techniques it offers.
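The spectrum can be written down as an ordered list of levels, as in the sketch below; the ordering from least to most active involvement is the only structure added beyond the category names.

```typescript
// Engagement levels ordered from least to most active user involvement.
const ENGAGEMENT_LEVELS = [
  "passive",
  "deterministic",
  "assistive",
  "sequential collaborative",
  "simultaneous collaborative",
] as const;

type EngagementLevel = (typeof ENGAGEMENT_LEVELS)[number];

// Higher rank means more active user engagement.
function engagementRank(level: EngagementLevel): number {
  return ENGAGEMENT_LEVELS.indexOf(level);
}
```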
Passive Engagement
• Passive engagement systems generate content autonomously based on implicit user data, such as behavior patterns and preferences, without direct user input.
• Success Factors: Integrates seamlessly into user routines, with minimal interaction required, and effectively personalizes content based on passive cues.
• Examples: Social media engagement analytics, predictive healthcare models, personalized news curation, and tailored design recommendations.
My major takeaways
This project was formative in my growth as a researcher because I had the opportunity to be first author on an influential research paper in a growing area of artificial intelligence.
Deepening Understanding: Through this research, I explored the diverse and complex ways users interact with generative AI, learning how thoughtful design can transform user experiences with advanced technology.
Impacting Future Design: My goal is to help designers create AI systems that feel accessible and empowering for all users, and this paper serves as a foundational guide for those aiming to design with inclusivity and usability in mind.
Inspiring Further Research: I hope this work sparks further exploration and refinement in generative AI design, contributing to a future where AI applications are both powerful and easy for anyone to use.