
CONTEXT-BUILT AI DEVICES: real-world constraints and AI opportunity seeking

Dave Evans - January 2024

EXECUTIVE SUMMARY

Technical and social constraints imply AI agents and host devices must be built around their real-world contexts:

  • collecting the sound, text, and image data that feeds AI systems carries significant technical overhead and social responsibilities; these constraints imply different implementations of AI will be best suited for use in different contexts, namely:

  • proactive, general purpose AI agents should be situated in spaces with bounded social constructs and clear utility for their users;

  • mobile, personal AI agents should have their utility limited to specific, on-demand uses that respect common-denominator social constructs around data collection.

Situated, general purpose AI agents are best suited to supporting a bounded set of tasks in well understood real-world spaces because

  • they may utilize permanently installed infrastructure to support significant technical demands for sensors, computation, storage, and connectivity,

  • they may function within a well defined legal framework,

  • they may function within a shared and bounded social construct,

  • they may align to a clear set of tasks to better facilitate AI & human collaboration.

Mobile, personal AI agents are best suited to convivially optimizing the existing capabilities of our smartphones and their accessories:

  • they may help sandbox entertainment usage and distractions,

  • they may streamline on-demand productivity tasks spread across apps and the web,

  • they may adapt device automations to user behaviors and expressed preferences,

  • they should behave in alignment with clear brand values and philosophical intent.


_______________________

ESSAY

The recent hype around the potency of artificial intelligences, especially those powered by transformers such as large language models, has led to a commercial race to develop new AI-powered computing devices and platforms.  This essay intends to discuss the constraints facing the development of any novel AI platform or device in order to better inform and align product development and exploration.

Specifically I intend to argue that, in the near term, the AIs and AI-hosting devices that will be readily adopted will either be:

  • Task-oriented AI systems that are embodied into specific environments, supporting a relatively narrow class of productive tasks with a narrow group of people, or 

  • Significant user interface improvements for our existing personal mobile devices with limited opportunities to expand the practical capabilities of those devices.

Onwards.

SENSE • PLAN • ACT

While the capabilities of contemporary AIs are unprecedented, the field of robotics and AI development is decades old and has a robust literature of experiments and theory.  The original roboticists popularized the idea that a robot was any system that could “sense, plan, act” in a self-contained fashion.  While modern AI approaches have moved past the SPA framework on a technical basis, it remains a useful simplification: for an AI agent to be contextually useful it must sense its given opportunity space, plan useful actions therein, and take action accordingly.
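
To make the SPA simplification concrete, here is a minimal sketch of a sense-plan-act loop in Python.  Everything in it (the stubbed sensor, the trivial planning rule, the print-based actuator) is a hypothetical placeholder for illustration, not any particular robotics framework:

# A minimal, illustrative sense-plan-act loop. All components are stubs.
import time

def sense(sensors):
    # SENSE: gather whatever data the context makes available.
    return {name: read() for name, read in sensors.items()}

def plan(observations):
    # PLAN: decide which actions are useful given the observations.
    # A trivial rule stands in for a model or planner here.
    if observations.get("speech"):
        return [("speak", "You said: " + observations["speech"])]
    return []

def act(actions, actuators):
    # ACT: carry out the chosen actions through the available outputs.
    for channel, payload in actions:
        actuators[channel](payload)

# Stubbed-in sensor and actuator for illustration only.
sensors = {"speech": lambda: "what's the weather?"}
actuators = {"speak": print}

for _ in range(3):          # a proactive agent would loop indefinitely
    act(plan(sense(sensors)), actuators)
    time.sleep(0.1)

A reactive agent would run a single pass of this loop only when prompted; a proactive agent would run it continuously, which is exactly what drives the data and computing demands discussed below.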

When placed in a context with a purpose, robots were dubbed “embodied agents.”  This term came to encompass not only factory robots but chatbots on the web as well.  Devices placed into a highly specific environment were dubbed “situated robots.”  As we consider emerging market and technological trends in AI it’s likely that novel AI products will fall into these categories.  (This document does not intend to review the literature on what makes a compelling embodied agent or a useful situated robot, but it may be worth reviewing.)

Pioneering AI researcher Nils Nilsson posited, in 1982, that four major fields of AI development would be necessary to lay the foundations for useful systems, “not necessarily to achieve anything useful as an application per se.”  Those areas of research were:

  • natural-language processing,

  • computer vision,

  • expert systems,

  • problem-solving.

Gathering Useful Data

Approaches to these topics shifted around the time of AI researcher Peter Norvig’s 2009 paper The Unreasonable Effectiveness of Data, which argues that in machine learning “simple models and a lot of data trump more elaborate models based on less data.”  This approach profoundly influenced the building of today’s foundation models, which have been fed vast troves of data gleaned from the internet and scanned archives.  As a result, Nilsson’s “expert systems” and “problem-solving” have been addressed largely by wielding the power of NLP and CV to digest the awesome corpus of knowledge on the internet.

Consequently the dominant AI foundation models are based on data fed by:

  • Text via Natural Language Processing (ChatGPT, LaMDA, LLaMA, Claude, etc.)

  • Images via Computer Vision  (DALL·E, Midjourney, Stable Diffusion, etc.)

These capabilities can be mixed and matched with other machine learning models to extend the AI’s utility, such as audio processing to enable speech recognition.  Thus we can assume that a modern AI is capable of responding to multivariate data, provided that it eventually maps to its core training data of text and images.  (Note that while AIs may be custom-built around any data type, such as radar, the dominant general-purpose AIs will likely remain based on text and photographic images merely given the basic composition of the internet and the outputs of our computing devices.)

Thus the sensory inputs built into new AI devices should focus on:

  • Keyboards (text via user input)

  • Microphones (text via speech)

  • Cameras (images)
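
As a concrete illustration of how those three input channels funnel into text- and image-based models, here is a hypothetical sketch.  The transcribe, describe, and respond functions are placeholders standing in for real speech-to-text, image-captioning, and language models; none of them name an actual API:

# Hypothetical sketch: every sensor stream is reduced to the text and images
# that general-purpose foundation models are trained on. All functions are stubs.

def transcribe(audio: bytes) -> str:
    # Placeholder speech-to-text model: microphone audio -> text.
    return "remind me to water the plants"

def describe(image: bytes) -> str:
    # Placeholder vision model: camera frame -> text description.
    return "a wilting fern on a sunny windowsill"

def respond(prompt: str) -> str:
    # Placeholder language model: text in, text out.
    return "Noted. Context: " + prompt

def handle(keyboard: str = "", audio: bytes = b"", image: bytes = b"") -> str:
    # Funnel every input channel into the model's native modality: text.
    parts = [keyboard]
    if audio:
        parts.append(transcribe(audio))
    if image:
        parts.append(describe(image))
    return respond(" | ".join(p for p in parts if p))

print(handle(audio=b"...", image=b"..."))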

While this seems like a naively obvious conclusion on a fundamental technical basis, the design of AI devices as products must balance further constraints.  For while we may follow Norvig’s advice and merely feed our AIs “a lot of data” on a quantitative basis, it is prudent to recognize that AIs are still computers subject to “garbage in, garbage out.”

Useful AIs must focus on sensing data that is relevant and useful for users.  This demands a qualitative understanding of an AI’s intended application.  For discussion let’s classify an AI’s qualitative capabilities against two simple criteria: its specificity and its agency.

specificity drives sensing resolution

The specificity of AI-powered devices ranges from specialized tools like factory robots to true AGIs capable of nearly anything.  The functional utility of a given AI may be optimized to suit the specific boundaries of a selected problem or market opportunity.

If we assume modern AIs have a limited, if enormous, range of capabilities, then an AI’s utility may be bounded primarily by its range of sensory data.  Inputs bound outputs.

Thus the narrower the intended utility the more focused the sensory data collection can be.   Some tasks and environments readily bound the quality and quantity of data required for an AI to be useful.  Data breadth and detail can be determined by task relevance.

Broadly useful, more general AIs are by definition unbounded.  Any data that can be fed into a given AI model may indeed be useful, making discernment nearly impossible.  For general AI applications “a lot” of data is always better.

Narrow AIs require less data resolution.

Broad AIs require more data resolution.

agency drives action rate

The efficacy of an AI is also characterized by its degree of independent agency.  AIs may be reactive, taking action only when clearly instructed by well-defined circumstances.  AIs may also be proactive, empowered to take action as they determine is useful.

Reactive AIs minimize data collection and processing as they are fundamentally fed ad-hoc inputs and intentions on a case-by-case basis.

Proactive AIs maximize data collection and processing as they are fundamentally open-ended and require constant streams of data to maintain their useful readiness.

Reactive AIs require low data rates.

Proactive AIs require high data rates.

specificity & agency drive computing demands

Since AIs are computers, any product or service must process and store collected data using some combination of on-device (local) or connected (cloud) computing.  While advancements are being made in AI efficiency, it can generally be assumed that AIs will be computationally expensive to operate.  Thus devices that host AIs must balance local processing power and energy sources against the ability to connect to external computational resources.

While each AI device will demand a particular balance of local vs cloud computation, it can generally be understood that devices that collect data at higher data resolutions & rates will necessarily demand more computing power as well.

Specific, reactive AIs are data & computing light.

General, proactive AIs are data & computing heavy.
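
A rough back-of-the-envelope comparison makes the scaling obvious.  The figures below are illustrative assumptions (roughly 16 kHz voice audio and modestly compressed video), not measurements from any particular device:

# Back-of-the-envelope data rates for different sensing strategies.
# All figures are illustrative assumptions, not measurements.

KB, MB = 1e3, 1e6

scenarios = {
    # Specific, reactive tool: a few typed prompts per hour.
    "typed prompts (reactive tool)": 200 * 60,                    # ~200 B/min of text
    # General, reactive assistant: ~5 minutes of speech per hour.
    "on-demand speech (assistant)": 32 * KB * 60 * 5,             # 32 kB/s audio
    # General, proactive agent: always-on audio plus continuous video.
    "always-on audio + video (agent)": (32 * KB + 2 * MB) * 3600, # per hour
}

for name, bytes_per_hour in scenarios.items():
    print(f"{name:34s} ~{bytes_per_hour / MB:9.2f} MB/hour")

Even with generous assumptions, the always-on general configuration sits several orders of magnitude above the reactive tool, and that gap flows directly into sensor, storage, connectivity, and power requirements.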

Applications & Constraints

For maximally potent agents, more data is better.  Yet data collection and processing carry technical and social burdens that limit the utility of AI devices in the real world.  Cameras and microphones are subject to social stigmas and legal boundaries.  Devices are subject to ergonomics and social aesthetics.  Devices that indiscriminately capture sound and images are neither technically feasible to build nor socially acceptable to use.

AI device builders must design around a thoughtfully bounded application, to build for a considered balance of agency and generality.  The combination of these qualities will lend themselves to abstracted categories of AI products.  Acknowledging that in practice the axes of generality (specific vs general) and agency (proactive vs reactive) lie on a spectrum, it’s instructive to consider them in a 2x2 matrix to compare the extreme combinations.

Specific, reactive AIs  = Tools.

Specific, proactive AIs  = Robots.

General, reactive AIs  = Assistants.

General, proactive AIs  = Intelligent Agents. 

AI Generality (Specific <—> General) vs. AI Agency (Reactive <—> Proactive):

                      Reactive          Proactive
  Specific AI         Tools             Robots
  General AI          Assistants        Intelligent Agents

ROBOTS: specific, proactive AIs represent robots with defined utility and autonomous operation:

  • smart thermostats

  • self-driving cars

  • Roombas

TOOLS: specific, reactive AIs represent tools with a purposeful utility that respond to specific prompts:

  • factory robots

  • Microsoft Copilot

  • Midjourney

INTELLIGENT AGENTS: general, proactive AIs represent our hoped-for future but are currently found only in research labs and fiction:

  • sci-fi characters

  • Samsung Ballie?

  • ???

ASSISTANTS: general, reactive AIs represent assistive devices and services that still require inputs to direct and initiate:

  • Alexa & Siri

  • ChatGPT & Google

  • Humane & Rabbit

technical constraints on data

Intelligent agents demand maximal data collection and processing across the widest range of possible data types and sources.  Yet such broad, continuous data capture is quite technically expensive, requiring a significant investment in sensor suites, processing power, data storage, connectivity, and electrical power.  Maximizing data gathering also maximizes infrastructure and energy requirements.

The devices and systems that can host computationally expensive AIs – namely the general, proactive Intelligent Agents that tech companies are racing to develop – may only be embodied in systems backed by significant infrastructure.  These are likely to be permanently installed systems, plugged into the electrical grid and wired to the internet.

The inverse is surely also true, that locally hosted AIs must be built around the power capacities and processing capabilities of individual devices.  Smaller devices will necessarily offer less utility, leaning towards AI systems that are more specific and reactive as a baseline, with application-specific leanings towards more agency or more generality.  Thus compact, standalone AI systems are likely to be Tools that look like Assistants or Robots depending on the circumstance or application.

Intelligent Agents will be dependent on installed infrastructure.

Tools, Robots, and Assistants may function as standalone, portable systems.

social constraints on data

Digital recording of our lives is socially unacceptable in a huge range of personal and shared contexts.  Both highly public spaces and highly private spaces carry legitimate concerns around data privacy, security, and consent that preclude the practical adoption of comprehensive, indiscriminate data collection.  Furthermore these social limitations vary widely across the globe; a user in the USA has vastly different expectations around digital privacy than Europeans, who more jealously guard their data.  Even individuals will have widely varying comfort with personal data collection over the course of their day-to-day activities and spaces.

The myriad particulars of physical environments, cultural settings, and interpersonal dynamics will dictate what data collection is considered appropriate in each circumstance.  While there are nearly infinite permutations of spatial and interpersonal contexts, some are far more predictably bounded than others.

Spaces that are highly aligned – physically, socially, and legally – have expectations that are far easier to navigate than those that are ambiguous or dynamic.  A professional kitchen or a business conference room has a substantially more bounded social contract than a park or grocery store.  These purpose-driven environments inherently establish social and often legal expectations around data collection and use.

In the absence of a prevailing social contract, the operation of an AI device must either honor the least permissive expectations or risk social or legal transgression.  Notably, the sphere of influence for an AI embodied in a mobile, portable device is inherently unbounded and thus subject to changing standards of acceptability.  Managing data collection across the maximum range of socio-physical environments is an impossibly nuanced task.

Socio-physical spaces with clearly defined purposes and legal frameworks may readily host proactive, general AIs.

Mobile, portable devices should host more limited AIs that rely on human users to navigate social contexts.

BUILDING USEFUL AI DEVICES

The foregoing discussion of data qualities and their constraints thus leads me to a simple pair of conclusions about what sort of devices and systems can usefully host AIs in the foreseeable future:

Intelligent Agents should be situated.

Tools, Robots, and Assistants may be mobile.

Intelligent Agents should be Situated

A situated intelligent agent is a proactive, generally capable AI that is placed within a bounded and well understood environment in order to align the AI’s output to a set of users, legal frameworks, and social expectations.

Placing AI devices within a known physical environment allows the designer to ensure the system is technically sufficient with:

  • power, computation, and connectivity for continuous, robust operation,

  • data via a sensor suite operating at high resolution and rate.

Placing AI devices within a known social and legal environment allows the designer to ensure the system is socially acceptable:

  • it ensures the legal ownership and use of data is defined,

  • it allows humans using the space & system to be comfortably aware of its operation,

  • it aligns the AI and the humans to optimally cooperate on a limited set of tasks,

  • it assures humans that the AI has functional limits on its reach and awareness.

While the specific implementation of a situated AI system will derive from the space and tasks it is dedicated to, the benefits of such an arrangement seem comparatively clear.

Mobile AIs should be Focused

On-person mobile agents, by comparison, face social and technical limitations on their utility.  Tools, Robots, and Assistants all should expect to exhibit some reduced capacity in their agency, specificity, or both.  Design of personal AI agents thus demands careful and nuanced consideration of expected use cases and features.

Development of mobile AI systems is following the nearly universal adoption of smartphone-based ecosystems.  Since personal, portable AI devices will follow phones in social precedent and in technical underpinnings, we can expect forthcoming devices to augment existing smartphone capabilities while attempting to reshape their deficiencies.

digital task focus and facility

The recent AI mobile device launches from Humane and Rabbit offer keen manifestoes against the inefficiencies and distractions of mobile apps and screens.

Rabbit’s CEO has suggested “our smartphones have become the best device to kill time instead of saving it.”  The Humane founders insist they “don’t do apps” and suggest that personal devices should “make you feel superhuman, not feel enslaved.”  These observations are based on thoughtful concerns, and yet I think they willfully ignore real-world observations around how people want to use their mobile devices.

We reach for our phones for two broad reasons:

  • to be entertained, and

  • to be productive.  

The issue these AI device visions attempt to address is the utter commingling of entertainment and productivity in our digital realms.  Our devices and online experiences have become quagmires of distractions.  Metaphorically our devices keep trying to sh!t where they eat.  Yet the truth is we want the ability to be entertained in the toilet stall just as much as we want the ability to powerfully wield our connectivity and compute power on the go.  Users need to sandbox productivity and entertainment, mimicking the choice to watch TV after a day at work – though far more granularly.  We need to be able to focus on our intended tasks with fewer distractions, less cognitive load, and less interface friction.  

low/no-screen device lessons

Rabbit and Humane suggest that the most useful AI agents will be those that can take productive actions on your behalf.  They offer to achieve the same (or better!) outcomes as humans tapping on screen based devices by relying on more contextual data, clever algorithms, and a lot of user trust.

Both visions rely on auditory natural language interactions first and foremost, issuing commands and trusting the AI to work invisibly on your behalf.  On the UI front, this high-level task orientation and more “human” speech interaction are intended to reduce cognitive load while avoiding distractions, thus improving efficiency and possibly efficacy.

The elimination of apps through natural language interactions can achieve these goals only for tasks where the results are consistently predictable and trustworthy.  Yet first-generation smart assistants like Amazon’s Alexa have shown there are serious limitations to “one-shot answers” to queries.  Voice interactions are multitasking friendly and work well at a distance, but they are inefficient in delivering information with density and fidelity.  Using auditory interactions instead of screens reduces the efficacy of an agent in delivering nuanced responses and detailed information, or in taking actions that require user collaboration.

Rabbit strips away the screen’s clutter by relying primarily on a chatbot window, reducing the visual noise of apps and websites and returning to a simple text-first interface.  Similarly there is a vocal minority of users who use ChatGPT as a web interface to filter the whole internet into a useful chat window, distilling a gamified internet into something approachably simple.  Text-based interactions seem promising in part because many of our most reliably productive and uncluttered media channels are text-first, including books and text messaging.

A screen can be incredibly information dense, a profoundly potent feature, yet our screens have lost utility after being thoroughly gamed.  The Humane founders rightly opine that “the relationship that we have with our devices now is overpowered by the density of information that people have to contend with on a daily basis.”  The phone usability issues Humane and Rabbit have identified are real, but their low/no screen AI-powered solutions are overcorrections that reduce the practical usability of mobile devices.

the digital reach of productivity 

What these first AI devices get right is recognizing that our modern smartphones, via apps and web interactions, are astonishingly capable for a vast array of digitally mediated uses – but the digital media itself has become a hindrance.  They recognize that users want to spend less time navigating the digital world and more productive time in the real world.  (Conversely we also note that users want to spend more time being digitally entertained.)

Builders of personal AI devices must recognize that their productive reach is primarily bounded by the ability to harness existing digital systems that are useful for end users: fetching information, purchasing goods and services, interacting with other devices, and now generating content on demand.  Even when AI interfaces successfully shield users from the cruft of apps or websites the actual results of a productive digital interaction will presumably be identical.

This is a freeing observation: mobile AI devices can revel in their limitations, focusing on what could make a smartphone a better host for tools, robots, and assistants.

A Taxonomy of Convivial Mobile AIs

Tools: specific, reactive AIs

A smartphone based tool is functionally equivalent to an app.  Services that require the user to initiate a particular interaction are the status quo, though perhaps AIs can offer new goods and services to users.

Robots: specific, proactive AIs
A smartphone-embodied robot could automate a limited range of tasks on a conditional basis, particularly if the bot has access to the full range of a device’s capabilities on a system level.  These could be explicitly selected or approved by users for specific circumstances, or implicitly learned by an AI over time.  In each case a viable bot service relies on having a specific, finite set of usage permutations to attend to; it does not try to predict the full range of possibilities that a fully general “intelligent agent” does.  Relying on user permission and behavior to act builds user trust in the system and ensures personally relevant utility.

As an example use case, a ’bot-enhanced afternoon workout could allow currently disparate data to usefully commingle: a weather app influences the precise scheduling of a pleasant workout between meetings, a birdsong app marks the location of wildlife hotspots along a fitness-app-tracked run, a playlist adjusts to the intensity of a watch-sensed heart rate when earbuds are connected, all while the phone holds non-critical notifications during personal time.
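
A sketch of how such a rule might be expressed is below.  The data sources and app hooks are entirely hypothetical; the point is that the automation needs only narrow, user-approved signals rather than open-ended sensing:

# Hypothetical sketch of a user-approved workout automation. Each input is a
# narrow signal the user has explicitly shared; no cameras or always-on
# microphones are involved.

from dataclasses import dataclass

@dataclass
class Context:
    free_minutes: int        # from the calendar
    rain_expected: bool      # from the weather app
    heart_rate: int          # from the watch, only while a workout is active
    earbuds_connected: bool

def workout_automation(ctx: Context) -> list[str]:
    actions = []
    # Schedule the run in a dry gap between meetings.
    if ctx.free_minutes >= 45 and not ctx.rain_expected:
        actions.append("suggest: start run now")
        actions.append("notifications: hold non-critical until run ends")
        # Match music intensity to effort, but only if earbuds are in.
        if ctx.earbuds_connected:
            tempo = "high-energy" if ctx.heart_rate > 150 else "steady"
            actions.append("playlist: switch to " + tempo + " mix")
    return actions

print(workout_automation(Context(60, False, 155, True)))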

Critically the proactive autonomy of a mobile AI should not rely on the constant collection of sensitive data such as photographs and sound.  By focusing on specific applications the AI builder may readily restrict sensor data to socially acceptable sources.  While there may be circumstances when a worn camera or always-on mic streaming data to a mobile AI offers some proactive utility, few compelling use cases have been identified that would justify the liabilities of wholesale data collection.  The risk / reward ratio for always-watching cameras and always-on microphones is terrible.

Assistants: general, reactive AIs
A smartphone-based assistant could streamline a wide variety of tasks in response to user prompting.  These prompts could be given through any combination of talking, typing, or tapping initiations, launched in concert with the full array of sensors and computing power, local or connected, at the AI’s disposal.

Importantly the assistant does not need to operate as a black box to be “magically” useful; rather it could allow users to collaborate at any step of the process, drilling into deeper information on demand.  Productivity interactions should start at an appropriate level for the task, offering a fidelity of feedback and input that is usefully elegant for users who need to evaluate and fine-tune the results.  A superlative AI assistant should offer users productivity interactions that are superhuman in speed, fidelity, accuracy, range, and awareness without eliminating the autonomy and agency of users themselves.
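
As a sketch of that layered, collaboration-friendly behavior (the response structure and the booking example are purely hypothetical):

# Hypothetical sketch of a layered assistant response: a quick answer first,
# with detail and an editable plan available on demand instead of a black box.

from dataclasses import dataclass, field

@dataclass
class AssistantResponse:
    summary: str                                      # the fast, low-friction answer
    details: str = ""                                 # richer context, shown if asked
    steps: list[str] = field(default_factory=list)    # plan the user can inspect or edit

def book_table(request: str) -> AssistantResponse:
    # A real assistant would call restaurant and calendar services here.
    return AssistantResponse(
        summary="Booked Luigi's for four, Friday 7pm.",
        details="Chosen over two alternatives based on your past ratings and walking distance.",
        steps=["search nearby italian", "check calendar conflicts", "reserve via website"],
    )

resp = book_table("dinner friday for four")
print(resp.summary)     # immediate confirmation
print(resp.details)     # drill down on demand
print(resp.steps)       # revise any step instead of trusting the black box

The point is not the specific fields but the shape of the interaction: a fast answer up front, with detail and editable steps available for users who want to evaluate and adjust the result.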

Bumbling Agents: general, proactive AIs
As thoroughly argued, building a proactive, general purpose AI into a mobile device willfully ignores social and technical limitations.  Yet certainly some companies will try.  Inevitably those devices will take autonomous action that is unwanted by their users and collect data that is inappropriate to gather.  As with all technologies and gadgets the early adopters will gloss over these shortcomings without acknowledging that they are fundamentally unacceptable to the majority.  Brands that wish to guard their reputations as builders of devices that serve humanity must tread carefully around unfettered autonomy.

ON BRANDS AND VALUES

The inherent limitations of AI agents will drive feature-level decisions by device builders.  As devices mature and proliferate, a company’s technical choices will soon be interpreted for their social implications.  Companies will be seen as more autonomous or more subservient, more brazen or more demure with data.  This will lead to values-based divergence in market offerings from different brands, each touting the ethos of their AI systems.

We will soon move into a future where users will adopt products because they strongly align to a brand’s expressed value systems.  Customers will demand a level of philosophical and social coherence that supersedes how a brand’s features currently align to their marketing campaigns.  Consumers will choose products that digitally express values they personally identify with because inviting autonomous agents into their daily lives is an exceedingly intimate choice.  We interact with our personal digital devices more than we do with any other human or object in our lives.  Selecting an AI device will be like hiring a personal assistant and making a friend, choices that people do not take lightly and seldom leave to chance.

Though today’s state of the art does not overtly demand a philosophical stance, the savvy AI device builder is well served to look ahead.  The choices to create convivial technologies that truly honor and respect social boundaries through technical decisions should not be taken lightly.  The initial trajectory for a brand’s AI offerings sets an important precedent both within the company and in the public eye.

This essay argues that the lasting brands of tomorrow will be those that understand that:

  • proactive, general purpose AI agents should be situated in spaces with bounded social constructs and clear utility for their users;

  • mobile, personal AI agents should have their utility limited to specific, on-demand uses that respect common-denominator social constructs around data collection.

FIN.

Wednesday 01.17.24
Posted by Dave Evans
 

Full price vs Free

I have two price categories for my design work: full price or free.

Both prices get top quality work, I always do my personal best for my clients. Yet how the work gets done is quite different.

Full price work is always done on contract through my LLC.  That contract entitles you to all sorts of things we agree upon in advance: working deadlines, fast communication, influence over the work, a method to adjust the plan as the work evolves, and what IP you get to keep.  As long as you pay your bills on time we keep happily working together as defined in the contract.  You pay with money, you make the decisions.

Free work generally has no contract and is thus on my terms.  I work at my pace, in my style, at my scope, and at my discretion.  I retain copyrights and moral rights to assign as I see fit.  I reserve the right to limit the scope of free work at my continuous discretion, as free has a natural tendency to drift towards unlimited and undervalued. I pay with time, I make the decisions.

In both cases I only accept work that I can honor with commitment and intention: if I have offered to do the work I will do it.  But the method of execution between full price and free is wildly different, and it's important for my clients and me to set expectations accordingly.  Shared understanding facilitates mutual satisfaction.  Let’s do good and be well.

Friday 09.03.21
Posted by Dave Evans
 

Human : Machine I/O

I design physical products with digital capabilities.  Accordingly I am constantly navigating the boundaries of perception for both human & machine.  We build in our own image, and so we should understand the parallels between how people and devices operate in the classic robotics paradigm of Sense / Plan / Act.

I am an engineer by formal training and a designer by experience and disposition.  So I tend to start analytical and then move into the intuitive.  Accordingly I set out to understand the robotic mantra of Sense Plan Act by exhaustively exploring what that means for both our biological structures and engineered devices.

To wit, I started compiling a spreadsheet that covers every human and machine I/O that I could conceivably utilize in designing products and experiences.  Every human sensor in the nervous system.  Every category of miniaturizable digital sensor.  Every means for an electronic or electromechanical device to do something perceptible.

Clearly this requires painting with broad strokes, but I strive for clarity and specificity where possible.

The work-in-progress list can be found in this Google Doc spreadsheet. Comments and suggestions for additions are welcome.

https://docs.google.com/spreadsheets/d/1oBe27BKbpvts7UNANxv9ZgIUA5QyFdx9Bppj7GQuYaw/

We have great responsibilities as shapers of usable technology; let us use these powers for good.  Build well and be well.

Wednesday 08.21.19
Posted by Dave Evans