Conceptual Boundaries Workshop
For identifying, discussing, and strategizing about promising AI safety research directions pertaining to the boundaries that causally distance agents from their environment.
February 10–12, 2024 in Austin, TX.
For updates, see our Substack.
What are agent boundaries?
A few examples:
A bacterium uses its membrane to protect its internal processes from external influences.
A nation maintains its sovereignty by defending its borders.
A human protects their mental integrity by selectively filtering the information that flows into and out of their mind.
…a natural abstraction for safety?
Agent boundaries seem to be a natural abstraction representing the safety and autonomy of agents.
A bacterium survives only if its membrane is preserved.
A nation maintains its sovereignty only if its borders aren’t invaded.
A human mind maintains mental integrity only if it can hold off informational manipulation.
Maybe the safety of agents could be largely formalized as the preservation of their membranes.
These boundaries could then be modeled mathematically via Markov blankets.
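As a rough illustration of what such a formalization could look like, here is the standard textbook condition for a Markov blanket. It is not necessarily the exact definition used in the «Boundaries» Sequence or at the workshop, and the symbols are our own choice: I for the agent's internal states, E for the external states of its environment, and B for the blanket states between them. B is a Markov blanket for I when inside and outside are conditionally independent given the blanket:

```latex
p(i, e \mid b) \;=\; p(i \mid b)\, p(e \mid b)
\qquad\Longleftrightarrow\qquad
I \perp\!\!\!\perp E \mid B
```

In the Active Inference literature, the blanket is often further split into sensory states and active states. On one possible reading, a boundary violation would then correspond to external states influencing internal states through a route that bypasses the blanket.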
Boundaries are also cool because they show a way to respect agents without needing to talk about their preferences or utility functions. Andrew Critch has said the following about this idea:
“my goal is to treat boundaries as more fundamental than preferences, rather than as merely a feature of them. In other words, I think boundaries are probably better able to carve reality at the joints than either preferences or utility functions, for the purpose of creating a good working relationship between humanity and AI technology” («Boundaries» Sequence, Part 3b).
For instance, respecting the boundary of a bacterium would probably mean “preserving or not disrupting its membrane” (as opposed to knowing its preferences and satisfying them).
Protecting agents and infrastructure
By formalizing and preserving the important boundaries in the world, we could be in a better position to protect humanity from AI threats.
For example, critical computing infrastructure could be secured by creating strong boundaries around it. These boundaries could be enforced with cryptography and formal methods, so that only the subprocesses that need read and/or write access to a particular resource (like memory) hold the encryption keys required for that access. Related: Object-capability model, Principle of least privilege, Evan Miyazono’s Atlas Computing, Davidad’s Open Agency Architecture.
And it may also be possible to do something similar with physical property rights.
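The least-privilege idea for computing resources described above can be illustrated with a minimal Python sketch in the spirit of the object-capability model. Everything here is hypothetical and chosen for illustration: the names `Capability` and `MemoryRegion`, the `"read"`/`"write"` rights, and the use of random in-process tokens as a stand-in for real cryptographic keys. It is a toy under those assumptions, not a description of any of the related projects.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    """Hypothetical token granting specific rights to one resource."""
    resource_id: str
    rights: frozenset
    # Random token so a capability cannot be recreated by guessing its fields.
    token: str = field(default_factory=lambda: secrets.token_hex(16))


class MemoryRegion:
    """Toy resource that only honors capabilities it issued itself."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self._data = {}
        self._issued = set()

    def grant(self, rights):
        """Issue a capability carrying only the rights the holder needs."""
        cap = Capability(self.resource_id, frozenset(rights))
        self._issued.add(cap)
        return cap

    def _check(self, cap, right):
        # Deny by default: unknown capabilities and missing rights both fail.
        if cap not in self._issued or right not in cap.rights:
            raise PermissionError(f"{right} access denied on {self.resource_id}")

    def read(self, cap, key):
        self._check(cap, "read")
        return self._data.get(key)

    def write(self, cap, key, value):
        self._check(cap, "write")
        self._data[key] = value


# Usage: one subprocess may read and write, another may only read.
region = MemoryRegion("mem0")
writer_cap = region.grant({"read", "write"})
reader_cap = region.grant({"read"})

region.write(writer_cap, "x", 1)
value = region.read(reader_cap, "x")

try:
    region.write(reader_cap, "x", 2)  # crosses the boundary
except PermissionError as err:
    denied = str(err)
```

The design choice worth noting is that access is denied by default: a caller with no capability, a capability lacking the requested right, or a capability the resource never issued is refused. A real system would need the tokens to be backed by actual cryptography and the enforcement to be verified, which this sketch does not attempt.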
Attendees
Seeking
Do you have experience with formal computer security, Active Inference, Embedded Agency, biological gap junctions, or other frameworks that distinguish agents from their environment?
Applications have now closed.
Note that we will most likely be running more boundaries workshops in mid-2024. To get notified, sign up for the mailing list at the bottom of the page.
Questions
How can boundaries help with safety?
What, formally, is a "boundary protocol", i.e., a description of the conditions under which exceptions can be made to the default prohibition on boundary violations?
Andrew Critch's current formalization of boundaries is fundamentally dependent on physical time. How can this be generalized to logical time?
What fields already have theories and implementations of the kind of boundaries we mean?
What empirical projects could help make progress on verifying and implementing boundaries-based safety approaches ASAP?
Related work
Active Inference, Markov blankets
Andrew Critch’s «Boundaries» Sequence
Cell gap junctions; Michael Levin’s work on cell cooperation
Scott Garrabrant’s Cartesian Frames
Intended output
To identify promising research directions and empirical projects for formalizing boundaries and applying boundaries to safety.
For example, what would be needed to specify a formal language for describing boundaries-based ethics?
Logistics
Begins: February 10 at 7 PM local.
Ends: February 12 at 3 PM local.
Location: Austin, TX.
Conceptual Boundaries Workshop is financially supported by the Foresight Institute, Blake Borgeson, and the Long-Term Future Fund (LTFF).