Yihong Chen
May 2025
Education:
UCL (Ph.D., CS)
Tsinghua (B.Eng., EE)
Experience:
Meta FAIR
Microsoft Research
Research:
Language Models
Knowledge Graphs
Continual Learning
Goal:
Robust, Steerable, Controllable AI
Structure
Explore how AI captures and acquires real-world regularities
Destructure
Explore how AI breaks structures for adaptability and controllability
Applied Systems
Pretraining on graphs (AKBC 2021) and texts (NeurIPS 2023)
Theoretical Contribution
Unifying factorization models and GNNs (NeurIPS 2022)
\[ \begin{aligned} &\textbf{Theorem (Message Passing in FMs):} \\ &\text{The gradient descent operator } \text{GD} \text{ on the node embeddings of a DistMult model} \\ &\text{with the maximum likelihood objective and a multi-relational graph } \mathcal{T} \text{ over entities } \mathcal{E} \\ &\text{induces a message-passing operator whose composing functions are:} \end{aligned} \]
\[ q_{\mathrm{M}}(\phi[v], r, \phi[w]) = \begin{cases} \phi[w] \odot g(r) & \text{if } (r,w) \in \mathcal{N}_{+}^1[v], \\ (1 - P_\theta (v|w, r)) \phi[w] \odot g(r) & \text{if } (r, w) \in \mathcal{N}_-^1[v] \end{cases} \]
\[ q_{\mathrm{A}}(\{m[v, r, w] : (r,w) \in \mathcal{N}^1[v]\}) = \sum_{(r,w) \in \mathcal{N}^1[v]} m[v,r,w] \]
\[ q_{\mathrm{U}}(\phi[v], z[v]) = \phi[v] + \alpha z[v] - \beta n[v] \]
\[ n[v]= \frac{|\mathcal{N}_{+}^{1}[v]|}{|\mathcal{T}|} \mathbb{E}_{ P_{\mathcal{N}_+^{1}[v]} } \mathbb{E}_{ u \sim P_{\theta}(\cdot|v, r)} g(r) \odot \phi[u] + \frac{|\mathcal{T}^{-v}|}{|\mathcal{T}|} \mathbb{E}_{ P_{\mathcal{T}^{-v}} } P_\theta(v|s, r) g(r) \odot \phi[s] \]
where \( \mathcal{T}^{-v} = \{(s, r, o) \in \mathcal{T} : s \neq v \land o \neq v \} \), and \( P_{\mathcal{N}^{1}_+[v]} \), \( P_{\mathcal{T}^{-v}} \) are empirical probability distributions.
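To make the correspondence concrete, here is a minimal PyTorch sketch (illustrative, not the paper's implementation): one gradient-descent step on the entity embeddings of a DistMult model under the 1-vs-all log-likelihood. The theorem above says exactly this step can be read as the message-passing operator \( (q_{\mathrm{M}}, q_{\mathrm{A}}, q_{\mathrm{U}}) \); the toy triple set and all sizes below are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): one GD step on DistMult entity embeddings
# under the multi-class (1-vs-all) log-likelihood over a toy multi-relational graph.
import torch

torch.manual_seed(0)
num_entities, num_relations, dim = 5, 2, 8
phi = torch.randn(num_entities, dim, requires_grad=True)  # node embeddings phi[v]
g = torch.randn(num_relations, dim)                        # relation embeddings g(r), kept fixed here

# Toy graph T as (subject, relation, object) index triples.
triples = torch.tensor([[0, 0, 1], [1, 1, 2], [3, 0, 4]])

def log_likelihood(phi, g, triples):
    """Log-likelihood of the true objects under DistMult, scored against all candidates."""
    s, r, o = triples[:, 0], triples[:, 1], triples[:, 2]
    scores = (phi[s] * g[r]) @ phi.T                       # [num_triples, num_entities]
    return torch.log_softmax(scores, dim=-1)[torch.arange(len(triples)), o].sum()

alpha = 0.1                                                # learning rate
loss = -log_likelihood(phi, g, triples)
loss.backward()
with torch.no_grad():
    # For each node v, -phi.grad[v] aggregates messages g(r) * phi[w] from positive
    # neighbours and softmax-weighted negative messages (q_M, q_A); the update below
    # plays the role of q_U in the theorem.
    phi -= alpha * phi.grad
```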
LLMs are excellent at structuring knowledge into neural weights, but poor at dismantling it. Unlike symbolic systems, they lack clearly addressable knowledge units.
PISTOL: Benchmarking Structural Unlearning for LLMs (2024); unlearning difficulty increases as data inter-connectivity grows
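To illustrate what "structural" means here, a small hedged sketch (not the benchmark's actual data pipeline): forget requests are edges in a graph of facts, and even a simple connectivity proxy over the retained facts hints at why more entangled facts are harder to unlearn cleanly.

```python
# Illustrative sketch: facts form a relational graph, a forget request is a set of
# edges, and its entanglement with retained facts is a crude proxy for difficulty.
from collections import defaultdict

facts = [("A", "works_for", "B"), ("A", "lives_in", "C"),
         ("B", "located_in", "C"), ("D", "works_for", "E")]
forget_set = [("A", "works_for", "B")]

# Count, for each entity, how many facts mention it; forgetting an edge whose
# endpoints appear in many other facts disturbs more of the retained structure.
degree = defaultdict(int)
for s, _, o in facts:
    degree[s] += 1
    degree[o] += 1
entanglement = sum(degree[s] + degree[o] - 2 for s, _, o in forget_set)
print("entanglement of forget set:", entanglement)
```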
The model body is not general enough, so it requires substantial customization for each new language
Is there any way to make the body more general?
General vs. Specific
Meta-learning with minimal intervention during pretraining
Overfitting → Collapse: Rigid models fail under distributional shift;
Solution: Active Forgetting allows training more adaptive language models (NeurIPS 2023)
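A hedged sketch of the forgetting mechanism (a toy stand-in, not the paper's training code): the token-embedding layer is periodically re-initialised during pretraining while the body keeps training, pushing the body toward more language-agnostic, reusable computation. The toy model, batch shapes, and reset frequency below are illustrative assumptions.

```python
# Illustrative active-forgetting loop: periodically reset only the token embeddings.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, dim, steps, reset_every = 100, 32, 50, 10      # illustrative hyperparameters

# Toy "language model": an embedding (the part we forget) plus a small body we keep.
embedding = nn.Embedding(vocab_size, dim)
body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, vocab_size))
optimizer = torch.optim.Adam(list(embedding.parameters()) + list(body.parameters()), lr=1e-3)

for step in range(steps):
    tokens = torch.randint(0, vocab_size, (16,))           # stand-in pretraining batch
    targets = torch.randint(0, vocab_size, (16,))
    loss = nn.functional.cross_entropy(body(embedding(tokens)), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (step + 1) % reset_every == 0:
        # Forgetting event: re-initialise only the token embeddings;
        # the body's weights are left untouched.
        nn.init.normal_(embedding.weight, std=0.02)
```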
Fundamentally, the model does not know what it knows — or what it does not know.
Reformatting LLMs to make hidden model knowledge accessible.
Jet Expansions (2024): Extracting human-readable symbolic structures from LLM residual computation
In early training steps, the model learns meaningless bigrams like (yaml, Adam); as training progresses, it picks up more sensible bigrams such as (its, own) and (make, sure).
RLHF improves ToxiGen scores, but LLMs like Llama-2-7B-Chat still retain toxic knowledge. With increasingly explicit prompts, their toxicity resurfaces: 84% for hard prompts. Jet bi-gram analysis (our method) confirms that RLHF mostly hides, rather than removes, toxic patterns.
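For intuition, a minimal sketch of the simplest "jet" (illustrative assumptions throughout; the method expands all residual paths, not just this one): in a residual LM, the logits decompose into a direct embedding-to-unembedding path plus paths through the blocks, and the direct path behaves like a bigram table that can be read off and inspected token by token.

```python
# Illustrative path expansion on a random one-block residual model (toy vocabulary).
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = ["make", "sure", "its", "own", "yaml", "Adam"]     # toy vocabulary (assumption)
V, d = len(vocab), 16

W_E = torch.randn(V, d)                                    # token embeddings
W_U = torch.randn(V, d)                                    # unembedding
mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

# One residual block: logits(token) = W_U (x + MLP(x)) with x = W_E[token].
# Expanding along paths gives a direct term W_U x (a bigram table) plus W_U MLP(x).
bigram_path = W_U @ W_E.T                                  # [next_token, current_token]

tok = vocab.index("make")
x = W_E[tok]
full_logits = W_U @ (x + mlp(x))
assert torch.allclose(full_logits, bigram_path[:, tok] + W_U @ mlp(x), atol=1e-5)

# Inspect which continuations the direct (bigram) path favours for "make".
top = bigram_path[:, tok].topk(2).indices
print("direct-path continuations of 'make':", [vocab[int(i)] for i in top])
```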
🧱 Structure is both a solution and a problem
It empowers learning but can also trap learning due to outdated or inappropriate knowledge.
⚖️ Controllability means both building and breaking structure
Useful structure must be constructed, but also selectively removed to support adaptation, as in Active Forgetting.
🔍 To control models, we must reveal what they know
Neural models don't store knowledge in explicit units. Tools like Jet Expansions help us surface and inspect these hidden internal structures.
Questions and Discussion
Yihong Chen | yihong.chen@cs.ucl.ac.uk | github.com/yihong-chen
Relation Prediction for Multi-Relational Graph Representations
Chen, Yihong; Minervini, Pasquale; Riedel, Sebastian; Stenetorp, Pontus (AKBC 2021)
ReFactorGNNs: Revisiting Factorisation-Based Models
Chen, Yihong; Mishra, Pushkar; Franceschi, Luca; Minervini, Pasquale; Stenetorp, Pontus; Riedel, Sebastian (NeurIPS 2022)
Improving Language Plasticity via Pretraining with Active Forgetting
Chen, Yihong; Marchisio, Kelly; Raileanu, Roberta; Adelani, David Ifeoluwa; Stenetorp, Pontus; Riedel, Sebastian; Artetxe, Mikel (NeurIPS 2023)
PISTOL: Benchmarking Structural Unlearning
Qiu, Xinchi; Shen, William F.; Chen, Yihong; Cancedda, Nicola; Stenetorp, Pontus; Lane, Nicholas D. (arXiv:2406.16810, 2024)
Jet Expansions of Residual Computation
Chen, Yihong; Xu, Xiangxiang; Lu, Yao; Stenetorp, Pontus; Franceschi, Luca (arXiv:2410.06024, 2024)