This is Jessica. I don’t usually post multiple times a week, but it turns out I have more to say on the topic of machine learning and human problems.
“Alignment” is the term used in the AI and ML communities to refer to the goal of aligning machine learning models with human values and preferences, so as to avoid risks ranging from the mundane to the catastrophic. It’s the topic of papers, workshops, talks, funding calls, etc.
There has been criticism of the nebulousness of what alignment is supposed to actually represent. Some of the critique of the ML conception of alignment comes from HCI research, the very interdisciplinary field that studies how people interact with technology and how to design human-computer interfaces. This pushback predates the “alignment” buzzword, actually. I remember watching many in the HCI community bristle when, in 2018, Michael Jordan wrote a blog post calling for the creation of a new “human-centric engineering discipline.” Seeing human-centered concerns get called out as a new frontier in AI/ML circles was enough to motivate some of the better-resourced HCI researchers to create centers on Human-Centered AI or install themselves as team leaders in big tech companies, ensuring they wouldn’t be overlooked. Others have worked to make AI-related applications a bigger part of HCI research. Many are left to stew about wheels being reinvented, trying to be patient and issuing the occasional plea for everyone to recognize the overlapping goals.
My take is that HCI can help quite a bit with alignment, but that what it can offer is not what much of the ML research community wants or perceives itself to need. It’s kind of like what a consulting statistician can offer to a data analysis versus what they are perceived to offer by those who recruit them. The real value of adding the statistician is often their role in helping you rethink your objective from the ground up. It’s not necessarily that they’re going to give you exactly the best tools to address some narrower problem you’ve convinced yourself needs to be solved. For example, you’re convinced that if you just find the right causal inference technique you can deconfound a big messy dataset you’ve amassed and learn exactly how to improve some outcome X, but the pesky statistician comes in and spoils it by telling you, “No, if X is ultimately the goal, you’re going to need a different data collection procedure altogether.”
In the case of aligning ML, there are certainly human-oriented questions that arise within the current paradigm for aligning models. For example, questions of eliciting specific information from humans become important for deploying generative models. Reinforcement learning from human feedback (RLHF) is a standard method for fine-tuning a large pretrained model like GPT-4: some group of annotators is recruited, often with no special experience required, and asked to select their preferred model output in a series of forced-choice tasks, usually given loosely defined criteria like “most helpful” or “least harmful”. Behavioral models for aggregating preferences across people, like Bradley-Terry-Luce, are then used to learn a utility function from these choices. Human-oriented concerns include how to design the forced-choice task and interface, how much information can reasonably be obtained from a single person, and how to crowdsource this efficiently. Beyond the common need to collect human annotations, other examples where human concerns arise in the current ML paradigm include questions like how to represent fairness ideals or how to evaluate post-hoc explanation techniques.
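To make the preference-aggregation step concrete, here is a minimal sketch (mine, not from any particular RLHF codebase) of the Bradley-Terry-style objective commonly used to fit a reward model from pairwise annotator choices: the probability that an annotator prefers one output over another is modeled as a logistic function of the difference in learned rewards, and the reward model is trained by minimizing the negative log-likelihood of the observed choices. The scalar rewards below stand in for the outputs of a hypothetical reward model.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(rewards_chosen, rewards_rejected):
    # Bradley-Terry model: P(chosen preferred over rejected)
    #   = sigmoid(r_chosen - r_rejected).
    # The loss is the negative log-likelihood of the annotators' choices,
    # so training pushes preferred outputs toward higher learned reward.
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()

# Toy example: scalar rewards a (hypothetical) reward model assigned
# to the annotator-preferred and rejected completions for three prompts.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, -0.5])
print(bradley_terry_loss(r_chosen, r_rejected))  # smaller when chosen outscores rejected
```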
Could an HCI researcher be helpful for these questions? Sure, though I suspect that the most relevant work for some of these elicitation problems is likely to be found elsewhere, in fields like psychophysics or decision science. Could the ML researcher figure this kind of stuff out without the HCI researcher? Probably. In many cases it may be more efficient for them to do it themselves, since HCI is a very large and interdisciplinary field. So I’m not surprised that ML researchers are often doing these things themselves, nor do I blame them.
On the other hand, I think the HCI pushback to ML alignment is valid when you consider the broader goal of creating predictive models that are well-aligned with human goals and values. If there’s a secret sauce that your average HCI researcher can bring, it’s the mindset of user-centered design, which makes serious attempts to understand the needs of the people being designed for. HCI research also demonstrates what it looks like to hold the conviction that human values are not monolithic, contributing knowledge on a variety of methods for getting at what different groups want from technology. Taken to heart, I expect this kind of perspective to suggest rethinking pretty much everything about human-facing ML models from the ground up.
Unfortunately though, I don’t really see much incentive for the average ML researcher interested in alignment to invest in the HCI way of doing things. Interdisciplinary collaborations tend to be hard, and this one seems likely to be particularly slow and messy. Meanwhile AI/ML research is moving at a faster pace than ever.
I also tend to believe that when someone is peering into a field and believes they can bring some new perspective or methods that will be transformative, there’s an onus on that person to invest enough in understanding the field they hope to change to demonstrate the value they want to bring. You can’t really expect people to listen if you haven’t taken the time to understand their concerns well enough to show that you could provide concrete suggestions. If the HCI researcher wants alignment to be done better or differently, maybe it’s time they temporarily reinvent themselves as ML researchers. Figuring out how to publish HCI-oriented papers at ML venues may not be easy, but it’s a step toward real impact.
I don’t mean this last part to sound dismissive, or like I’m trying to defend ML alignment. I think it’s just how things work. There have been multiple times in my career when I’ve looked at some other field and thought, I bet I could improve that. It’s how I’m feeling right now, actually. And every time I’ve been in this position, it has seemed clear to me that the only way to have that impact is to invest enough time in the new field to internalize how the people in it think about it.