When Nate Soares psychoanalyzes himself, he sounds less Freudian than Spockian. As a boy, he’d see people acting in ways he never would “unless I was acting maliciously,” the former Google software engineer, who now heads the non-profit Machine Intelligence Research Institute, reflected in a blog post last year. “I would automatically, on a gut level, assume that the other person must be malicious.” It’s a habit anyone who’s read or heard David Foster Wallace’s “This Is Water” speech will recognize.
Later, Soares recognized this folly, once his “models of other people” became “sufficiently diverse.” That isn’t to say those models are foolproof, he wrote in the same post. “I’m probably still prone to occasionally believing someone is malicious when they’re merely different than me, especially in cases where they act similarly to me most of the time (thereby fooling my gut-level person-modeler into modeling them too much like myself).” He suspected that this “failure mode” is linked to the “typical mind” fallacy, and is “therefore,” he concluded, “difficult to beat in general.”
Beating biases is one of Soares’ core concerns (“I care a lot about having accurate beliefs,” he says), and among the biases he targets is the assumption that artificial intelligence will just happen to be on our side, like a child born to loving parents. Soares disagrees: It’s up to us to make it so. This is the “alignment problem.” Aligning an AI’s behavior with our values and goals is no trivial task. In fact, it may be so challenging, and so consequential, that it defines our time, according to one of Soares’ employees, the decision theorist Eliezer Yudkowsky. Yudkowsky’s Twitter bio reads: “Ours is the era of inadequate AI alignment theory. Any other facts about this era are relatively unimportant …” He recently convinced Neil deGrasse Tyson that keeping a superintelligent machine “in a box,” disconnected from the Internet, was a lousy safeguard against a misaligned AI, and he bemoaned Steven Pinker’s misconceptions about AI safety. Donors find the Institute’s mission worthwhile: In December 2017, 341 of them gave it $2.5 million.
Over the last few months, I’ve been corresponding with Soares over email about why, among other things, a misaligned AI doesn’t need to be malicious to do us harm.