The AI chatbot ChatGPT can do a lot of things. It can respond to tweets, write science fiction, plan this reporter’s family Christmas, and it’s even slated to act as a lawyer in court. But can a robot provide safe and effective mental health support? A company called Koko decided to find out, using the AI to help craft mental health support for about 4,000 of its users in October. Users—of Twitter, not Koko—were unhappy with the results and with the fact that the experiment took place at all.
“Frankly, this is going to be the future. We’re going to think we’re interacting with humans and not know whether there was an AI involved. How does that affect the human-to-human communication? I have my own mental health challenges, so I really want to see this done correctly,” Koko’s co-founder Rob Morris told Gizmodo in an interview.
Morris says the kerfuffle was all a misunderstanding.
“I shouldn’t have tried discussing it on Twitter,” he said.
Koko is a peer-to-peer mental health service that lets people ask for counsel and support from other users. In a brief experiment, the company let users generate automatic responses using “Koko Bot”—powered by OpenAI’s GPT-3—which could then be edited, sent, or rejected. According to Morris, the 30,000 AI-assisted messages sent during the test received an overwhelmingly positive response, but the company shut the experiment down after a few days because it “felt kind of sterile.”
“When you’re interacting with GPT-3, you can start to pick up on some tells. It’s all really well written, but it’s sort of formulaic, and you can read it and recognize that it’s all purely a bot and there’s no human nuance added,” Morris told Gizmodo. “There’s something about authenticity that gets lost when you have this tool as a support tool to aid in your writing, particularly in this kind of context. On our platform, the messages just felt better in some way when I could sense they were more human-written.”
Morris posted a thread to Twitter about the test that implied users didn’t understand an AI was involved in their care. He tweeted that “once people learned the messages were co-created by a machine, it didn’t work.” The tweet caused an uproar on Twitter about the ethics of Koko’s research.
“Messages composed by AI (and supervised by humans) were rated significantly higher than those written by humans on their own,” Morris tweeted. “Response times went down 50%, to well under a minute.”
Morris said these words caused a misunderstanding: the “people” in this context were himself and his team, not unwitting users. Koko users knew the messages were co-written by a bot, and they weren’t chatting directly with the AI, he said.
“It was explained during the on-boarding process,” Morris said. When AI was involved, the responses included a disclaimer that the message was “written in collaboration with Koko Bot,” he added.
However, the experiment raises ethical questions, including doubts about how well Koko informed users, and the risks of testing an unproven technology in a live health care setting, even a peer-to-peer one.
In academic or medical contexts, it’s illegal to run scientific or medical experiments on human subjects without their informed consent, which includes providing test subjects with exhaustive detail about the potential harms and benefits of participating. The Food and Drug Administration requires doctors and scientists to run studies through an Institutional Review Board (IRB) meant to ensure safety before any tests begin.
But the explosion of online mental health services provided by private companies has created a legal and ethical gray area. At a private company providing mental health support outside of a formal medical setting, you can basically do whatever you want to your customers. Koko’s experiment didn’t need or receive IRB approval.
“From an ethical perspective, anytime you’re using technology outside of what could be considered a standard of care, you want to be extremely cautious and overly disclose what you’re doing,” said John Torous, MD, the director of the division of digital psychiatry at Beth Israel Deaconess Medical Center in Boston. “People seeking mental health support are in a vulnerable state, especially when they’re seeking emergency or peer services. It’s a population we don’t want to skimp on protecting.”
Torous said that peer mental health support can be very effective when people go through appropriate training. Systems like Koko take a novel approach to mental health care that could have real benefits, but users don’t get that training, and these services are essentially untested, Torous said. Once AI gets involved, the problems are amplified even further.
“When you talk to ChatGPT, it tells you ‘please don’t use this for medical advice.’ It’s not tested for uses in health care, and it could clearly provide inappropriate or ineffective advice,” Torous said.
The norms and regulations surrounding academic research don’t just ensure safety. They also set standards for data sharing and communication, which allows experiments to build on each other, creating an ever-growing body of knowledge. Torous said that in the digital mental health industry, these standards are often ignored. Failed experiments tend to go unpublished, and companies can be cagey about their research. It’s a shame, Torous said, because many of the interventions mental health app companies are running could be helpful.
Morris acknowledged that operating outside of the formal IRB experimental review process involves a tradeoff. “Whether this kind of work, outside of academia, should go through IRB processes is an important question and I shouldn’t have tried discussing it on Twitter,” Morris said. “This should be a broader discussion within the industry and one that we want to be a part of.”
The controversy is ironic, Morris said, because he took to Twitter in the first place to be as transparent as possible. “We were really trying to be as forthcoming with the technology and disclose in the interest of helping people think more carefully about it,” he said.