This is the Stanford vaccine algorithm that left out frontline doctors



When resident physicians at Stanford Medical Center—many of whom work on the frontlines of the covid-19 pandemic—found out that only seven out of over 1,300 of them had been prioritized for the first 5,000 doses of the covid vaccine, they were shocked. Then, when they saw who else had made the list, including administrators and doctors seeing patients remotely from home, they were angry.

During a planned photo op to celebrate the first vaccinations taking place on Friday December 18, at least 100 residents showed up to protest. Hospital leadership apologized for not prioritizing them, and blamed the errors on “a very complex algorithm.” 

“Our algorithm, that the ethicists, infectious disease experts worked on for weeks… clearly didn’t work right,” Tim Morrison, the director of the ambulatory care team, told residents at the event in a video posted online.

Many saw that as an excuse, especially since hospital leadership had been made aware of the problem on Tuesday—when only five residents made the list—and responded not by fixing the algorithm, but by adding two more resident names for a total of seven. 

“One of the core attractions of algorithms is that they allow the powerful to blame a black box for politically unattractive outcomes for which they would otherwise be responsible,” Roger McNamee, a prominent Silicon Valley insider turned critic, wrote on Twitter. “But *people* decided who would get the vaccine,” tweeted Veena Dubal, a professor of law at the University of California, Hastings, who researches technology and society. “The algorithm just carried out their will.” 

But what exactly was Stanford’s “will”? We took a look at the algorithm to find out what it was meant to do. 

How the algorithm works

The slide describing the algorithm came from residents who had received it from their department chair. It is not a complex machine-learning algorithm (the kind often referred to as a “black box”) but a rules-based formula for calculating who would get the vaccine first at Stanford. It considers three categories: “employee-based variables,” which have to do with age; “job-based variables”; and guidelines from the California Department of Public Health. For each category, staff received a certain number of points, with a total possible score of 3.48. Presumably, the higher someone’s score, the higher their priority in line. (Stanford Medical Center did not respond to multiple requests for comment over the weekend on the algorithm.) 

The employee variables increase a person’s score linearly with age, then add extra points for those over 65 or under 25. This gives priority to the oldest and youngest staff, and disadvantages residents and other frontline workers, who are typically in the middle of the age range.

Job variables contribute the most to the overall score. The algorithm counts the prevalence of covid-19 among employees by job role and by department in two different ways, but the difference between the two counts is not entirely clear. Neither the residents nor two unaffiliated experts we asked to review the algorithm understood what these criteria meant, and Stanford Medical Center did not respond to a request for comment. The job variables also consider the proportion of tests taken by each job role as a percentage of the medical center’s total number of tests collected. 

What these factors do not take into account is exposure to patients with covid-19, say residents. That means the algorithm did not distinguish between those who had caught covid from patients versus those who got it from community spread—including employees working remotely. And, as first reported by ProPublica, residents were told that because they rotate between departments rather than maintain a single assignment, they lost out on points associated with the departments where they worked. 

The algorithm’s third category refers to the California Department of Public Health’s vaccine allocation guidelines. These focus on exposure risk as the single highest factor for vaccine prioritization. The guidelines are intended primarily to help county and local governments decide how to prioritize the vaccine, rather than how to prioritize between a hospital’s departments. But they do specifically include residents, along with the departments where they work, in the highest priority tier. 

It may be that the “CDPH range” factor gives residents a higher score, but that this is still not enough to counteract the higher points given to the other criteria.
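Based on the slide as described, a rules-based formula like this can be sketched in a few lines of Python. The weights, thresholds, and inputs below are illustrative assumptions, not the actual values from Stanford’s slide, which were not published in full:

```python
# Illustrative sketch of a rules-based vaccine-priority score.
# All weights and thresholds here are hypothetical.

def priority_score(age, job_covid_rate, dept_covid_rate,
                   test_share, cdph_tier_points):
    # Employee-based variables: score rises linearly with age,
    # with bonus points at the extremes of the range.
    age_points = 0.5 * (age / 65.0)
    if age >= 65 or age < 25:
        age_points += 0.5

    # Job-based variables: covid prevalence by role and by
    # department, plus that role's share of tests collected.
    job_points = job_covid_rate + dept_covid_rate + test_share

    # CDPH guideline tier contributes a fixed number of points.
    return age_points + job_points + cdph_tier_points

# A mid-career resident who rotates between departments (and so
# collects no department points) can score below an older
# administrator under a formula shaped like this.
resident = priority_score(30, 0.3, 0.0, 0.1, 0.5)
admin    = priority_score(66, 0.1, 0.2, 0.0, 0.2)
```

Under these made-up numbers the administrator outscores the resident, which illustrates how age bonuses and department-linked points can swamp a guideline tier that favors frontline staff.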

“Why did they do it that way?” 

Stanford tried to factor in many more variables than other medical facilities did, but Jeffrey Kahn, the director of the Johns Hopkins Berman Institute of Bioethics, says the approach was overcomplicated. “The more there are different weights for different things, it then becomes harder to understand, ‘Why did they do it that way?’”

Kahn sat on Johns Hopkins’ 20-member committee on vaccine allocation, and says his university allocated vaccines based simply on job and risk of exposure to covid-19.

He says that decision was based on discussions that purposefully included different perspectives—including those of residents—and in coordination with other hospitals in Maryland. Elsewhere, the University of California San Francisco’s plan is based on a similar assessment of risk of exposure to the virus. Mass General Brigham in Boston categorizes employees into four groups based on department and job location, according to an internal email reviewed by MIT Technology Review.

“It’s really important [for] any approach like this to be transparent and public…and not something really hard to figure out,” Kahn says. “There’s so little trust around so much related to the pandemic, we cannot squander it.” 

Algorithms are commonly used in healthcare to rank patients by risk level in an effort to distribute care and resources more equitably. But the more variables used, the harder it is to assess whether the calculations might be flawed.

For example, in 2019, a study published in Science showed that a widely used algorithm for distributing care in the US ended up favoring white patients over Black ones. The problem, it turned out, was that the algorithm’s designers assumed that patients who spent more on health care were sicker and needed more help. In reality, higher spenders are also richer, and more likely to be white. As a result, the algorithm allocated less care to Black patients with the same medical conditions as white ones.

Irene Chen, an MIT doctoral candidate who studies the use of fair algorithms in healthcare, suspects this is what happened at Stanford: the formula’s designers chose variables that they believed would serve as good proxies for a given staffer’s level of covid risk. But they didn’t verify that these proxies led to sensible outcomes, or respond in a meaningful way to the community’s input when the vaccine plan came to light on Tuesday last week. “It’s not a bad thing that people had thoughts about it afterward,” says Chen. “It’s that there wasn’t a mechanism to fix it.”

A canary in the coal mine?

After the protests, Stanford issued a formal apology, saying it would revise its distribution plan. 

Hospital representatives did not respond to questions about who would be included in new planning processes, or whether the algorithm would continue to be used. An internal email summarizing the medical school’s response, shared with MIT Technology Review, states that neither program heads, department chairs, attending physicians, nor nursing staff were involved in the original algorithm design. Now, however, some faculty are pushing for a bigger role: eliminating the algorithm’s results entirely and instead giving division chiefs and chairs the authority to make decisions for their own teams. 

Other department chairs have encouraged residents to get vaccinated first. Some have even asked faculty to bring residents with them when they get vaccinated, or delay their shots so that others could go first.

Some residents are bypassing the university healthcare system entirely. Nuriel Moghavem, a neurology resident who was the first to publicize the problems at Stanford, tweeted on Friday afternoon that he had finally received his vaccine—not at Stanford, but at a public county hospital in Santa Clara County. 
“I got vaccinated today to protect myself, my family, and my patients,” he tweeted. “But I only had the opportunity because my public county hospital believes that residents are critical front-line providers. Grateful.”



What is an “algorithm”? It depends whom you ask



Describing a decision-making system as an “algorithm” is often a way to deflect accountability for human decisions. For many, the term implies a set of rules based objectively on empirical evidence or data. It also suggests a system that is highly complex—perhaps so complex that a human would struggle to understand its inner workings or anticipate its behavior when deployed.

But is this characterization accurate? Not always.

For example, in late December Stanford Medical Center’s misallocation of covid-19 vaccines was blamed on a distribution “algorithm” that favored high-ranking administrators over frontline doctors. The hospital claimed to have consulted with ethicists to design its “very complex algorithm,” which a representative said “clearly didn’t work right,” as MIT Technology Review reported at the time. While many people interpreted the use of the term to mean that AI or machine learning was involved, the system was in fact a medical algorithm, which is functionally different. It was more akin to a very simple formula or decision tree designed by a human committee.

This disconnect highlights a growing issue. As predictive models proliferate, the public becomes more wary of their use in making critical decisions. But as policymakers begin to develop standards for assessing and auditing algorithms, they must first define the class of decision-making or decision support tools to which their policies will apply. Leaving the term “algorithm” open to interpretation could place some of the models with the biggest impact beyond the reach of policies designed to ensure that such systems don’t hurt people.

How to ID an algorithm

So is Stanford’s “algorithm” an algorithm? That depends how you define the term. While there’s no universally accepted definition, a common one comes from a 1971 textbook written by computer scientist Harold Stone, who states: “An algorithm is a set of rules that precisely define a sequence of operations.” This definition encompasses everything from recipes to complex neural networks: an audit policy based on it would be laughably broad.

In statistics and machine learning, we usually think of the algorithm as the set of instructions a computer executes to learn from data. In these fields, the resulting structured information is typically called a model. The information the computer learns from the data via the algorithm may look like “weights” by which to multiply each input factor, or it may be much more complicated. The complexity of the algorithm itself may also vary. And the impacts of these algorithms ultimately depend on the data to which they are applied and the context in which the resulting model is deployed. The same algorithm could have a net positive impact when applied in one context and a very different effect when applied in another.
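The distinction can be made concrete with a toy example (purely illustrative, not drawn from any of the systems discussed here): the algorithm is the fitting procedure the computer executes; the model is the learned weights it outputs. The same algorithm on different data yields a different model, with a potentially very different impact:

```python
# The *algorithm*: a fitting procedure (here, one-variable least
# squares solved in closed form). The *model*: the weights it learns.

def fit(xs, ys):
    # Learning algorithm: compute slope and intercept from data.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the model

# Same algorithm, different data -> different model.
model_a = fit([1, 2, 3], [2, 4, 6])  # learns roughly y = 2x
model_b = fit([1, 2, 3], [6, 4, 2])  # learns roughly y = -2x + 8
```

When the media reports that “an algorithm” failed, it is usually the second object, the model, that actually produced the contested outputs.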

In other domains, what’s described above as a model is itself called an algorithm. Though that’s confusing, under the broadest definition it is also accurate: models are rules (learned by the computer’s training algorithm instead of stated directly by humans) that define a sequence of operations. For example, last year in the UK, the media described the failure of an “algorithm” to assign fair scores to students who couldn’t sit for their exams because of covid-19. Surely, what these reports were discussing was the model—the set of instructions that translated inputs (a student’s past performance or a teacher’s evaluation) into outputs (a score).

What seems to have happened at Stanford is that humans—including ethicists—sat down and determined what series of operations the system should use to determine, on the basis of inputs such as an employee’s age and department, whether that person should be among the first to get a vaccine. From what we know, this sequence wasn’t based on an estimation procedure that optimized for some quantitative target. It was a set of normative decisions about how vaccines should be prioritized, formalized in the language of an algorithm. This approach qualifies as an algorithm in medical terminology and under the broad definition, even though the only intelligence involved was that of humans.

Focus on impact, not input

Lawmakers are also weighing in on what an algorithm is. Introduced in the US Congress in 2019, H.R. 2231, the Algorithmic Accountability Act, uses the term “automated decisionmaking system” and defines it as “a computational process, including one derived from machine learning, statistics, or other data processing or artificial intelligence techniques, that makes a decision or facilitates human decision making, that impacts consumers.”

Similarly, New York City is considering Int 1894, a law that would introduce mandatory audits of “automated employment decision tools,” defined as “any system whose function is governed by statistical theory, or systems whose parameters are defined by such systems.” Notably, both bills mandate audits but provide only high-level guidelines on what an audit is.

As decision-makers in both government and industry create standards for algorithmic audits, disagreements about what counts as an algorithm are likely. Rather than trying to agree on a common definition of “algorithm” or a particular universal auditing technique, we suggest evaluating automated systems primarily based on their impact. By focusing on outcome rather than input, we avoid needless debates over technical complexity. What matters is the potential for harm, regardless of whether we’re discussing an algebraic formula or a deep neural network.

Impact is a critical assessment factor in other fields. It’s built into the classic DREAD framework in cybersecurity, which was first popularized by Microsoft in the early 2000s and is still used at some corporations. The “A” in DREAD asks threat assessors to quantify “affected users” by asking how many people would suffer the impact of an identified vulnerability. Impact assessments are also common in human rights and sustainability analyses, and we’ve seen some early developers of AI impact assessments create similar rubrics. For example, Canada’s Algorithmic Impact Assessment provides a score based on qualitative questions such as “Are clients in this line of business particularly vulnerable? (yes or no).”
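As a sketch of how impact enters such a rubric, DREAD is commonly described as averaging five ratings on a 0–10 scale, with “affected users” carrying the impact term. The scale and the simple averaging here follow that common description, used as an assumption rather than an official specification:

```python
# Hedged sketch of DREAD risk scoring as commonly described:
# the average of five 0-10 ratings. D = Damage, R = Reproducibility,
# E = Exploitability, A = Affected users, D = Discoverability.

def dread_score(damage, reproducibility, exploitability,
                affected_users, discoverability):
    ratings = [damage, reproducibility, exploitability,
               affected_users, discoverability]
    return sum(ratings) / len(ratings)

# Two otherwise identical vulnerabilities: the one affecting many
# users ranks higher, because impact drives the score.
widespread = dread_score(7, 5, 5, 9, 5)
narrow     = dread_score(7, 5, 5, 1, 5)
```

The point of the example is that a single impact term (here, affected users) can reorder an entire ranking, which is exactly what an impact-focused audit regime would rely on.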

There are certainly difficulties in introducing a loosely defined term such as “impact” into any assessment. The DREAD framework was later supplemented or replaced by STRIDE, in part because of challenges in reconciling different beliefs about what threat modeling entails. Microsoft stopped using DREAD in 2008.

In the AI field, conferences and journals have already introduced impact statements with varying degrees of success and controversy. It’s far from foolproof: impact assessments that are purely formulaic can easily be gamed, while an overly vague definition can lead to arbitrary or impossibly lengthy assessments.

Still, it’s an important step forward. The term “algorithm,” however defined, shouldn’t be a shield to absolve the humans who designed and deployed any system of responsibility for the consequences of its use. This is why the public is increasingly demanding algorithmic accountability—and the concept of impact offers a useful common ground for different groups working to meet that demand.

Kristian Lum is an assistant research professor in the Computer and Information Science Department at the University of Pennsylvania.

Rumman Chowdhury is the director of the Machine Ethics, Transparency, and Accountability (META) team at Twitter. She was previously the CEO and founder of Parity, an algorithmic audit platform, and global lead for responsible AI at Accenture.



MyHeritage now lets you animate old family photos using deepfakery



AI-enabled synthetic media is being used as a tool for manipulating real emotions and capturing user data by genealogy service MyHeritage, which has just launched a new feature, called ‘deep nostalgia’, that lets users upload a photo of a person (or several people) to see individual faces animated by algorithm.

The Black Mirror-style pull of seeing long lost relatives — or famous people from another era — brought to a synthetic approximation of life, eyes swivelling, faces tilting as if they’re wondering why they’re stuck inside this useless digital photo frame, has led to an inexorable stream of social shares since it was unveiled yesterday at a family history conference… 

MyHeritage’s AI-powered viral marketing playbook with this deepfakery isn’t a complicated one: it’s going straight for your heartstrings to grab data that can be used to drive sign-ups for its other (paid) services. (Selling DNA tests is its main business.)

It’s free to animate a photo using the ‘deep nostalgia’ tech on MyHeritage’s site but you don’t get to see the result until you hand over at least an email (along with the photos you want animating, ofc) — and agree to its T&Cs and privacy policy. Both of which have attracted a number of concerns, over the years.

Last year, for example, the Norwegian Consumer Council reported MyHeritage to the national consumer protection and data authorities after a legal assessment of the T&Cs found the contract it asks customers to sign to be “incomprehensible”.

In 2018 MyHeritage also suffered a major data breach — and data from that breach was later found for sale on the dark web, among a wider cache of hacked account info pertaining to several other services.

The company — which, as we reported earlier this week, is being acquired by a US private equity firm for ~$600M — is doubtless relying on the deep pull of nostalgia to smooth over any individual misgivings about handing over data and agreeing to its terms.

The face animation technology itself is impressive enough — if you set aside the ethics of encouraging people to drag their long lost relatives into the uncanny valley to help MyHeritage cross-sell DNA testing (with all the massive privacy considerations around putting that kind of data in the hands of a commercial entity).

Looking at the inquisitive face of my great-grandmother, I do have to wonder what she would have made of all this.

The facial animation feature is powered by Israeli company D-ID, a TechCrunch Disrupt Battlefield alum, which started out building tech to digitally de-identify faces with an eye to protecting images and video from facial recognition algorithms.

It released a demo video of the photo-animating technology last year. The tech uses a driver video to animate the photo — mapping the facial features of the photo onto that base driver to create a ‘live portrait’, as D-ID calls it.

“The Live Portrait solution brings still photos to life. The photo is mapped and then animated by a driver video, causing the subject to move its head and facial features, mimicking the motions of the driver video,” D-ID said in a press release. “This technology can be implemented by historical organizations, museums, and educational programs to animate well-known figures.”

It’s offering live portraits as part of a wider ‘AI Face’ platform that will offer third parties access to other deep learning, computer vision and image processing technologies. D-ID bills the platform as a ‘one-stop shop’ for synthesized video creation.

Other tools include a ‘face anonymization’ feature that replaces one person’s face in a video with another’s (useful, say, for documentary filmmakers protecting a whistleblower’s identity), and a ‘talking heads’ feature for lip syncing that can turn an audio track into a video of a person appearing to speak those words, replacing the need to pay actors to appear in content such as marketing videos.

The age of synthesized media is going to be a weird one, that’s for sure.




An AI is training counselors to deal with teens in crisis



Counselors volunteering at the Trevor Project need to be prepared for their first conversation with an LGBTQ teen who may be thinking about suicide. So first, they practice. One of the ways they do it is by talking to fictional personas like “Riley,” a 16-year-old from North Carolina who is feeling a bit down and depressed. With a team member playing Riley’s part, trainees can drill into what’s happening: they can uncover that the teen is anxious about coming out to family, recently told friends and it didn’t go well, and has experienced suicidal thoughts before, if not at the moment.

Now, though, Riley isn’t being played by a Trevor Project employee but is instead being powered by AI.

Just like the original persona, this version of Riley—trained on thousands of past transcripts of role-plays between counselors and the organization’s staff—still needs to be coaxed a bit to open up, laying out a situation that can test what trainees have learned about the best ways to help LGBTQ teens. 

Counselors aren’t supposed to pressure Riley to come out. The goal, instead, is to validate Riley’s feelings and, if needed, help develop a plan for staying safe. 

Crisis hotlines and chat services make a fundamental promise: reach out, and we’ll connect you with a real human who can help. But the need can outpace the capacity of even the most successful services. The Trevor Project believes that 1.8 million LGBTQ youth in America seriously consider suicide each year. The existing 600 counselors for its chat-based services can’t handle that need. That’s why the group—like an increasing number of mental health organizations—turned to AI-powered tools to help meet demand. It’s a development that makes a lot of sense, while simultaneously raising questions about how well current AI technology can perform in situations where the lives of vulnerable people are at stake. 

Taking risks—and assessing them

The Trevor Project believes it understands this balance—and stresses what Riley doesn’t do. 

“We didn’t set out to and are not setting out to design an AI system that will take the place of a counselor, or that will directly interact with a person who might be in crisis,” says Dan Fichter, the organization’s head of AI and engineering. This human connection is important in all mental health services, but it might be especially important for the people the Trevor Project serves. According to the organization’s own research in 2019, LGBTQ youth with at least one accepting adult in their life were 40% less likely to report a suicide attempt in the previous year. 

The AI-powered training role-play, called the crisis contact simulator and supported by money and engineering help from Google, is the second project the organization has developed this way: it also uses a machine-learning algorithm to help determine who’s at highest risk of danger. (It trialed several other approaches, including many that didn’t use AI, but the algorithm simply gave the most accurate predictions for who was experiencing the most urgent need.)

AI-powered risk assessment isn’t new to suicide prevention services: the Department of Veterans Affairs also uses machine learning to identify at-risk veterans in its clinical practices, as the New York Times reported late last year. 

Opinions vary on the usefulness, accuracy, and risk of using AI in this way. In specific environments, AI can be more accurate than humans in assessing people’s suicide risk, argues Thomas Joiner, a psychology professor at Florida State University who studies suicidal behavior. In the real world, with more variables, AI seems to perform about as well as humans. What it can do, however, is assess more people at a faster rate. 

Thus, it’s best used to help human counselors, not replace them. The Trevor Project still relies on humans to perform full risk assessments on young people who use its services. And after counselors finish their role-plays with Riley, those transcripts are reviewed by a human. 

How the system works

The crisis contact simulator was developed because doing role-plays takes up a lot of staff time and is limited to normal working hours, even though a majority of counselors plan on volunteering during night and weekend shifts. But even if the aim was to train more counselors faster and better accommodate volunteer schedules, efficiency wasn’t the only ambition. The developers still wanted the role-play to feel natural, and for the chatbot to nimbly adapt to a volunteer’s mistakes. Natural-language-processing algorithms, which had recently gotten really good at mimicking human conversations, seemed like a good fit for the challenge. After testing two options, the Trevor Project settled on OpenAI’s GPT-2 algorithm.

The chatbot uses GPT-2 for its baseline conversational abilities. That model is trained on 45 million pages from the web, which teaches it the basic structure and grammar of the English language. The Trevor Project then trained it further on all the transcripts of previous Riley role-play conversations, which gave the bot the materials it needed to mimic the persona.
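The data-preparation side of such a pipeline can be sketched roughly as follows. The speaker tags and transcript format here are assumptions for illustration; the Trevor Project has not published its actual preprocessing:

```python
# Hypothetical sketch of turning role-play transcripts into
# fine-tuning text for a GPT-2-style model: each turn is tagged
# by speaker, so the model can learn to generate Riley's side of
# the conversation in response to a counselor's messages.

def to_training_text(transcript):
    # transcript: list of (speaker, utterance) pairs
    lines = []
    for speaker, utterance in transcript:
        tag = "RILEY" if speaker == "riley" else "COUNSELOR"
        lines.append(f"<{tag}> {utterance}")
    return "\n".join(lines) + "\n<END>"

sample = [
    ("counselor", "Hi Riley, how are you feeling today?"),
    ("riley", "idk... kind of down I guess"),
]
print(to_training_text(sample))
```

Fine-tuning on thousands of transcripts serialized this way is what would keep the persona consistent: the model never consults a database of facts about Riley, it simply reproduces the storyline the transcripts share.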

Throughout the development process, the team was surprised by how well the chatbot performed. There is no database storing details of Riley’s bio, yet the chatbot stayed consistent because every transcript reflects the same storyline.

But there are also trade-offs to using AI, especially in sensitive contexts with vulnerable communities. GPT-2, and other natural-language algorithms like it, are known to embed deeply racist, sexist, and homophobic ideas. More than one chatbot has been led disastrously astray this way, the most recent being a South Korean chatbot called Lee Luda that had the persona of a 20-year-old university student. After quickly gaining popularity and interacting with more and more users, it began using slurs to describe the queer and disabled communities.

The Trevor Project is aware of this and designed ways to limit the potential for trouble. While Lee Luda was meant to converse with users about anything, Riley is very narrowly focused. Volunteers won’t deviate too far from the conversations it has been trained on, which minimizes the chances of unpredictable behavior.

This also makes it easier to comprehensively test the chatbot, which the Trevor Project says it is doing. “These use cases that are highly specialized and well-defined, and designed inclusively, don’t pose a very high risk,” says Nenad Tomasev, a researcher at DeepMind.

Human to human

This isn’t the first time the mental health field has tried to tap into AI’s potential to provide inclusive, ethical assistance without hurting the people it’s designed to help. Researchers have developed promising ways of detecting depression from a combination of visual and auditory signals. Therapy “bots,” while not equivalent to a human professional, are being pitched as alternatives for those who can’t access a therapist or are uncomfortable confiding in a person. 

Each of these developments, and others like it, require thinking about how much agency AI tools should have when it comes to treating vulnerable people. And the consensus seems to be that at this point the technology isn’t really suited to replacing human help. 

Still, Joiner, the psychology professor, says this could change over time. While replacing human counselors with AI copies is currently a bad idea, “that doesn’t mean that it’s a constraint that’s permanent,” he says. People, “have artificial friendships and relationships” with AI services already. As long as people aren’t being tricked into thinking they are having a discussion with a human when they are talking to an AI, he says, it could be a possibility down the line. 

In the meantime, Riley will never face the youths who actually text in to the Trevor Project: it will only ever serve as a training tool for volunteers. “The human-to-human connection between our counselors and the people who reach out to us is essential to everything that we do,” says Kendra Gaunt, the group’s data and AI product lead. “I think that makes us really unique, and something that I don’t think any of us want to replace or change.”
