

Auditors are testing hiring algorithms for bias, but find there’s no easy fix



I’m at home playing a video game on my computer. My job is to pump up one balloon at a time and earn as much money as possible. Every time I click “Pump,” the balloon expands and I receive five virtual cents. But if the balloon pops before I press “Collect,” all my digital earnings disappear.

After filling 39 balloons, I’ve earned $14.40. A message appears on the screen: “You stick to a consistent approach in high-risk situations. Trait measured: Risk.”

This game is one of a series made by a company called Pymetrics, which many large US firms hire to screen job applicants. If you apply to McDonald’s, Boston Consulting Group, Kraft Heinz, or Colgate-Palmolive, you might be asked to play Pymetrics’s games.

While I play, an artificial-intelligence system measures traits including generosity, fairness, and attention. If I were actually applying for a position, the system would compare my scores with those of employees already working in that job. If my personality profile reflected the traits most specific to people who are successful in the role, I’d advance to the next hiring stage.

More and more companies are using AI-based hiring tools like these to manage the flood of applications they receive—especially now that there are roughly twice as many jobless workers in the US as before the pandemic. A survey of over 7,300 human-resources managers worldwide by the consulting firm Mercer found that the proportion who said their department uses predictive analytics jumped from 10% in 2016 to 39% in 2020.

Stills of Pymetrics’s core product, a suite of 12 AI-based games that the company says can discern a job applicant’s social, cognitive, and emotional attributes.

As with other AI applications, though, researchers have found that some hiring tools produce biased results—inadvertently favoring men or people from certain socioeconomic backgrounds, for instance. Many are now advocating for greater transparency and more regulation. One solution in particular is proposed again and again: AI audits.

Last year, Pymetrics paid a team of computer scientists from Northeastern University to audit its hiring algorithm. It was one of the first times such a company had requested a third-party audit of its own tool. CEO Frida Polli told me she thought the experience could be a model for compliance with a proposed law requiring such audits for companies in New York City, where Pymetrics is based.


“What Pymetrics is doing, which is bringing in a neutral third party to audit, is a really good direction in which to be moving,” says Pauline Kim, a law professor at Washington University in St. Louis, who has expertise in employment law and artificial intelligence. “If they can push the industry to be more transparent, that’s a really positive step forward.”

For all the attention that AI audits have received, though, their ability to actually detect and protect against bias remains unproven. The term “AI audit” can mean many different things, which makes it hard to trust the results of audits in general. The most rigorous audits can still be limited in scope. And even with unfettered access to the innards of an algorithm, it can be surprisingly tough to say with certainty whether it treats applicants fairly. At best, audits give an incomplete picture, and at worst, they could help companies hide problematic or controversial practices behind an auditor’s stamp of approval.

Inside an AI audit

Many kinds of AI hiring tools are already in use today. They include software that analyzes a candidate’s facial expressions, tone, and language during video interviews as well as programs that scan résumés, predict personality, or investigate an applicant’s social media activity.

Regardless of what kind of tool they’re selling, AI hiring vendors generally promise that these technologies will find better-qualified and more diverse candidates at lower cost and in less time than traditional HR departments. However, there’s very little evidence that they do, and in any case that’s not what the AI audit of Pymetrics’s algorithm tested for. Instead, it aimed to determine whether a particular hiring tool grossly discriminates against candidates on the basis of race or gender.

Christo Wilson at Northeastern had scrutinized algorithms before, including those that drive Uber’s surge pricing and Google’s search engine. But until Pymetrics called, he had never worked directly with a company he was investigating.

Wilson’s team, which included his colleague Alan Mislove and two graduate students, relied on data from Pymetrics and had access to the company’s data scientists. The auditors were editorially independent but agreed to notify Pymetrics of any negative findings before publication. The company paid Northeastern $104,465 via a grant, including $64,813 that went toward salaries for Wilson and his team.

Pymetrics’s core product is a suite of 12 games that it says are mostly based on cognitive science experiments. The games aren’t meant to be won or lost; they’re designed to discern an applicant’s cognitive, social, and emotional attributes, including risk tolerance and learning ability. Pymetrics markets its software as “entirely bias free.” Pymetrics and Wilson decided that the auditors would focus narrowly on one specific question: Are the company’s models fair?

They based the definition of fairness on what’s colloquially known as the four-fifths rule, which has become an informal hiring standard in the United States. The Equal Employment Opportunity Commission (EEOC) released guidelines in 1978 stating that hiring procedures should select roughly the same proportion of men and women, and of people from different racial groups. Under the four-fifths rule, Kim explains, “if men were passing 100% of the time to the next step in the hiring process, women need to pass at least 80% of the time.”

If a company’s hiring tools violate the four-fifths rule, the EEOC might take a closer look at its practices. “For an employer, it’s not a bad check,” Kim says. “If employers make sure these tools are not grossly discriminatory, in all likelihood they will not draw the attention of federal regulators.”
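The four-fifths check itself is simple arithmetic. As a rough sketch (not the EEOC's or any vendor's actual code), it amounts to computing each group's selection rate and comparing every group against 80% of the highest rate:

```python
def passes_four_fifths(pass_rates):
    """EEOC four-fifths check: every group's selection rate must be
    at least 80% of the highest group's rate."""
    highest = max(pass_rates.values())
    return all(rate >= 0.8 * highest for rate in pass_rates.values())

# Men pass 50% of the time; women 35% -> a ratio of 0.7, a violation.
print(passes_four_fifths({"men": 0.50, "women": 0.35}))  # False
# At 45%, the ratio is 0.9 and the tool clears the bar.
print(passes_four_fifths({"men": 0.50, "women": 0.45}))  # True
```

Note that the rule checks selection rates only, not whether the selected candidates actually go on to succeed in the job.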

To figure out whether Pymetrics’s software cleared this bar, the Northeastern team first had to try to understand how the tool works.

When a new client signs up with Pymetrics, it must select at least 50 employees who have been successful in the role it wants to fill. These employees play Pymetrics’s games to generate training data. Next, Pymetrics’s system compares the data from those 50 employees with game data from more than 10,000 people randomly selected from over two million. The system then builds a model that identifies and ranks the skills most specific to the client’s successful employees.

To check for bias, Pymetrics runs this model against another data set of about 12,000 people (randomly selected from over 500,000) who have not only played the games but also disclosed their demographics in a survey. The idea is to determine whether the model would pass the four-fifths test if it evaluated these 12,000 people.

If the system detects any bias, it builds and tests more models until it finds one that both predicts success and produces roughly the same passing rates for men and women and for members of all racial groups. In theory, then, even if most of a client’s successful employees are white men, Pymetrics can correct for bias by comparing the game data from those men with data from women and people from other racial groups. What it’s looking for are data points predicting traits that don’t correlate with race or gender but do distinguish successful employees.
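Pymetrics has not published its implementation, but the loop the article describes—test a candidate model against a demographic holdout set, reject it if any group's pass rate falls below four-fifths of the highest—can be sketched with toy data. The trait names, thresholds, and candidate models below are all hypothetical:

```python
def pass_rates(predict, holdout):
    """Fraction of each demographic group the model would advance."""
    return {group: sum(map(predict, people)) / len(people)
            for group, people in holdout.items()}

def is_fair(rates, ratio=0.8):
    """Four-fifths check: lowest group rate >= 80% of the highest."""
    return min(rates.values()) >= ratio * max(rates.values())

# Toy holdout set: each person is a dict of game-derived trait scores.
holdout = {
    "men":   [{"risk": 0.9, "attention": 0.7}, {"risk": 0.8, "attention": 0.6}],
    "women": [{"risk": 0.3, "attention": 0.8}, {"risk": 0.4, "attention": 0.9}],
}

# Candidate models, in order of predictive power on the training set
# (hypothetical): each advances applicants by thresholding one trait.
candidates = [
    lambda p: p["risk"] > 0.5,       # correlates with gender here -> rejected
    lambda p: p["attention"] > 0.5,  # distinguishes applicants without that correlation
]

# Keep trying models until one clears the four-fifths check.
model = next(m for m in candidates if is_fair(pass_rates(m, holdout)))
print(pass_rates(model, holdout))  # {'men': 1.0, 'women': 1.0}
```

In this invented example, the risk-based model advances every man and no women and is discarded; the attention-based model passes both groups at equal rates and is kept—the kind of outcome the de-biasing loop is looking for.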

Christo Wilson of Northeastern University

Wilson and his team of auditors wanted to figure out whether Pymetrics’s anti-bias mechanism does in fact prevent bias and whether it can be fooled. To do that, they basically tried to game the system by, for example, duplicating game data from the same white man many times and trying to use it to build a model. The outcome was always the same: “The way their code is sort of laid out and the way the data scientists use the tool, there was no obvious way to trick them essentially into producing something that was biased and get that cleared,” says Wilson.

Last fall, the auditors shared their findings with the company: Pymetrics’s system satisfies the four-fifths rule. The Northeastern team recently published the study of the algorithm online and will present a report on the work in March at the algorithmic accountability conference FAccT.

“The big takeaway is that Pymetrics is actually doing a really good job,” says Wilson.

An imperfect solution

But though Pymetrics’s software meets the four-fifths rule, the audit didn’t prove that the tool is free of any bias whatsoever, nor that it actually picks the most qualified candidates for any job.

“It effectively felt like the question being asked was more ‘Is Pymetrics doing what they say they do?’ as opposed to ‘Are they doing the correct or right thing?’” says Manish Raghavan, a PhD student in computer science at Cornell University, who has published extensively on artificial intelligence and hiring.


For example, the four-fifths rule only requires people from different genders and racial groups to pass to the next round of the hiring process at roughly the same rates. An AI hiring tool could satisfy that requirement and still be wildly inconsistent at predicting how well people from different groups actually succeed in the job once they’re hired. And if a tool predicts success more accurately for men than women, for example, that would mean it isn’t actually identifying the best qualified women, so the women who are hired “may not be as successful on the job,” says Kim.

Another issue that neither the four-fifths rule nor Pymetrics’s audit addresses is intersectionality. The rule compares men with women and one racial group with another to see if they pass at the same rates, but it doesn’t compare, say, white men with Asian men or Black women. “You could have something that satisfied the four-fifths rule [for] men versus women, Blacks versus whites, but it might disguise a bias against Black women,” Kim says.
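A small hypothetical shows how this can happen: with the invented selection rates below (equal numbers of applicants per subgroup), both marginal comparisons clear the four-fifths bar while Black women as a subgroup fall well below it.

```python
# Invented selection rates for each intersectional subgroup.
rates = {
    ("white", "men"): 0.80, ("white", "women"): 0.90,
    ("Black", "men"): 0.90, ("Black", "women"): 0.60,
}
# With equal subgroup sizes, marginal rates are simple averages.
men   = (rates[("white", "men")]   + rates[("Black", "men")])   / 2  # 0.85
women = (rates[("white", "women")] + rates[("Black", "women")]) / 2  # 0.75
white = (rates[("white", "men")]   + rates[("white", "women")]) / 2  # 0.85
black = (rates[("Black", "men")]   + rates[("Black", "women")]) / 2  # 0.75

def four_fifths(lower, higher):
    return lower / higher >= 0.8

print(four_fifths(women, men))    # True  -- gender check passes (ratio ~0.88)
print(four_fifths(black, white))  # True  -- race check passes (ratio ~0.88)
# ...yet Black women pass at only two-thirds the top subgroup's rate:
print(four_fifths(rates[("Black", "women")], max(rates.values())))  # False
```

High pass rates for white women and Black men mask the low rate for Black women in both marginal comparisons, which is exactly the blind spot Kim describes.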

Pymetrics is not the only company having its AI audited. HireVue, another large vendor of AI hiring software, had a company called O’Neil Risk Consulting and Algorithmic Auditing (ORCAA) evaluate one of its algorithms. That firm is owned by Cathy O’Neil, a data scientist and the author of Weapons of Math Destruction, one of the seminal popular books on AI bias, who has advocated for AI audits for years.

Weapons of Math Destruction

ORCAA and HireVue focused their audit on one product: HireVue’s hiring assessments, which many companies use to evaluate recent college graduates. In this case, ORCAA didn’t evaluate the technical design of the tool itself. Instead, the company interviewed stakeholders (including a job applicant, an AI ethicist, and several nonprofits) about potential problems with the tools and gave HireVue recommendations for improving them. The final report is published on HireVue’s website but can only be read after signing a nondisclosure agreement.

Alex Engler, a fellow at the Brookings Institution who has studied AI hiring tools and who is familiar with both audits, believes Pymetrics’s is the better one: “There’s a big difference in the depths of the analysis that was enabled,” he says. But once again, neither audit addressed whether the products really help companies make better hiring choices. And both were funded by the companies being audited, which creates “a little bit of a risk of the auditor being influenced by the fact that this is a client,” says Kim.

For these reasons, critics say, voluntary audits aren’t enough. Data scientists and accountability experts are now pushing for broader regulation of AI hiring tools, as well as standards for auditing them.

Filling the gaps

Some of these measures are starting to pop up in the US. Back in 2019, Senators Cory Booker and Ron Wyden and Representative Yvette Clarke introduced the Algorithmic Accountability Act to make bias audits mandatory for any large company using AI, though the bill has not passed.

Meanwhile, there’s some movement at the state level. The AI Video Interview Act in Illinois, which went into effect in January 2020, requires companies to tell candidates when they use AI in video interviews. Cities are taking action too—in Los Angeles, city council member Joe Buscaino proposed a fair hiring motion for automated systems in November.

The New York City bill in particular could serve as a model for cities and states nationwide. It would make annual audits mandatory for vendors of automated hiring tools. It would also require companies that use the tools to tell applicants which characteristics their system used to make a decision.

But the question of what those annual audits would actually look like remains open. For many experts, an audit along the lines of what Pymetrics did wouldn’t go very far in determining whether these systems discriminate, since that audit didn’t check for intersectionality or evaluate the tool’s ability to accurately measure the traits it claims to measure for people of different races and genders.

And many critics would like to see auditing done by the government instead of private companies, to avoid conflicts of interest. “There should be a preemptive regulation so that before you use any of these systems, the Equal Employment Opportunity Commission should need to review it and then license it,” says Frank Pasquale, a professor at Brooklyn Law School and an expert in algorithmic accountability. He has in mind a preapproval process for algorithmic hiring tools similar to what the Food and Drug Administration uses with drugs.

So far, the EEOC hasn’t even issued clear guidelines concerning hiring algorithms that are already in use. But things might start to change soon. In December, 10 senators sent a letter to the EEOC asking if it has the authority to start policing AI hiring systems to prevent discrimination against people of color, who have already been disproportionately affected by job losses during the pandemic.



Gillmor Gang: Win Win



Just finished a Twitter Spaces session. It is an engaging platform, somewhat clunky in feature set but easily a tie overall with Clubhouse. I don’t see this as a horse race, however, more as cooperating teams fleshing out a platform where both will be major players. Like notifications in iOS and Android, the feature set is a push and pull motion where Android delivers deep functionality and Apple alternately pulls ahead and consolidates gains. Though the details can vary, the combined energy of effectively 100 percent of the consumer base mandates best practices and opportunities for innovation.

Something similar is going on in Washington as the Democrats test out their majority of none on the pandemic stimulus bill. The headline in the Times says bipartisanship is dead, but the subheading is the real story. The battle for control of the Senate is closing in on the arcane gerrymandering of the filibuster, or what passes for it after Republican whittling of the original talk-’til-you-drop croaking of Jimmy Stewart in Mr. Smith Goes to Washington.

The telltale giveaway is Senator Lindsey Graham, who complains bitterly that the Democrats are steamrolling the COVID Rescue Bill without Republican votes “because they can.” The actual bipartisanship is between the progressives and moderates in the Democratic Party, as the Senator from West Virginia moderates one aspect of the bill to gain the prize of something the President can sign. Not only does it establish Biden’s power to govern but it also provides a roadmap for justifying the necessity of altering the filibuster equation.

Notice how Biden changed the subject from bipartisan negotiations to the power play it turned into. He used the polls to squeeze the Republican moderates where they fear most, the primary battles for control of the House in the midterms. The wave of vaccines is making it almost impossible to put up a political firewall; the anti-mask mandates seem like clueless floundering as people begin to have hope of an exit from the gridlock of partisan obstructionism. It will be hard to run on a platform of denial and death as we reach the end of May.

Governing by success undercuts the argument that government doesn’t work. Breaking the back of the filibuster requires the framing of the issue as finding a way to let government keep working in a bipartisan way. That brings us back to changing the definition of bipartisan as evidenced in the technology arena. In the Apple/Android example, two viable entities bring different strengths to ensuring the ability to survive long enough to govern. Google’s lock on the network effect in advertising and “free” services may be challenged by Apple’s focus on privacy and a hardware revenue base, but the net effect is to cancel each other’s vulnerabilities due to the market force of their positions. The bipartisan finesse is that each platform has the other as a dominant customer.

In the same vein, Twitter v. Clubhouse is really not the point. Certainly we can cherrypick the battle as startup v. incumbent: Clubhouse filled with unicorn celebrities and rockstar investors and a built-in tension with the media, Twitter protectively fast following with its natural social graph advantages and struggling with scalability and the fear they’ve sown of abandoning projects before they can thrive. The question begged: what is the nature of the bipartisan compromise that will ensure both end up winners?

The answer is how to make each player the best customer of the other. Twitter’s problem is focus, and harnessing the power of users to hack the system to their advantage and the company’s. The @mention spawned the retweet, providing the analytics that drive Twitter’s indelible social graph. Instagram may be Facebook’s best attempt so far at challenging the fundamental strategic value that the former president used to dominate, but Clubhouse promises to go one big step better with its hybrid of mainstream media and a Warholesque factory engine that creates new stars and the media they generate. This in turn migrates through the entertainment disruption led by the streaming realignment. What exactly is this NFT thing really about?

So Clubhouse has to open up its ability to multitask with Twitter and other curated social graphs. Facebook as a source for Clubhouse notifications and suggested conversations is different than Twitter’s. But patching into the sharing icon on iOS will offer substantial access to blunt Twitter’s native integration in Spaces. On the flip side, Twitter’s Revue newsletter tools present an opportunity to mine the burgeoning newsletter surge, using its drag-and-drop tools to bring not just default social network citations but the implicit social graph of curated editorial rockstars. Not only is the influencer audience rich in signal for advertisers, but these same brands will prove most attractive to Clubhouse listeners looking for value. Win win.

from the Gillmor Gang Newsletter


The Gillmor Gang — Frank Radice, Michael Markman, Keith Teare, Denis Pombriant, Brent Leary and Steve Gillmor. Recorded live Friday, March 5, 2021.

Produced and directed by Tina Chase Gillmor @tinagillmor

@fradice, @mickeleh, @denispombriant, @kteare, @brentleary, @stevegillmor, @gillmorgang

Subscribe to the new Gillmor Gang Newsletter and join the backchannel here on Telegram.

The Gillmor Gang on Facebook … and here’s our sister show G3 on Facebook.



The iMac Pro is being discontinued



Chalk this up to inevitability. The iMac Pro is soon to be no more. First noted by 9to5Mac, TechCrunch has since confirmed with Apple that the company will stop selling the all-in-one once the current stock is depleted.

One configuration of the desktop is still available through Apple’s site, listed as “While Supplies Last” and priced at $5,000. Some other versions can still be found from third-party retailers as well, if you’re so inclined.

The space gray version of the popular system was initially introduced in 2017, ahead of the company’s long-awaited revamp of the Mac Pro. Matthew called it a “love letter to developers” at the time, though that particular letter seems to have run its course.

Since then, Apple has revamped the standard iMac, aiming the 27-inch model at those same users. The company notes that the model is currently the most popular iMac among professional users. That system has made the iMac Pro largely redundant, prefiguring its sunsetting. Of course, there’s also the new Mac Pro at the high end of Apple’s offerings.

And let us not forget that the Apple silicon-powered iMacs should be on the way, as well. Thus far the company has revamped the MacBook Pro, MacBook Air and Mac Mini with its proprietary chips. New versions of the 21.5-inch and 27-inch desktop are rumored for arrival later this year, sporting a long-awaited redesign to boot.



Investors still love software more than life



Welcome back to The TechCrunch Exchange, a weekly startups-and-markets newsletter. It’s broadly based on the daily column that appears on Extra Crunch, but free, and made for your weekend reading. Want it in your inbox every Saturday morning? Sign up here.

Ready? Let’s talk money, startups and spicy IPO rumors.

Despite some recent market volatility, the valuations that software companies have generally been able to command in recent quarters have been impressive. On Friday, we took a look into why that was the case, and where valuations could be a bit more bubbly than elsewhere. Per a report written by a few Battery Ventures investors, it stands to reason that the middle of the SaaS market could be where valuation inflation is at its peak.

Something to keep in mind if your startup’s growth rate is ticking lower. But today, instead of being an enormous bummer and making you worry, I have come with some historically notable data to show you how good modern software startups and their larger brethren have it today.

In case you are not 100% infatuated with tables, let me save you some time. In the upper right we can see that SaaS companies today that are growing at less than 10% yearly are trading for an average of 6.9x their next 12 months’ revenue.

Back in 2011, SaaS companies that were growing at 40% or more were trading at 6.0x their next 12 months’ revenue. Climate change, but for software valuations.

One more note from my chat with Battery. Its investor Brandon Gleklen riffed with The Exchange on the definition of ARR and its nuances in the modern market. As more SaaS companies swap traditional software-as-a-service pricing for its consumption-based equivalent, he declined to quibble on definitions of ARR, instead arguing that all that matters in software revenues is whether they are being retained and growing over the long term. This brings us to our next topic.

Consumption v. SaaS pricing

I’ve taken a number of earnings calls in the last few weeks with public software companies. One theme that’s come up time and again has been consumption pricing versus more traditional SaaS pricing. There is some data showing that consumption-priced software companies are trading at higher multiples than traditionally priced software companies, thanks to better-than-average retention numbers.

But there is more to the story than just that. Chatting with Fastly CEO Joshua Bixby after his company’s earnings report, we picked up an interesting and important market distinction between where consumption may be more attractive and where it may not be. Per Bixby, Fastly is seeing larger customers prefer consumption-based pricing because they can afford variability and prefer to have their bills tied more closely to revenue. Smaller customers, however, Bixby said, prefer SaaS billing because it has rock-solid predictability.

I brought the argument to OpenView Partners’ Kyle Poyar, a venture denizen who has been writing on this topic for TechCrunch in recent weeks. He noted that in some cases the opposite can be true, that variably priced offerings can appeal to smaller companies because their developers can often test the product without making a large commitment.

So, perhaps we’re seeing the software market favoring SaaS pricing among smaller customers when they are certain of their need, and choosing consumption pricing when they want to experiment first. And larger companies, when their spend is tied to equivalent revenue changes, lean toward consumption pricing as well.

Evolution in SaaS pricing will be slow, and never complete. But folks really are thinking about it. Appian CEO Matt Calkins has a general pricing thesis that price should “hover” under value delivered. Asked about the consumption-versus-SaaS topic, he was a bit coy, but did note that he was not “entirely happy” with how pricing is executed today. He wants pricing that is a “better proxy for customer value,” though he declined to share much more.

If you aren’t thinking about this conversation and you run a startup, what’s up with that? More to come on this topic, including notes from an interview with the CEO of BigCommerce, who is betting on SaaS over the more consumption-driven Shopify.

Next Insurance, and its changing market

Next Insurance bought another company this week. This time it was AP Intego, which will bring integration into various payroll providers for the digital-first SMB insurance provider. Next Insurance should be familiar because TechCrunch has written about its growth a few times. The company doubled its premium run rate to $200 million in 2020, for example.

The AP Intego deal brings $185.1 million of active premium to Next Insurance, which means that the neo-insurance provider has grown sharply thus far in 2021, even without counting its organic expansion. But while the Next Insurance deal and the impending Hippo SPAC are neat notes from a hot private sector, insurtech has shed some of its public-market heat.

Stocks of public neo-insurance companies like Root, Lemonade and MetroMile have lost quite a lot of value in recent weeks. So, the exit landscape for companies like Next and Hippo — yet-private insurtech startups with lots of capital backing their rapid premium growth — is changing for the worse.

Hippo decided it will debut via a SPAC. But I doubt that Next Insurance will pursue a rapid ramp to the public markets until things smooth out. Not that it needs to go public quickly; it raised a quarter billion back in September of last year.

Various and Sundry

What else? Sisense, a $100 million ARR club member, hired a new CFO. So we expect them to go public inside the next four or five quarters.

And the following chart, which is via Deena Shakir of Lux Capital, via Nasdaq, via SPAC Alpha:


