Future Fountain

Brought to you by Orrick and NYU Future Labs

AI + Hiring: How to Help Ensure Your AI Makes Ethical and Unbiased Decisions
Hilke Schellmann, Emmy Award-winning reporter and journalism professor at New York University

With millions of job applications submitted to companies each year, vendors of AI software promise to democratize and speed up screening and hiring. But using machines is not without challenges. The potential damage from an algorithm that predicts the wrong outcome or excludes diverse candidates is significant. What can companies do to mitigate discrimination in AI hiring tools? In this episode, Orrick Employment Law Partner Lisa Lupion talks with Hilke Schellmann, an Emmy Award-winning reporter and journalism professor at New York University, about the promises and pitfalls of AI tools and how companies can mitigate risk in the workplace of the future.

Hilke Schellmann, Emmy Award-Winning Filmmaker, Freelance Reporter and Journalism Professor at NYU

Listen on Apple
Listen on Google Podcasts
Listen on Spotify

Show Notes

Lisa Lupion:

Welcome to the Future Fountain, a podcast series dedicated to conversations about the Tech Ecosystem, brought to you by Orrick and NYU Future Labs. I am Lisa Lupion, and today I am excited to be talking to Hilke Schellmann, Emmy Award-winning reporter and journalism professor at New York University. Through the use of innovative tools, Hilke focuses her reporting on unearthing systemic wrongdoing and its impact on vulnerable people. Her work has appeared in The New York Times, The Wall Street Journal, MIT Technology Review, PBS, National Geographic, and The Atlantic. Hilke is currently leading a project called, “Holding Hiring Algorithms Accountable and Creating New Tools for Humanistic Research,” which combines methods of investigative journalism, sociology, and data science to develop a new sociotechnical tool for critically investigating and auditing AI systems used in hiring.

Hilke, thank you so much for joining us today.

Hilke Schellmann:

Thank you for having me. It’s a pleasure.

Lisa:

So, artificial intelligence is not one thing. It’s a term that covers any technology where the machine or software is learning from the data it’s analyzing or tasks it performs. And it adapts its behavior based on what it learns from the data to improve its performance of certain tasks over time. So it’s learning using algorithms. So “AI” means a lot of different things in a lot of different contexts to a lot of different people. Maybe to get us started—what drew you to this field? How did you start looking into these issues?

Hilke:

I started looking into these issues absolutely by chance. I am a journalist by trade. I talk to a lot of people, and I was at a National Consumer Law Conference in Washington, D.C., and I needed a ride from the conference back to the train station to catch my train back to New York. I called a Lyft, and I was talking to the driver. This was in late 2017, and I asked him, “How are you doing? How was your day?” And, at that time, you know, you usually got a “Yeah, I’m doing fine, whatever.” And this time the driver sort of hesitated and he was like, “You know, I had a really strange day.” And I was like, “Oh really? Tell me more, interesting.” And then he said, “Well, I was interviewed by a robot for a job.” And I was like, “Really? Tell me more about this.” And then, you know, he just told me that he had applied for a baggage handler position at a local airport, and, you know, he got a phone call and it was this robot that asked him three questions. So I dutifully made a note in my notebook, and I was like, “That is so interesting,” and asked him if I could call him again. I got out, got on the train, and sort of forgot about the conversation for a couple of months until I was at an artificial intelligence and ethics conference a few months later. I think that was in February 2018. And I went to a panel, and it was hosted by someone who had been very high up in the Equal Employment Opportunity Commission, which is sort of the regulator of hiring and employment in the United States. She had just left the agency. And she said, “I can’t sleep at night because we have hiring algorithms that measure people’s productivity at work.” She was afraid that these algorithms look for measures of productivity that may be discriminatory. So, she said, “Look, some of these AI tools, they look for absenteeism. So, if you look at who is most absent from work, it’s two protected classes—mothers and people with disabilities.” And she was like, “That is discriminatory. So I am afraid that we are using these tools.” And I was like, “Wow, that’s really interesting.” And so then, between the conversation with the Lyft driver and this conversation, I started to look into it, and I found this whole world of AI tools that are checking facial expressions in job interviews, and I thought it was so fascinating that we could possibly use artificial intelligence and math to predict people’s performance. What a fascinating topic.

Lisa:

Amazing that it all started from a Lyft ride. It’s interesting how life takes paths forward in the most unexpected ways. So, why is it important to look into these issues, beyond the fact that it is just fascinating what we can accomplish now—why do we need to look into it? Why do we need to study the field?

Hilke:

To me, with artificial intelligence, it’s really important to look at how we use artificial intelligence as a society. And I have done reporting on artificial intelligence used in policing and other areas. And I sort of feel like we especially need to take a look at when artificial intelligence is used for high-stakes decisions. How long should somebody go to jail? Should we police someone? Like, those are high-stakes decisions and, you know, obviously, if I go to jail for 3 years or 10 years or go to jail at all—that is a very high-stakes decision. But so is this: are you going to get a loan? That’s a high-stakes decision. Your financial future might be dependent upon that. And so is having a job, having a career—it’s important. It’s a high-stakes decision. It’s our economic stability—can we put food on the table? Most of us have to work to make a living, so having a job or having a career is really, really important. And then also, it’s important for a lot of us because our jobs are tied to our identity, and we take a lot of gratification out of our jobs. We are all nervous before a job interview. We are nervous when we send out resumes. We can’t wait to get a call back from someone after a job interview, because it is a high-stakes decision! It changes our lives. And we are at work, I don’t know, 8 or 10 hours a day, so we spend a lot of time there. And everyone hopes that they’ll find a job that suits their personality, their skills, and so obviously employers also want to find the right people for the right jobs. And the idea is: let’s see if AI can help us here.

Lisa:

Yeah, that’s great. It is a high-stakes decision for both parties. For the individual looking for their next great role and for the companies that want to invest in people. So, what are some ways that you’ve seen AI used in hiring practices?

Hilke:

AI is used in hiring in different ways. And sort of the underlying technologies—we all call them “AI” but they’re often slightly different—I first want to say that I absolutely understand that companies want to use AI for hiring, because since, sort of, the dawn of job platforms like Indeed, LinkedIn, you know, everyone can apply to many, many jobs, so we’ve seen employers get sometimes millions of job applications. Like, some of the big Fortune 500 companies get millions of job applications every year. There’s obviously no way a human recruiter or even an army of human recruiters can look through all of these resumes, applications, cover letters. So there needs to be some sort of technology solution here. I understand that. And there’s a bunch of options now that AI vendors have put on the marketplace. We see artificial intelligence in resume screeners, or resume parsers, a lot. That’s applied when you send in your application to a company. They use what they call an applicant tracking system, internally. It’s a sophisticated spreadsheet so you can track all the applicants—they’re in one place together. Some of these have sort of advanced artificial intelligence capabilities and they can assess, “Okay, among these resumes—who’s a good fit for the job?” And it’s most often done by analyzing who’s been successful in the current job. So to understand who will be successful in the job, you look at your employees. Who are the people who are successful on the job? You select, I don’t know, 50 or 100 people who are successful, and then the machine looks at their resumes and finds out, “Okay, what do these people have in common?” And then it benchmarks the incoming resumes against…sort of the predictors…the patterns the AI found in these successful employees’ resumes. That’s probably the most common usage. A very similar usage of AI is inside these large job platforms.
I’m going to say a couple names…because I think “job platforms” is a weird name…but, you know, where we go to apply for jobs. We use Indeed; we use LinkedIn; we use Monster. Those are gigantic platforms that help us find a job. So, they also use AI in two ways. They use AI to surface job ads toward us. So, if I’m a user and I go on one of these job platforms, I put up my profile. Their AI starts giving me job opportunities. So that’s one way. What opportunities am I being served? And then on the other side, there’s recruiters. They put in what kind of applicants they’re looking for and they get served a list of applicants. AI is used for that as well. So, AI is pretty predominant in these resume screeners. It’s not without problems, necessarily. What we have seen is that if a computer looks at the successful employees and looks at their resumes, it might find things that have nothing to do with the job, but it predicts success upon that. We think like, “Oh yeah, it’s a benign resume parser,” but there actually could be problems if you don’t supervise these systems.

Lisa:

Absolutely.

Hilke:

Yes, and there’s another case with a resume screener, where an employment lawyer looked at it and found out that the two things the resume screener predicted upon were the name Jared and playing lacrosse in high school. So probably what happened here is just a statistical correlation and not a causation, right? There were probably a couple of Jareds who were successful on the job. The resume parser picked this up, and, you know, it’s not a person. It doesn’t understand that first names have nothing to do with the job. It just saw, “Oh, five Jareds. Oh, must be a predictor of success.” And it turns out a lot of people who were successful in their job apparently played lacrosse in high school, so the resume parser picked that up as well. And when I talked to the employment lawyer—obviously first names have nothing to do with the qualifications. In this case, playing lacrosse in high school also had nothing to do with the job. So this was, unfortunately, a case where, you know, the resume screener didn’t work as advertised.
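The Jared-and-lacrosse story can be sketched in a few lines. This is a hypothetical toy, not any vendor’s actual system; the names and data are invented. The point is only that counting what “successful” hires have in common surfaces correlations, not qualifications:

```python
# Toy illustration (hypothetical data): a screener trained on past "successful"
# hires can latch onto features with no causal link to the job, such as a
# first name or a high-school sport, purely through correlation.
from collections import Counter

past_hires = [
    {"name": "Jared", "sport": "lacrosse", "successful": True},
    {"name": "Jared", "sport": "lacrosse", "successful": True},
    {"name": "Maria", "sport": "soccer",   "successful": False},
    {"name": "Jared", "sport": "lacrosse", "successful": True},
    {"name": "Devon", "sport": "none",     "successful": False},
]

def naive_success_signals(hires):
    """Count feature values among successful hires -- the kind of shallow
    pattern-matching an unsupervised screener can end up doing."""
    signals = Counter()
    for h in hires:
        if h["successful"]:
            signals[("name", h["name"])] += 1
            signals[("sport", h["sport"])] += 1
    return signals

# The top "signals" come out as the name Jared and playing lacrosse --
# correlation, not causation, and nothing a careful human would treat
# as a qualification.
print(naive_success_signals(past_hires).most_common(2))
```

A human reviewer discards a first name instantly; a counter like this cannot, which is exactly why these systems need supervision.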

Lisa:

It’s such an interesting situation, because companies are looking to these resources to help with the massive volume…you get all these resumes and you want to be able to manage them effectively, from diverse inputs, right? They’re coming from all over, and you’re able to get more resumes—and that’s a great thing, and it improves diversity. But if we’re then feeding the algorithm information that’s going to replicate certain institutional parts of our business, then that might not be the same as increasing diversity, even though that’s part of the goal.

Hilke:

I think that’s really well said. And I think that’s one of the problems we see all the time. A lot of these AI tools—be they artificial-intelligence-based games or resume screeners—often build on training data drawn from successful employees on the job. I mean, you have to use some kind of training data for the algorithm, so, obviously, using successful employees might be a good idea, but the question is: well, you might have historical discrimination in your successful employee base, right? Maybe you have predominantly hired white males for the job, so then probably the algorithm will pick up the patterns of those white males and will exclude people of color. Maybe women also, right? So that’s one of the big problems that we see in these systems. If you don’t have diverse data, it’s really hard to hire diverse applicants.

Lisa:

That’s exactly right, and it might not even be because of historical discrimination. Even if it’s just that the labor market the employer was drawing upon historically is narrower than the labor market they’re able to tap into today—because of all of these abilities to get resumes from everywhere, right? We’re more remote; we’re more global—you’re able to draw on candidates outside of where your office is located now. But you might still be replicating a non-diverse employee population if you’re defining “success” based exclusively on the people you already have.

Hilke:

Yeah, and to be fair, we see that all the vendors try to eliminate discrimination in these tools. You know, they’re all acting in good faith. But what we’ve seen, for example, when we look at these big job platforms, is that they black out your names; they black out pronouns; they black out gendered job descriptions—like, for example, waiter or waitress—they block those kinds of things out so that the AI cannot pick up gender or other demographic signals. But the problem that we see, actually, now that I’ve looked deeper into this, is that it’s actually the behavior on these platforms that gives us away. So even if we black out all of this demographic data, it turns out men are…quote/unquote…more aggressive on these platforms. So, often, if they get contacted by a recruiter, overall, they answer more often than women. Obviously, there are some women that are more aggressive and some men that are less aggressive, but overall men are more aggressive. So the AI is not optimized to find the most qualified applicant; it’s optimized to find the most qualified applicants that are also going to apply for the job. So, you don’t want to find people like me, who are happy in their jobs, who never answer recruiters and never apply for a job because they’re just happy, and who just happen to keep a profile on LinkedIn. Those are not the people the AI will recommend. So, the problem is, if men, on average, are more aggressive on these platforms, the AI will pick that up and will recommend more men to the recruiters, because they’re more aggressive in answering and replying. So the problem is, even though we don’t have any demographic data, our behavior gives this away. So I think there is a lot of thought and a lot of supervision that we have to put into these tools to think through: how can we make sure we don’t replicate bias that we already see in the world out there?
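The mechanism here is worth making concrete. The sketch below is a hypothetical toy, not any platform’s real ranking system: if a matcher optimizes on expected reply rate rather than qualification alone, the groups that reply more often float to the top even though no demographic field is used anywhere.

```python
# Toy sketch (hypothetical numbers): a recruiter-facing ranker optimized on
# "likelihood of a successful contact" = qualification x historical reply rate.
# No gender or demographic field appears, yet behavior alone reshuffles the list.

candidates = [
    # (id, qualification_score 0-1, historical_reply_rate 0-1)
    ("c1", 0.95, 0.20),   # most qualified, but rarely answers recruiters
    ("c2", 0.80, 0.90),
    ("c3", 0.90, 0.40),
    ("c4", 0.70, 0.95),
]

def rank_for_recruiter(cands):
    """Rank by the optimization target (qualification x reply rate),
    not by qualification itself."""
    return sorted(cands, key=lambda c: c[1] * c[2], reverse=True)

for cid, q, r in rank_for_recruiter(candidates):
    print(cid, round(q * r, 2))
# The most qualified candidate (c1) ranks last: 0.95 * 0.20 = 0.19.
```

The design choice being illustrated: the moment the objective includes behavior, behavioral differences between groups become a ranking signal, which is exactly the leak Hilke describes.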

Lisa:

You’ve obviously done a tremendous amount of important work studying these different types of technology—so, do you have any advice for companies that are looking at AI technology and trying to understand…how does a company get comfortable with “this makes sense for us”?

Hilke:

So, the advice I have for companies is: really scrutinize the product. I think a lot of us—and I was sort of the same at the beginning of this journey—feel like, “Wow, artificial intelligence, higher mathematics—I will never understand it, I’m just a poor little journalist.” Well, it turns out just asking skeptical questions is very helpful. There’s a saying in journalism: “If something is too good to be true—it’s probably too good to be true.” So, if somebody tries to sell you magic—that they can solve a problem we have never been able to solve with humans alone—why is the technology better at this? Well, let’s understand how it works. How does it select people? I think that’s one way to really, really look through it. There are a couple of things that I think employers can look at really closely. Get the technical report. In the United States, all vendors have a technical report. If they don’t have one, that is a very big red flag, because in the technical report they will tell potential folks that they want to work with…potential businesses they want to work with…how has this been tested? What is the predictive validity—how is this supposed to work? And it will tell you: what was it sampled on? Was it sampled on 10 people? Well, you might want to ask a second question after that. Was it sampled on a thousand people? Okay, who were these people? Was this a diverse crowd that was sampled? Those are all kinds of things that I think are helpful for employers to scrutinize these systems. Also, working with an employment lawyer is really helpful to scrutinize this. And also work with maybe a management psychologist, because they’re trained in the pre-AI usage of assessments and often in AI as well. So, they can really dig deep and sort of understand: Are these methods really good? Was the sample size good? Should we really use this?
Because everyone thinks that if there is a lawsuit, the employer who uses these tools will be liable, not the vendor—because the employer has bought it from the vendor and uses it…and the employer makes the decisions at the end of the day. So, I wish there was a database for vendors, where they have to prove that the tool works. At the moment that is not a requirement in the United States. Technically, if I want to hire people, I could ask all of them if they are a Yankees or Mets fan, and if I don’t like the Yankees, I could hire only the Mets fans. The job may have nothing to do with baseball—or sports at all—but I can do that. I just have to make sure that I don’t discriminate against people of different races and different genders. So, if I end up with a diverse pool of people, no problem. I can do such a thing. I think that’s one of the things that everyone is kind of surprised about—that we don’t really have the requirement that these tools work. I think we would like to see some evidence that these tools work. The marketing language says that these tools reduce discrimination and that they are not discriminatory at all—we would like vendors to back that claim up and put that in the public domain. And I also wish that people in companies would talk about when these tools did not work. I have talked to so many employment lawyers who have all signed non-disclosure agreements—they can’t name the vendors, they can’t name the companies, but they talk a lot about, like, “Oh, I’ve seen this resume screener that didn’t work…we have seen this AI game that doesn’t work.” And I wish there was sort of a database or something where this knowledge would be shared amongst companies, so we don’t always have to replicate the same mistakes.
Or keep using the same tools that are proven not to work, again and again—because that would actually cause real harm to people who did not get chosen for a job that they may be qualified for…that they might be a perfect employee for. And that’s something I always think about: some people really have a dream job. They apply for the job, and if they don’t get chosen because the algorithm thinks only Jareds and people who played lacrosse in high school should be chosen, that is really unfair and discriminatory. And I would understand that people who are looking for a job are pissed and upset if they aren’t chosen for reasons like that.
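The “make sure I don’t discriminate against protected classes” check Hilke mentions has a standard first-pass form: the four-fifths rule from the EEOC’s Uniform Guidelines, under which a selection rate for any group below 80% of the highest group’s rate is treated as evidence of possible adverse impact. A minimal sketch, with entirely hypothetical numbers and group labels:

```python
# First-pass adverse-impact screen per the "four-fifths rule": flag any group
# whose selection rate falls below 80% of the highest group's rate.
# The group names and counts below are hypothetical.

def selection_rate(selected: int, applicants: int) -> float:
    """Fraction of applicants in a group who were selected."""
    return selected / applicants

def four_fifths_check(groups: dict) -> dict:
    """groups maps group name -> (selected, applicants).
    Returns, per group, True if its rate is at least 80% of the best rate."""
    rates = {g: selection_rate(s, a) for g, (s, a) in groups.items()}
    best = max(rates.values())
    return {g: rate / best >= 0.8 for g, rate in rates.items()}

results = four_fifths_check({
    "group_a": (48, 100),   # 48% selected -> reference rate
    "group_b": (30, 100),   # 30% selected -> 30/48 = 62.5% of best, flagged
})
print(results)
```

Failing this check is not legal proof of discrimination, and passing it is not proof of fairness; it is the rough threshold at which the tool deserves the much closer scrutiny Hilke is calling for.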

Lisa:

Sure. What about the idea of—for the company to test it itself. So one thing that we sometimes think about is, “Okay, you want to try this, this sounds good, we looked at the technical report. It seems like it’s a good solution for this company. But before we use it as the exclusive resume screening tool or the test that we are going to use to get people in the door, we are going to use it for a three-month trial period and pilot the program.” What do you think of a piloting phase for companies?

Hilke:

I love it, especially if it’s in beta. So, my hope is that there would be academics or people who work in HR who really take this seriously and do long-term studies. And hopefully put people in pile A and pile B and, like, sort of look at: “Okay, let’s put them through the pilot of this resume screener and put them—the same people—through our old-fashioned way of screening people. What are the results?” And really look at the differences—does it actually make sense? So my hope is that some companies would use an AI system, but also actually hire some of the people that their traditional hiring methods would have chosen but the AI vendor said, “We don’t recommend this person.” Let’s have some long-term studies—obviously more than one person—and understand: does the recommendation from the AI vendor really hold up? The more you can do in the background of this testing without harming people—meaning rejecting them for a job that they might be qualified for—the better it is. And I feel like there isn’t enough stress testing of these systems, as I call it. We talk a lot, you know, in journalism, in the academic community, about sort of that ten-thousand-foot view from above…sort of, oh, there could be these problems, or there could be those problems…but the reality is: let’s find evidence of how these systems work—and maybe don’t work. So, I often sit down, you know, and I look at these tools and then I think through, okay, how could I test these? And I use my own data to analyze it. So, one thing, for example, that’s pretty hot in the hiring world is the idea: Can we hire people for their personalities? Right? Because business models change all the time, people need to be agile, right? I don’t want to hire you because you know, I don’t know, the Python programming language now, because maybe in three months we will use something else.
I need you to be agile so you can train yourself and use the next big-thing programming language. So, I want to make sure I have agile employees. Resumes and applications and cover letters are only so helpful there; it’s hard to find agility in them. So let me use some artificial-intelligence-based games and see how agile you are. So…
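The pile-A/pile-B pilot described above can be summarized as simple agreement and disagreement buckets: run the same applicants through both the AI screener and the traditional process, then look hardest at the candidates only one of them advanced. A minimal sketch, with hypothetical candidate IDs:

```python
# Sketch of a pilot comparison (hypothetical candidate IDs): the same pool is
# screened by the AI tool and by the traditional process, and we partition
# the candidates by where the two agree or disagree.

def compare_screens(ai_picks: set, human_picks: set) -> dict:
    """Partition candidates into agreement/disagreement buckets.
    The 'human_only' bucket is the one to audit hardest: qualified people
    the AI would have silently rejected."""
    return {
        "both": ai_picks & human_picks,
        "ai_only": ai_picks - human_picks,
        "human_only": human_picks - ai_picks,
    }

ai_picks = {"c01", "c04", "c07"}
human_picks = {"c01", "c03", "c07"}
buckets = compare_screens(ai_picks, human_picks)

# Candidates the traditional process would have advanced but the AI
# rejected -- the pilot's follow-up list.
print(buckets["human_only"])
```

Run long enough, with hires drawn from both disagreement buckets, this is the background testing Hilke describes: it produces evidence about the AI’s recommendations without anyone being rejected solely on the tool’s say-so.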

Lisa:

…I think that the virtual games are really a fascinating component.

Hilke:

Mmm-hmm. (affirmative)

Lisa:

And I don’t know that any AI is particularly traditional yet, but the idea of using a game to test somebody or to screen someone for a role is new and different. So, what have you seen in that space?

Hilke:

So, there is a really good idea behind artificial intelligence games, right? You want to check people on their potential. So, I’m going to throw out the resumes. I’m not going to look at people’s backgrounds, right? Where did they go to school? Because, in all likelihood, where we went to university, for example, has a lot to do with our socioeconomic background and not so much with our actual abilities. So, let’s toss all that away and let’s give everyone the same chance to succeed in these games. That’s sort of the idea. Also, a lot of applicants think it’s more fun to play a video game than to answer 150 questions on a personality questionnaire. So the idea is: let me pull out your personality from these games. I mean, they are a little bit more like retro ’80s and ’90s video games—they’re not like the new cool games that we have. But, you know, they’re more engaging. So, you play these games—for example, for one company you blow up balloons, and you have to bank money before the balloon explodes. So you sort of have to assess, like, how often can I pump up this balloon before it explodes, and how much money can I bank? That game supposedly captures our preference for risk. Now, the way people criticize this is, you know, some people say, “Well, you might be a total daredevil in a video game, but are you really a risk taker at work?” Those can be two very different things. So, it’s not even clear whether it measures my risk-taking at work, or whether it just measures how much risk I am willing to take in a video game. That could be problematic. And another thing that I think a lot of job applicants get really upset about is that these games don’t actually feel like components of the job. So, if I play Tetris in a video game—does this really have to do with the job that I’m gonna do?
And I think that’s where there is a bit of a trust deficit—a lot of job applicants feel like this may have nothing to do with the job. There is also no underlying big theory that says for accountants you have to have this kind of personality and you have to play the game this way. It’s based on how successful employees play the game. So, the question there, too, is: okay, if successful accountants play this game a certain way, maybe they’re all just risk takers at this company. So, now I’m just gonna hire people who are also risk takers.

The question is: Does risk taking actually have anything to do with the job? Or is it just something that I have now assessed but that has nothing to do with the job? And that is why a lot of people feel that in hiring decisions we should only take in job-relevant information. Because otherwise you could look at, like, “Oh, my successful employees—well, turns out they all have brown hair.” Well, it turns out the majority of people have brown hair. Obviously, any human would know never to hire people for their brown hair, but a computer might not. It might look at basketball players and find out, “Oh, to be a successful basketball player you have to have a large shoe size.” So let me only hire people with large shoe sizes. Well, it turns out people with large shoe sizes are not all great basketball players. Shoe size has nothing to do with actually being a good basketball player. But it turns out that most basketball players have very large shoe sizes, so they are different from the general population. So, I think those are some of the problems that we really have to think through.

Lisa:

Yeah, you mentioned earlier the difference between correlation and causation, and it seems like you have the same potential problem here. You might not be measuring what would cause someone to have the right skills—from a legal perspective we talk about “consistent with business necessity.” You might not be measuring things that are actually consistent with business necessity; there’s just a correlation between people who have certain attributes and success on the job. And there’s an interesting question about whether we should be looking at that correlation—it strikes me that excluding people from a potential job because they don’t share attributes that merely correlated with other people’s success is not a good way to think about it.

Hilke:

I think that’s exactly how a lot of employment lawyers—and also, you know, sociologists, people who look at the field—feel as well. They call this dustbowl empiricism: I just take big data and look at what these successful employees have in common, but I am not necessarily looking at the job. All of these data points that I find out about people—are they actually causally related to the job? And I think that’s where a lot of folks who have disabilities, for example, feel very strongly about this. Like, in one of the AI games, one of the tasks is to hit the space bar as fast as you can. Well, someone who has a motor disability obviously might encounter problems doing this and might, you know, get a personality assessment that’s different from what their personality is actually like. Or they might not be able to do the task at all. And then the question is: what does hitting the space bar really fast have to do with any job? Does anyone have a job where that is actually required?

Lisa:

There are not that many jobs where we’re being hired to push the space bar, so that’s probably fair.

Hilke:

So, these are all proxies, right? We use proxies to assess someone’s capability. We do that in job interviews, right? You don’t actually do the job—we ask you how you did the job before, and how you want to do the new job. So, I’m using a proxy. And the predictive value of proxies is only so high. So, I think what I’m most excited about…what I think would be great if we could work more on…are virtual reality simulators—because the best prediction of job performance is putting somebody in the job. So, if you can use virtual reality to actually put a CEO in a stressful situation—in a press conference where they have to navigate, sort of, I don’t know, like, a stock market crash, and now everyone is asking them, like, how are you going to get the company back up? I think that would be a great way to test a potential CEO—in a real-world situation that they might encounter—and understand, “Okay, would they actually be good at this?” Versus having them play maybe a game of Tetris or something that may or may not have anything to do with the job. So, I think that’s really at the core here, and I think that folks who work in HR could really think and prod a little bit—like, is this actually causal, or is it just a correlation? Maybe it turns out that people in that company really are all risk takers—but does that have to do with the job, or is it just company culture? And we always talk about, like, fit, and we have used AI for cultural fit, and it has really backfired, because it has been shown to lead to hiring non-diverse people. Usually, like, the people you already have. And that’s really not what we should be doing.

Lisa:

So, a few times you’ve mentioned the role of the HR team, or supervision of the AI technology, or, you know, doing something that humans alone can’t do. And in each of those answers you implied that we should not be hiring individuals exclusively using the AI hiring tool—whichever it is. So where do you see the role of AI versus the human resources team—the actual people who work at the company? And how do you see those two fitting together?

Hilke:

If AI and humans could work together, I think that would be really great. It shouldn’t be either/or—there needs to be supervision of AI. And, at the same time, I want to make sure that people don’t take away that I think we should go back to human hiring. I don’t think human hiring is great, either! Human hiring doesn’t have a great track record. For example, there are obviously a lot of biases that humans have. There’s this thing: “the son I never had.” That’s a hiring thing. You meet a young person and they just—you know, they’re so full of life, and maybe you have something in common and you see yourself in them. And then you hire them for their potential—and maybe they’re not the most qualified people. They just tug at your heartstrings or whatever. You know, we also know this from job interviews: somebody comes in, they went to the same school as we did, they come from the same hometown, we have something in common. And studies have shown that humans then prefer those people over applicants who are maybe better qualified but just look different than we do, speak differently, maybe have a different background. So human biases have not led to more fairness in hiring. And what’s true is that AI takes out these particular human biases. AI doesn’t have a preference for “the son it never had.” But AI also brings in other biases that we have to be careful of, right? It’s in the training data—the successful employees that I choose to build the training data: how diverse are they? How have they been chosen? Are they really successful, or did a manager just like them? Are people with disabilities represented there? There are a lot of things to think about. Also, what I’m concerned about is that the scale of AI hiring tools means the potential damage is just way more vast than the damage one human hiring manager can inflict, right?
Like, if I’m one big company and I screen millions of people through one algorithm, and it turns out that it predicts on the wrong thing or excludes large numbers of women or people of color, the damage I could cause is just so much more than one human hiring manager could. So, I know the future of hiring will need artificial intelligence. We need technological solutions, but we have to clearly supervise them and make sure they don’t bring new discrimination into the field.

Lisa:

Sure—so eliminating one bias but trading it in for another is not really what we’re aiming for by using AI.

Hilke:

I would argue that we’re not helpless, right? We know we have human bias, and then we introduce other bias—in this case, computational bias or training-data bias. That’s not helpful, either. So if we bring in a new system, let’s make sure it works and really doesn’t discriminate. And that it really predicts on the things that are relevant to the job, and not on some random things a computer found—because we now live in the age of big data, and I have a lot of data on everyone, and I can look at, “Okay, what do these people have in common?”—maybe things I couldn’t see before. But is that data actually relevant? I think that question is often missing. We just look at the data and see, “Oh, these are the three things that people have in common. We should hire those people.” But we never ask, “Wait a second, is this actually relevant to the job, or did a computer just find this by looking at a large amount of data?” So I think if HR folks and employment lawyers could scrutinize these systems, they would find that some of them work better than others.

And if we could somehow publicize that data and have a public database, we would know which tools work and which don’t. That would obviously be helpful, right? We could learn from it, and we wouldn’t have to use tools that don’t work again and again and, possibly, keep harming people.
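The kind of scrutiny Hilke describes can start with simple arithmetic. A common first check in the U.S. is the “four-fifths rule” from the EEOC’s Uniform Guidelines: if one group’s selection rate falls below 80% of the highest group’s rate, that is a red flag for adverse impact. Below is a minimal sketch of that check; the selection numbers and the resume-screener scenario are entirely hypothetical, not data from this conversation.

```python
# Sketch of the four-fifths (80%) adverse-impact check often used to audit
# hiring tools. All numbers below are hypothetical illustrations.

def selection_rate(selected, applicants):
    """Fraction of applicants a tool advanced to the next round."""
    return selected / applicants

def adverse_impact_ratio(group_rate, reference_rate):
    """Ratio of a group's selection rate to the highest group's rate.
    Values below 0.8 are a common red flag (the 'four-fifths rule')."""
    return group_rate / reference_rate

# Hypothetical screening outcomes from an AI resume screener
rate_men = selection_rate(selected=300, applicants=1000)    # 0.30
rate_women = selection_rate(selected=150, applicants=1000)  # 0.15

ratio = adverse_impact_ratio(rate_women, rate_men)  # 0.5
print(f"Adverse-impact ratio: {ratio:.2f}")
print("Potential adverse impact" if ratio < 0.8 else "Passes four-fifths check")
```

Passing this check does not prove a tool is fair or job-relevant, but failing it at scale is exactly the kind of damage one human hiring manager could never inflict alone.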

Lisa:

Yeah… It seems that some tools might work for some populations and might not have the same effectiveness with others. There might be a tool that’s very successful at predicting success for a particular type of function, and much less successful at identifying candidates who would be successful in a different type of role or a different type of company. So I do think there’s not a one-size-fits-all approach. Just as the technology is different, the companies are different, the roles are different, and success might be defined differently, as well.

Hilke:

Yeah. What’s interesting to me is that the industry is actually going the other way. It’s less, “Let’s figure out the specifics for this particular job and build an algorithm for this particular job at this particular company, because it might be different from this other company.” It’s actually more of an off-the-shelf approach now, which saves a lot of money. But the question is: are all call center jobs really the same? Are all jobs at universities really the same? Those are real questions we have to look at. And companies do want to save money. They want a more efficient way of hiring…I totally understand that. They just also have to make sure the tools work and don’t bring in new biases—even if they eliminate the human ones.

Lisa:

And for the off-the-shelf type of tool, we talked a lot about the sampling or the testing, and we talked about some of the programs where you’re actually predicting success based on people who were already successful. So, are individual companies able to feed in their own version of success, or is it some general version of success that the AI tool company has identified?

Hilke:

Both are possible. Some companies—if you’re a very large company—feel, “I’m happy to spend some money here, and I want to do it based on data that’s particular to my company.” But I think vendors now feel, “You know what? We have ten million people in our database who have done this kind of job. We’re very confident saying you can use this off the shelf.” So it’s a matter of how much you want to scrutinize the technology you buy…and scrutinizing the methods underneath these technologies is really helpful.

One thing that’s become very popular during the pandemic is one-way video interviews, meaning I get prerecorded questions on my computer and answer them with no human on the other side—basically a video or audio presentation. What’s good about these interviews is that they’re structured. Everyone gets the same questions, so there’s no, “Oh, we went to the same school. Great, I liked you better.” That can’t happen, and that’s great—we know structured interviews lead to less discrimination. The problem with some of these methods is that if you use AI to score the answers, then unless the technology actually understands what a good answer to the question is, its natural language processing can’t objectively score people.

Some of these tools also analyze our voice and our facial expressions, and there the science gets dicier and dicier. What science is underneath it? Do we actually know what facial expressions we need for a given job? Maybe in a customer-facing role you might need to smile, but what facial expressions do you actually need in most jobs? We don’t know. And is that actually relevant to the job? Same question with intonation of voice: can we really infer emotion there? Do we know if someone’s really sad? We may get close with a guess—but is that enough to make an employment decision?

Lisa:

And does it matter?

Hilke:

Yeah.

Lisa:

…right? Do we really care? There could be people who are very effective at their jobs but aren’t big smilers, or people who smile all the time and are not that good at their jobs. I would imagine if you asked most managers, “Do you need an employee who smiles?”—for most roles, the answer would be no. I don’t think many managers have thought of that as an important criterion in hiring decisions.

Hilke:

But they might fall into the trap of feeling, “Well, a computer has now analyzed hundreds of my successful people, and it turns out they most often smile when they answer this question in a job interview—so maybe this is relevant. What do I know? How can I argue with big data?” But here’s where you really need to come in and scrutinize these methods and ask, “Wait a second. Does this actually make sense? And do we actually know what the AI scores on?” So the problem is not only that some of the big-data scoring may or may not be job relevant. The other question is: what is the AI actually scoring on? And a lot of vendors don’t know, because these are black-box algorithms—if they use some deep learning method, we don’t actually know what the computer scores on. So as an employer, you really want to think very hard about this in a hiring decision. What happens if you have to go to court and a judge asks you, “Why was this person chosen or not chosen?” and you have to say, “I don’t know”? I’m not sure that’s an answer you want to…

Lisa:

I’m pretty sure that…

Hilke:

….have to say in a court of law.

Lisa:

…is not a good answer in a court of law. But I do think you made a very important point that we should be challenging that. We shouldn’t just blindly follow any of these tools, and we should really be thinking critically about what is the tool measuring and how does that fit with our business needs—as opposed to just saying, “If the data says it, then we must assume it.”

Hilke:

Yeah, and what does it say about our society? Is this something we want to use unquestioningly? I think that comes up when I talk to people with disabilities—there’s a real question there, right? Are people with disabilities actually represented in the training data, first of all? They often face a much higher rate of unemployment, so how many folks with disabilities are actually in the training data? That’s the first question.

And disabilities are expressed in many different ways, right? I might be autistic; somebody else might be as well. But the way our disabilities are expressed is so individual that it actually cuts against the idea of a statistical AI analysis, right? People with disabilities are sort of at odds with these AI systems, and that’s where a lot of the tension is. I think there might be a lot of lawsuits coming down the pike from that end, because for people with disabilities it often feels unfair: Why am I being judged by a system when I’m not represented in the training data? I don’t even know what the system has asked me to do. And some of these AI games—why do I have to do this? Is this job relevant? So I think there are a lot of questions there, and we probably will see some litigation in that field.

Lisa:

A lot of it will go back to whether there really is a business necessity or an essential business function that's being measured by the AI in the tool that we're utilizing. So, with all the problems and potential problems with AI-assisted recruiting, are you optimistic about the future of using these tools in the recruiting process?

Hilke:

I am optimistic that the technology can only get better. Right now, I feel like we’re in this era where we’re building the plane while we’re flying it, because the use of AI in recruitment—or in employment in general—is just so new. We don’t have standards yet. We don’t have clear laws that set a standard, so all these different vendors try out a lot of things, and some of it works really well, and some of it’s not as good as one might hope. So there’s a lot of what people would call the Wild West: a lot of tools out there being used that we don’t know much about. I think they can only get better, but it has to happen with some regulation. I hope there will be more public debate around this, and some sort of database that companies have to put their algorithms in, where they have to prove those algorithms work. That would be really helpful—the more transparency in the field, the better. And I don’t think we have the choice to go back. Human hiring is not the way to go, so let’s build a system that uses AI and is supervised by humans, and make sure it works and doesn’t discriminate.

Lisa:

Great!

Hilke:

Cool, awesome! Thank you for having me.