Words matter: AI can predict salaries based on the text of online job postings

The job landscape in the United States is dramatically shifting: The COVID-19 pandemic has redefined essential work and moved workers out of the office. New technologies are transforming the nature of many occupations. Globalization continues to push jobs to new locations. And climate change concerns are adding jobs in the alternative energy sector while cutting them from the fossil fuel industry. 

Amid this workplace turmoil, workers, as well as employers and policymakers, could benefit from understanding which job characteristics lead to higher wages and mobility, says Sarah Bana, a postdoctoral fellow at Stanford’s Digital Economy Lab, part of the Stanford Institute for Human-Centered Artificial Intelligence. And, she notes, there now exists a large dataset that might help provide that understanding: the text of millions of online job postings. 

“Online data provides us with a tremendous opportunity to measure what matters,” she says.

Indeed, using artificial intelligence (AI) and machine learning, Bana recently showed that the words used in a dataset of more than one million online job postings explain 87% of the variation in salaries across a vast proportion of the labor market. It’s the first work to use such a large dataset of postings and to look at the relationship between postings and salaries. 

Bana also experimented with injecting new text – adding a skill certificate, for example – into relevant job listings to see how these words changed the salary prediction.

“It turns out that we can use the text of job listings to evaluate the salary-relevant characteristics of jobs in close to real time,” Bana says. “This information could make applying for jobs more transparent and improve our approach to workforce education and training.”

An AI dataset of 1 million job postings 

To analyze how the text of online job postings relates to salaries, Bana obtained more than one million pre-pandemic job postings from Greenwich.HR, which aggregates millions of job postings from online job board platforms. 

She then used BERT, one of the most advanced natural language processing (NLP) models available, to train a model on the text of more than 800,000 of the job postings and their associated salary data. When she tested the model on the remaining 200,000 job listings, its predictions explained 87% of the variation in salaries. By comparison, using only the postings’ job titles and geographic locations explained just 69%.
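The hold-out evaluation described here can be sketched in a few lines. The example below is only an illustration under stated assumptions: it replaces BERT with a toy word-average model, and the function names (split_postings, fit_word_means, predict, r_squared) are hypothetical rather than taken from Bana's work. What it shows is the general shape of the procedure: hold out about 20% of the postings, fit on the rest, and report the share of salary variation the predictions explain (R²).

```python
import random
from collections import defaultdict


def split_postings(rows, test_fraction=0.2, seed=0):
    """Shuffle (text, salary) rows and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)


def fit_word_means(train_rows):
    """Toy stand-in for a language model (NOT BERT): for each word,
    store the mean salary of the training postings that contain it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, salary in train_rows:
        for word in set(text.lower().split()):
            totals[word] += salary
            counts[word] += 1
    word_means = {word: totals[word] / counts[word] for word in totals}
    overall_mean = sum(salary for _, salary in train_rows) / len(train_rows)
    return word_means, overall_mean


def predict(model, text):
    """Average the stored means of the known words in a posting;
    fall back to the overall training mean if no word is known."""
    word_means, overall_mean = model
    known = [word_means[w] for w in set(text.lower().split()) if w in word_means]
    return sum(known) / len(known) if known else overall_mean


def r_squared(actual, predicted):
    """Share of the variation in actual salaries explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

In use, one would call split_postings on the full list of (text, salary) pairs, fit on the training part, and pass the held-out salaries and their predictions to r_squared. An R² of 0.87 would correspond to the 87% figure quoted for the text-based model.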

In follow-up work, Bana will attempt to characterize the contribution of various words to the salary prediction. “Ideally, we will color words within postings from red to green, where the darker red words are linked with lower salary and the darker green are linked with higher salary,” she says. 
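One simple way to picture such a word-level score, offered purely as an assumption and not as the method Bana plans to use, is a leave-one-word-out comparison: remove each word in turn and see how much the predicted salary moves. In the sketch below, predict_salary is a hypothetical stand-in for any trained model that maps posting text to a salary, and the scores are scaled to the range -1 (darkest red) to +1 (darkest green).

```python
def word_contributions(predict_salary, text):
    """Leave-one-word-out: a word's contribution is the prediction with
    the word present minus the prediction with that word removed."""
    words = text.split()
    full_prediction = predict_salary(text)
    contributions = []
    for i, word in enumerate(words):
        text_without_word = " ".join(words[:i] + words[i + 1:])
        contributions.append((word, full_prediction - predict_salary(text_without_word)))
    return contributions


def red_green_scores(contributions):
    """Scale contributions to [-1, 1]: -1 is the darkest red (lower salary),
    +1 is the darkest green (higher salary), 0 is neutral."""
    largest = max((abs(value) for _, value in contributions), default=0.0)
    if largest == 0:
        return [(word, 0.0) for word, _ in contributions]
    return [(word, value / largest) for word, value in contributions]


def toy_predict(text):
    """Hypothetical predictor for illustration only; the numbers are made up."""
    words = text.lower().split()
    salary = 50_000.0
    if "senior" in words:
        salary += 10_000.0
    if "intern" in words:
        salary -= 5_000.0
    return salary
```

With the toy predictor, the posting "senior data intern" would score "senior" at +1.0, "data" at 0.0 and "intern" at -0.5. A real model would need a more careful attribution method, because words interact with their context.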

The value of upskilling: A text-injection experiment

To identify which skills matter for salary prediction, Bana used a text-injection approach: To certain relevant job postings, she added short phrases indicating the job requires a particular career certification, such as those listed in Indeed.com’s 10 In-Demand Career Certifications (And How To Achieve Them). Obtaining these certifications can be costly, with prices ranging from about $225 to about $2,000. But, until now, there has been no way to determine whether the investment is worthwhile from a salary point of view. 
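The text-injection idea can be sketched as follows. This is a minimal illustration, not Bana's implementation: predict_salary stands for any trained model that maps posting text to a salary, and the helper names, the injected phrase, and the toy predictor's numbers are all hypothetical. The sketch appends a certification phrase to each posting and averages the change in predicted salary.

```python
from statistics import mean


def inject(text, phrase):
    """Append a short phrase (for example, a certification requirement) to a posting."""
    return f"{text.rstrip()} {phrase}"


def injection_effect(predict_salary, postings, phrase):
    """Mean change in predicted salary when the phrase is added to each posting."""
    changes = [
        predict_salary(inject(text, phrase)) - predict_salary(text)
        for text in postings
    ]
    return mean(changes)


def toy_predict(text):
    """Hypothetical predictor for illustration only; the numbers are made up."""
    salary = 50_000.0
    if "certification" in text.lower():
        salary += 5_000.0
    return salary
```

Calling injection_effect(toy_predict, postings, "Requires Agile Analysis certification.") on a list of analyst postings returns the average predicted salary change attributable to the added phrase. Comparing that figure with the cost of a certification is the kind of question the experiment is meant to inform.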

Bana’s experiment revealed that some certifications (such as the IIBA Agile Analysis Certification) produce meaningful salary gains quickly while others (such as the Cisco Certified Internetwork Expert) do so more slowly. That is valuable for workers who want to know how an investment in skills training will affect their salaries and prospects, Bana says.

Employees aren’t the only ones to benefit from this information, Bana notes. Employers can use these results to better invest in human capital, she says. If, for example, machine learning models reveal a gradual shift away from some tasks and toward others, employers would have advance warning and could retrain certain employees.

And policymakers considering what job training programs to promote would similarly benefit from understanding which skills are waxing or waning in economic value.

To that end, Bana and her colleagues are currently working on a companion paper that identifies what tasks are disappearing from job listings over time and what new tasks are appearing. 

In the future, Bana hopes that textual analysis of job postings could yield a web-based application where workers or companies could research the value added by upskilling or by moving to a new geographic location. 

“Currently there’s not a lot of clarity around a path to higher earnings,” Bana says. “Tools like these could help job seekers improve their job prospects, employers develop their workforces, and policymakers respond to immediate changes in the economy.”

Katharine Miller is a contributing writer for the Stanford Institute for Human-Centered AI.


This story originally appeared on Hai.stanford.edu. Copyright 2022

