Citizen Data Scientists: 5 Ways To Harness Talent
A new role is emerging to deal with the ongoing shortage of data scientists. Learn more about these new power users and find out how organizations can cultivate more of them.
The worldwide shortage of data scientists won't end anytime soon. To try to compensate for the shortage, data discovery solutions are automating tasks that have traditionally been done manually by a data scientist, statistician, or other analytics expert. The confluence of trends is giving rise to a new role that Gartner calls a "citizen data scientist."
A recent Gartner report defines a citizen data scientist as "a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics." It could be a line-of-business role, a business analyst, or a member of the business intelligence or IT team. The defining trait is that statistics and analytics are secondary in the role.
Not everyone in an organization will become a citizen data scientist -- at least by Gartner's definition. By that standard citizen data scientists are power users. The new role does not threaten those of data scientists, data analysts, or business analysts; it complements them. And in fact, citizen data scientists necessarily have to work with other roles to derive the most value from analytics.
Like anyone else in an organization, citizen data scientists need the right technology to do their jobs. In this case, that means one of the data-discovery offerings that automate parts of complex processes such as data preparation and pattern identification.
As advanced analytics capabilities become available to more people, companies will have to ensure they have the governance in place to make it work, which includes software enforcement of governance policies. According to Gartner, by 2018 the multiple styles of data discovery available today -- smart, governed, Hadoop-based, search-based, visual-based, and graph-based -- will converge as their unique capabilities become requirements. And the convergence is already under way.
Click through the following pages to learn five ways companies can prepare for the coming wave of citizen data scientists.
Embrace Automation
Organizations are becoming more agile. To accomplish this, they need to automate certain tasks and processes that have historically been done manually. With automation, organizations are able to achieve dramatically higher levels of scale and speed. And, in the case of data-discovery platforms and solutions, the automation can lead to insights that might not have been uncovered otherwise.
"We're seeing the beginning of automation in each of the components. As we discussed in the
report, there are a number of specialist vendors focusing on the data preparation, automating the pattern detection, and [enabling the use of] natural language. Some others are beginning to offer pieces of all of them," Rita Sallam, a research VP at Gartner, said in an interview. "I think we have the beginning of a next-generation set of capabilities."
Automation does not completely remove humans from the equation, however. It speeds and simplifies what has historically been time-consuming and difficult.
"I don't think you'll ever fully automate the job of an analyst, but what you'll be able to do is automate enough on the data preparation side of things [to improve] the time, cost, and accuracy of preparing your data, which is a big problem for data discovery," Sallam said. "As we can automate finding patterns more, we'll reduce the time it takes to build a production-grade model, so, to the extent we can automate some of the exploration, data scientists can find things that are significant."
Explain Data Visualizations
Data visualizations are becoming more sophisticated to accommodate increasingly complex data. Some people misinterpret or fail to understand certain types of data visualizations, either because they lack the background knowledge or because they grasp other chart types more readily than the ones presented. Analytics vendors provide a variety of data visualization options to accommodate the differences in datasets, the type of analysis being done, individual perceptions, and the need to verify results.
"Data visualizations are an improvement over tables and lists, but they can obscure what's really significant, or what's really causation or not," Sallam said. "Someone looking at a data visualization created by a business analyst or data scientist may not really have the skills to fully interpret a complex chart or even a basic chart. Even the meaning of bar charts and pie charts isn't necessarily clear to everyone."
If the goal of data discovery is to expand insights to the maximum number of people in an enterprise to drive value, then it is important to make sure that a broader audience can more accurately interpret the data visualizations that are presented to them. Adding simple narratives can help improve insight and comprehension.
Arijit Sengupta, CEO of BeyondCore, recommends explaining data visualizations with detailed stories in addition to providing in-graph text clarifications.
"There's a broader story than explaining the graphs. It's explaining the relationships of the graphs -- the overall story,” said Sengupta. "You can't just give [a business user] the answer; you have to explain the answer in terms a business user can understand."
Educate Them
Universities are expanding their data science and analytics programs to better align with what's happening in the real world. They're offering new executive education programs, MBA classes, and undergraduate classes aimed at people whose career focus is not primarily data science, machine learning, or statistics. The primary goal is to educate business leaders and line-of-business managers so they can use data and work with data science teams more effectively.
"I'm not sure [citizen data scientists] need to take masters-level classes in statistics and machine learning. In fact, on the contrary, they may just need to be trained in sort of basic statistics," says Gartner research VP Rita Sallam. "It's more of training on how you would work with a data scientist to make sure that we're not misinterpreting findings in the data. That is really part of a broader process to automate the exploration phase and push forward those hypotheses that truly need further investigation by a specialist."
A two-week vendor training course may be a wise investment, but it is not enough on its own. Citizen data scientists' skill sets need to be enhanced on an ongoing basis to get the most value out of the tools and approaches their companies have adopted, which continue to evolve. As Sallam notes, they must also learn to work cooperatively with the data science team.
"As companies shift to more of the self-service model for analytics, whether it be visual-based discovery or citizen data science, the need for training and agile training methods increases. The idea for BI in the past, where we had a competence center centralized in IT, and IT was building content, [it was probably OK to give] those developers vendor training and then 'off you go.' Now … we're trying to enable more regular users to incorporate analytics as part of their job, meaning their primary job is not analytics, but they use analytics as a tool."
According to McKinsey, there is a shortage of 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data. Training is not going to solve the problem, according to BeyondCore CEO Arijit Sengupta.
"You need the technology to be so simple that everyone can use it," says Sengupta. "Analytics is so important to society, it can't be something that's the domain of experts."
Start Small
Because the role is new, and not everyone is (or cares to become) a power user, the citizen data scientist role will gain traction only gradually. Gartner recommends a slow approach.
"I think companies have to start small. It makes sense for them, and they see how this would fit in the continuum of analytics that they provide, from basic discovery to data science," Rita Sallam explained. "We absolutely recommend using that as an opportunity to build trust in what is essentially a black box, because that's often what companies and people in general have an issue with. I think you're going to have to pilot, you're going to have to try. You're going to have missteps, and it will take time to build trust in this kind of capability."
The good news is that the results of a black box calculation usually can be exported by a more advanced user to another tool for verification.
Have Governance In Place
Effective data governance is essential. As more users access data and use analytics, organizations need to ensure they have appropriate governance in place that is enforced by software.
"As more business users gain the ability to run calculations and discover findings themselves, you run the risk of people using the same data and coming up with different results," according to Sallam. "You want to allow more sophisticated users to go in and make sure [the results are trustworthy], but then there are governance processes on the data side [that define] who can access what data, who can promote models, and who can share models without having them be checked by a more sophisticated data scientist."
Sallam says a lot of companies have an internal certification program that provides training on processes and tools as well as the rules for usage.
"We usually suggest creating an internal program that business users are required to take in exchange for access to the tool. Companies are struggling with governance as more people in the company use advanced analytics. As we're shifting, as we're going through this major shift from analytics being IT-centric to business-centric, governance is the biggest challenge."
There is often resistance to governance among line-of-business staff members who believe fast access to data will be impeded by governance. They have to understand the need for governance and why it is important. Enforcing governance through software is an effective way -- but not the only way -- to improve compliance.
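As a rough illustration of what software-enforced governance could look like, the sketch below checks a user's role before allowing data access or model promotion. The role names, permission strings, and function names are all hypothetical, invented for this example; they are not taken from Gartner, BeyondCore, or any specific product.

```python
from typing import Optional

# Hypothetical roles and permissions, invented for illustration only.
POLICY = {
    "citizen_data_scientist": {"read:sales", "build_model"},
    "data_scientist": {
        "read:sales",
        "read:customer_pii",
        "build_model",
        "promote_model",
    },
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role's permission set includes the action."""
    return action in POLICY.get(role, set())


def promote_model(role: str, model_name: str,
                  reviewed_by: Optional[str] = None) -> str:
    """Promote a model if the caller may do so, or if a reviewer
    whose role carries the promote permission has checked it."""
    if is_allowed(role, "promote_model"):
        return f"{model_name} promoted"
    if reviewed_by is not None and is_allowed(reviewed_by, "promote_model"):
        return f"{model_name} promoted after review"
    raise PermissionError(f"{role} may not promote {model_name} without review")
```

In this sketch a citizen data scientist can build a model but can share it only after a data scientist has reviewed it, so the rule is applied by the code path itself rather than by people remembering to follow a policy document.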
"Every analysis, every step of the analysis, every interaction with it should be logged and stored," said BeyondCore CEO Arijit Sengupta. "[That way, if another] party asks if you included a variable, you should be able to see what was included and rerun the analysis. Governance has to be enforced by the software, not by the people running around trying to do things."