The term Data Science may have been invented in academia, but the proliferation of its applications has been driven mostly by the tech industry. The term became popular because recruiters needed to describe more precisely what they were looking for as data-driven initiatives, a new sort of project, became more and more common. Advanced education in Computer Science or Statistics did not guarantee the expertise required to complete such projects successfully. Programming skills and experience in analyzing complex datasets were also necessary. However, since one can earn a Ph.D. in Statistics without ever dealing with a real dataset, "statistician" was not a specific enough job title. Likewise, since one can earn a Ph.D. in Computer Science without ever writing a line of code, "computer scientist" was not specific enough either. Statisticians and computer scientists can be beneficial for your organization, but not always.
Furthermore, some graduates from other fields, such as the social sciences and physics, had enough experience in managing and evaluating data to be hired. Therefore, the credentials provided by academic institutions did not offer a useful signal to employers. The academic knowledge provided by Computer Science and Statistics was essential, but it was not enough. The term "data scientist" hence became useful for drawing distinctions: for instance, between someone with experience in analyzing data in its messy form and someone who can prove that an estimator is asymptotically normal, or between someone who knows how to write fast, efficient, and reliable code to extract data from or insert data into a database and someone who can prove that a problem is NP-complete.
However, since the challenges faced by data-driven organizations vary significantly across different enterprises, and even within an enterprise, the term remains quite ambiguous. As a result, the best definition we can arrive at is that “data science is an umbrella term used by organizations to describe the processes used to extract value from data.”
The data science areas of expertise
Data science can be divided into two categories: back-end and front-end data science. The back end is the part that deals with efficient computing, hardware, and data storage infrastructure. The front end is the part geared more towards data analysis, and it can be further divided into applied machine learners and data analysts. Data analysts explore, wrangle, and assess the quality of data, and fit models to it. Applied machine learners build and evaluate prediction algorithms. Domain knowledge is, of course, essential for both of these tasks. Often, to finish a project, the front-end data scientists develop a prototype that the back-end data scientists convert into a robust pipeline. As a result, front-end data scientists tend to use R or Python, while back-end data scientists program in lower-level languages such as C++ and in database languages like SQL.
The implication for academic programs
Aiming to train an individual to be a professional who can tackle all the challenges involved in the data science process is too ambitious. Yet as the term Data Science became more and more popular, the demand for Data Science education grew accordingly. Universities rushed to figure out how to meet this demand. Developing revenue-generating postgraduate programs was the priority and, as a result, today several universities offer these degrees. However, what precisely are these students being prepared to do? What do these new educational programs offer that the existing ones did not? Given that, with some exceptions, no new faculty were hired when developing these new programs and, in numerous cases, no new classes were developed, it is not clear that a postgraduate degree in Data Science offers the signal employers are looking for.
Clearly, existing academic programs provide excellent ways of acquiring some of the expertise discussed above. These include courses on probability, discrete math, statistical inference and modeling, software engineering principles, computer programming, and machine learning. However, this was already true before Data Science programs emerged. So what can the academic world do to better prepare students for the data science workforce and deliver a better signal to the industry? Here are some recommendations:
- Understand that Data Science is an umbrella term and provide specific tracks directed toward the different facets of data science.
- Adapt machine learning and statistics courses to put applications at the forefront rather than theory.
- Provide learning experiences that expose learners to long-term projects like those they will be assigned in industry. For this, many universities will have to invest in new faculty with practical experience.
A data science certification from a quality institution can give students excellent insight into the subject matter along with practical experience in the domain. Top institutions have state-of-the-art resources and experienced faculty to facilitate students' learning.