
Why is data science such a significant field?

In recent years, we've seen huge growth in the amount of data available to us. But just having data isn't useful; we want to know what the data can tell us. It can help us understand what happened, why it happened, what will happen and what can be done with the results.

Data science involves extracting knowledge and insights from data and eventually turning those insights into actions that businesses or organizations can use.

Data science is the intersection of computer science, mathematics and business expertise.

Business analysts are responsible for domain expertise; they have a deep understanding of the business, and they also visualize insights. Data engineers deal with data mining, cleaning and exploration. Data scientists do data exploration and use machine learning and visualization tools. There is a lot of overlap between these roles, which leads to collaboration.

Businesses use data science for four kinds of analytics: descriptive analytics (what is happening within the business, which depends on accurate data collection); diagnostic analytics (why something happened, such as why sales went up or down, which involves drilling down to the root cause); predictive analytics (what is likely to happen next, such as next quarter's sales performance, which involves using historical patterns in data to predict future outcomes); and prescriptive analytics (what the recommended action is for a desired outcome, such as figuring out what needs to be done to improve sales by 10%).
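To make three of these categories concrete, here is a minimal Python sketch. The sales figures are made up for illustration, and the straight-line trend is only one simple assumption about how a forecast might be made; real predictive work would use far more data and more careful models.

```python
from statistics import mean

# Made-up quarterly sales figures, for illustration only.
quarterly_sales = [100.0, 110.0, 120.0, 130.0]
quarters = list(range(1, len(quarterly_sales) + 1))

# Descriptive analytics: summarize what has happened so far.
average_sales = mean(quarterly_sales)

# Predictive analytics: fit a straight line by least squares and
# extend it one quarter ahead (an assumed, very simple model).
x_bar = mean(quarters)
y_bar = mean(quarterly_sales)
slope = sum(
    (x - x_bar) * (y - y_bar) for x, y in zip(quarters, quarterly_sales)
) / sum((x - x_bar) ** 2 for x in quarters)
intercept = y_bar - slope * x_bar
next_quarter_forecast = intercept + slope * (len(quarterly_sales) + 1)

# Prescriptive analytics starts from a goal: how far is the forecast
# from a 10% improvement over the latest quarter?
target = quarterly_sales[-1] * 1.10
shortfall = target - next_quarter_forecast

print(average_sales, next_quarter_forecast, shortfall)
```

With these made-up numbers, the trend line forecasts sales of 140 for the next quarter, slightly short of the 10% target. Diagnostic analytics is left out because finding a root cause depends on details of the business that a toy list of numbers cannot capture.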

The data science life cycle begins with understanding the business. We want to make sure we are asking the right question. Business expertise and domain expertise play a critical role here.

Next, we do data mining. We go across the landscape and procure the data we need.

We then have to do some data cleaning. When we find data, it might not initially be in the best format. It might contain duplicate rows or other unnecessary elements, so we must clean it before it is ready for analysis.

Once the data is clean, the exploration aspect comes into play: We can use advanced tools like machine learning, where we leverage massive computing power and large amounts of high-quality data to make predictions.

Finally, we can visualize the insights and outcomes of the analysis.
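The cleaning step described above can be sketched in a few lines of Python. This is only an illustration: the function name clean_rows and the sample records are hypothetical, and real projects typically lean on dedicated data libraries instead of hand-written loops.

```python
def clean_rows(rows):
    """Return rows with duplicates and incomplete records removed, in order."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip stray whitespace so near-identical rows compare as equal.
        normalized = tuple(
            value.strip() if isinstance(value, str) else value for value in row
        )
        # Drop rows that have a missing value (None or an empty string).
        if any(value is None or value == "" for value in normalized):
            continue
        # Drop rows we have already kept.
        if normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


# Hypothetical raw records: (date, store, sales).
raw_sales = [
    ("2024-01-05", "Store A", 120.0),
    ("2024-01-05", "Store A ", 120.0),  # duplicate with a trailing space
    ("2024-01-06", "", 75.5),           # missing store name
    ("2024-01-07", "Store B", 98.25),
]

cleaned_sales = clean_rows(raw_sales)
print(cleaned_sales)
```

Of the four made-up records, only the two distinct, complete ones survive.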

One class of machine learning used by many companies is recommender systems. These systems use data to help predict what people are looking for among a growing number of options. For example, Netflix uses one to recommend shows and movies to its users. The more people watch what they're recommended, the better the recommendations become. A recommendation can be based on many factors, such as the person's watch history, demographics, etc. Additionally, the better the recommendations, the more likely the person is to continue using the business's products, which in turn brings the business more money.
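One simple way to picture a recommender is to suggest titles watched by viewers with similar histories. The Python sketch below does exactly that with made-up viewers and shows; it is an assumption-laden toy, not a description of how Netflix or any other company actually builds its system.

```python
from collections import Counter


def recommend(target, histories, top_n=2):
    """Suggest unseen titles, weighted by how similar each other viewer is."""
    seen = histories[target]
    scores = Counter()
    for viewer, watched in histories.items():
        if viewer == target:
            continue
        # Similarity = shared titles divided by all titles either has watched.
        union = seen | watched
        similarity = len(seen & watched) / len(union) if union else 0.0
        # Titles the target has not seen earn that viewer's similarity score.
        for title in watched - seen:
            scores[title] += similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked[:top_n] if score > 0]


# Hypothetical watch histories.
histories = {
    "ana": {"Show A", "Show B", "Show C"},
    "ben": {"Show A", "Show B", "Show D"},
    "cy": {"Show E"},
}

suggestions = recommend("ana", histories)
print(suggestions)
```

Here "ana" is offered "Show D" because "ben" shares two of her shows, while "cy" has nothing in common with her and so contributes no suggestions.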

Random forest is a machine-learning algorithm made up of decision trees. Decision trees begin by asking basic questions, such as "Should I accept this job offer?" From there, there can be questions about the salary, location, role, etc. Each question is a decision node in the tree and is a way to split the data. The tree helps the individual arrive at a final decision. Generally, the more decision trees used with different criteria, the better our random forest will perform, which increases our prediction accuracy. Decision trees themselves can be prone to issues with bias and overfitting (learning too much from the training data and failing to generalize to new data), but having multiple of them within a random forest reduces this problem, especially when the trees themselves are uncorrelated.
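The voting idea behind a random forest can be illustrated with the job-offer example. In the Python sketch below, the three "trees" and their thresholds are hand-written and hypothetical; a real random forest learns its trees from random samples of training data, so this shows only how several trees combine into one decision.

```python
from collections import Counter


# Each "tree" is a short chain of questions (decision nodes).
# The rules and thresholds are invented for illustration.
def salary_tree(offer):
    if offer["salary"] < 50_000:
        return "decline"
    return "accept" if offer["remote"] else "decline"


def location_tree(offer):
    if offer["commute_minutes"] > 60:
        return "decline"
    return "accept"


def role_tree(offer):
    if not offer["matches_skills"]:
        return "decline"
    return "accept" if offer["salary"] >= 45_000 else "decline"


def forest_vote(trees, offer):
    """Return the answer chosen by the most trees."""
    votes = Counter(tree(offer) for tree in trees)
    return votes.most_common(1)[0][0]


# A hypothetical job offer.
offer = {
    "salary": 55_000,
    "remote": False,
    "commute_minutes": 30,
    "matches_skills": True,
}

decision = forest_vote([salary_tree, location_tree, role_tree], offer)
print(decision)
```

The salary tree alone would decline this offer, but the other two trees outvote it, which is the sense in which a forest smooths over the quirks of any single tree.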

Data science provides insights into otherwise overwhelming data that can then be used to understand the complexities of the world.

VAAGEESHA DAS is a second-year college student and columnist for The Dominion Post. Information comes from: AWS. (n.d.). What is Data Science? – data science explained – AWS. AWS. https://tinyurl.com/dsover; Berkeley School of Information. (n.d.). What is Data Science? Berkeley School of Information. https://tinyurl.com/dsberk; IBM Technology. (2022, June 13). What is Data Science? YouTube. https://tinyurl.com/dscilifecycle; IBM. (n.d.). What is Random Forest? IBM. https://tinyurl.com/dsrandforest; What is a recommendation system? NVIDIA Data Science Glossary. (n.d.). https://tinyurl.com/nvidiarecsys