How to Teach Yourself Data Science in 2020

Part 1: SQL, Python, R and data visualization

I recently graduated with a degree in chemical engineering and landed my first job as a data analyst at a technology company. I documented my journey from chemical engineering to data science here. Since then, many of the students at my school whom I have talked to about making the same move have expressed the same interest and doubts...

"How did you get from engineering to data science?"

This is exactly the question I asked myself: how do I make the jump? That same thought drove me forward and prompted me to start building the skills of a data scientist just over a year ago.

It certainly wasn't a lack of information that held me back. On the contrary, the deluge of resources for learning data science makes it difficult to separate the best from the average.

But first, let's understand: what is data science?

Ah, that's a difficult question to answer that baffles hiring managers and interviewees alike. In fact, different companies define data science differently, making the term ambiguous and somewhat elusive. Some say it's about programming, some say it's about math, while others say it's about understanding data. It turns out that they are all somewhat correct. For me, the definition I agree with the most is this:

Data science is the interdisciplinary field that uses techniques and theories from mathematics, computer science and domain knowledge. [1]

This is what data science looks like in an image to me. I have blurred the boundaries between each knowledge segment to demonstrate my impression that knowledge from each of these areas is combined to form what is known as "data science".

In this series of blog posts, I want to highlight some of the courses I've taken on my journey, along with their pros and cons. With this, I hope to help people who are in the position I was in to plan their own self-study journey in data science. The posts in this series are:

  • Part 1: Data Processing with SQL, Python and R (this post!)
  • Part 2: Math, Probability and Statistics
  • Part 3: Basics of computer science
  • Part 4: Machine Learning (read here!)

In this post, I will highlight how I learned data processing, a necessary skill for a data scientist. To process data, you usually have to learn how to:

  1. Extract data from a database using SQL (Structured Query Language),
  2. Clean, manipulate, and analyze data (usually with Python and/or R), and
  3. Visualize data effectively.

SQL is the language for communicating with a database that holds data. If data is treasure buried underground, SQL is the shovel to unearth the treasure's raw form. More specifically, it allows extracting information from one or a combination of several database tables.

There are many different "flavors" of SQL such as SQL Server, PostgreSQL, Oracle, MySQL and SQLite. They each differ slightly, but the syntax is still very similar and you don't have to worry about what kind of SQL you're learning.

To learn a language, first learn the words before combining them into sentences and then paragraphs. The same applies to SQL.

To learn the basics (the SQL words or phrases), I used DataCamp (Introduction to SQL) and Dataquest (SQL basics). (I'll talk more about DataCamp and Dataquest later.) These sites build basic SQL skills through guided exercises and examples. Some of the concepts covered are:

  • SELECT and WHERE to select and filter
  • COUNT, SUM, MAX, GROUP BY, HAVING to aggregate data
  • DISTINCT, COUNT DISTINCT to produce de-duplicated lists and distinct aggregates
  • OUTER (e.g. LEFT) and INNER JOIN, and when to use each
  • String and time conversions
  • UNION and UNION ALL.

(You may not know what these mean yet, but that's okay! This is just a list of things you can expect to learn.)
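
To make a few of these keywords concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are hypothetical, invented purely for illustration, and SQLite is just one convenient flavor to try queries in.

```python
import sqlite3

# Hypothetical toy tables, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 5.0);
""")

# SELECT + WHERE: pick columns and filter rows.
big_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 10 ORDER BY id"
).fetchall()

# GROUP BY + HAVING: aggregate per customer, then filter the groups.
repeat_customers = conn.execute("""
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 1
""").fetchall()

# LEFT JOIN keeps customers with no orders; an INNER JOIN would drop them.
orders_per_customer = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# UNION removes duplicate rows; UNION ALL keeps them.
union_rows = conn.execute(
    "SELECT customer_id FROM orders UNION SELECT customer_id FROM orders"
).fetchall()
union_all_rows = conn.execute(
    "SELECT customer_id FROM orders UNION ALL SELECT customer_id FROM orders"
).fetchall()

print(big_orders)
print(repeat_customers)
print(orders_per_customer)
print(len(union_rows), len(union_all_rows))
```

Note how the customer without any orders still shows up in the LEFT JOIN result with a count of zero, because COUNT(o.id) skips the NULLs produced by the join.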

However, doing these exercises alone did not adequately prepare me to work as an analyst. I could understand words and sentences, but I was far from able to write a whole paragraph. In particular, some notable intermediate and advanced concepts, such as subqueries and window functions, are missing or not covered in detail, even though they came up in numerous technical interviews and are essential to my current role as an analyst. These skills include:

  • Handling NULL with COALESCE
  • Subqueries and their impact on query efficiency
  • Temporary tables
  • Self joins
  • Window functions such as PARTITION BY, LEAD, LAG
  • User-defined functions
  • Using indexes in queries to speed up operations.
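
As a rough illustration of three of these ideas (COALESCE, a subquery, and a window function), here is a small sketch using Python's built-in sqlite3 module. The `sales` table and its values are hypothetical, and the window function part assumes the SQLite library bundled with your Python is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table, invented for illustration only.
    CREATE TABLE sales (rep TEXT, month INTEGER, amount REAL);
    INSERT INTO sales VALUES
        ('ana', 1, 100), ('ana', 2, 150), ('ana', 3, NULL),
        ('ben', 1, 80),  ('ben', 2, 60);
""")

# COALESCE returns its first non-NULL argument, here replacing NULL with 0.
filled = conn.execute("""
    SELECT rep, month, COALESCE(amount, 0)
    FROM sales
    WHERE rep = 'ana'
    ORDER BY month
""").fetchall()

# Subquery: keep rows whose amount is above the overall average.
# AVG ignores NULLs, so the average is taken over the four known amounts.
above_avg = conn.execute("""
    SELECT rep, month
    FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales)
    ORDER BY rep, month
""").fetchall()

# Window function: LAG looks at the previous row within each rep's partition.
with_previous = conn.execute("""
    SELECT rep, month, amount,
           LAG(amount) OVER (PARTITION BY rep ORDER BY month) AS prev_amount
    FROM sales
    ORDER BY rep, month
""").fetchall()

print(filled)
print(above_avg)
print(with_previous)
```

Unlike GROUP BY, the window function keeps every row and simply adds a column computed from neighboring rows, which is why it is handy for month-over-month comparisons.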

To learn these skills, I relied primarily on SQLZoo.net, which is free and offers very challenging exercises for each concept. My favorite feature of SQLZoo is that it includes exercises that test several concepts within a single question. For example, you are given the following entity-relationship diagram and asked to build complex queries based on it.

This is similar to what we encounter at work as analysts: we combine the different techniques we have learned to extract information from the same database. The following is the entity-relationship diagram from SQLZoo's "Help Desk" question set. Based on it, you are asked to show the manager and number of calls received for each hour of the day on 2017-08-12. (Try it yourself here!)
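
To give a feel for the shape of such a query, here is a hedged sketch in Python's sqlite3. The single `calls` table below is a hypothetical, heavily simplified stand-in and not SQLZoo's actual schema, where answering the question requires joining several tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified table; NOT SQLZoo's actual schema.
    CREATE TABLE calls (call_time TEXT, manager TEXT);
    INSERT INTO calls VALUES
        ('2017-08-12 08:05:00', 'M1'),
        ('2017-08-12 08:40:00', 'M1'),
        ('2017-08-12 14:10:00', 'M2'),
        ('2017-08-13 09:00:00', 'M1');
""")

# Filter to one day, extract the hour from the timestamp, then count per hour.
calls_per_hour = conn.execute("""
    SELECT manager,
           strftime('%H', call_time) AS hour,
           COUNT(*) AS n_calls
    FROM calls
    WHERE date(call_time) = '2017-08-12'
    GROUP BY manager, strftime('%H', call_time)
    ORDER BY hour
""").fetchall()

print(calls_per_hour)
```

The real exercise is harder precisely because the manager is not stored next to each call, so the same filter-extract-count pattern has to be combined with joins.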

Other resources I've used include SQL Questions by Zachary Thomas and LeetCode.

To start learning the programming and tools you need for data science, you can't run away from R and/or Python. They are very popular programming languages used for data manipulation, visualization, and wrangling. The question of R vs. Python is an age-old debate that deserves its own post. My take?

It doesn't matter whether you choose R or Python: once you master one, you can easily pick up the other.

My journey with programming in Python and R started with coding websites like Codecademy, DataCamp, Dataquest, SoloLearn, and Udemy. These websites offer customized courses organized by language or package. Each breaks the concepts down into digestible chunks and gives the user starter code with blanks to fill in. These sites usually walk you through a simple demonstration, and you then get to practice the concept through exercises right away. Some also offer project-based exercises.

Today I'm going to focus on two of my favorites, Datacamp and Dataquest.

Please note that you will find affiliate links for the courses below. This costs you nothing extra, as the price is the same, but I do receive a small commission if you decide to make a purchase.

DataCamp

DataCamp offers video courses taught by experts in the field, along with fill-in-the-blank exercises. The video lectures are mostly concise and efficient.

One thing I love about DataCamp is its up-to-date courses, organized into career tracks in SQL, R, and Python. This makes planning your curriculum easier: all you have to do is follow the track that interests you. Some of the tracks include:

  • Data Science in Python/R
  • Data Analyst in Python/R/SQL
  • Statistician in R
  • Machine Learning Scientist in Python/R
  • Python/R Programmer

Personally, I started my R training with Data Science in R, which provided a very detailed introduction to the tidyverse, a collection of incredibly useful R packages for organizing, manipulating, and visualizing data, notably ggplot2 (for data visualization), dplyr (for data manipulation), and stringr (for string manipulation).

However, I have one complaint about DataCamp: I retained little of the material after finishing its courses. With the fill-in-the-blank format, it's easy to guess what goes in the blank without really understanding the concept. As a student on the platform, I tried to take as many courses as possible in the shortest possible time. I skimmed the code and filled in the blanks without understanding the big picture. If I could start my DataCamp learning all over again, I would take my time to digest and understand the code as a whole, not just the parts I was supposed to complete.

Dataquest

Dataquest is very similar to DataCamp. It focuses on using coding exercises to explain programming concepts. Like DataCamp, it offers a wide range of courses in R, Python, and SQL, albeit a slightly less extensive one. Unlike DataCamp, however, Dataquest does not offer video lectures.

Some of the tracks Dataquest offers include:

  • Data Analyst in R/Python
  • Data Science in Python
  • Data Engineering

Dataquest's content is generally more difficult than DataCamp's. It also relies less on the fill-in-the-blank format. Although the courses took me longer, I retained the material better on Dataquest.

Another great feature of Dataquest is the monthly call with a mentor, who reviews your resume and provides technical guidance. Although I didn't personally approach a mentor, in hindsight I should have, as it would have helped me progress much faster.

Data visualization is key to showcasing the insights you've gleaned from your data. After learning the technical skills of plotting with Python and R, I learned the principles of data visualization from a book, Storytelling with Data by Cole Nussbaumer Knaflic.

This book is platform independent. In other words, it doesn't focus on any specific software, but instead teaches the general principles of data visualization with insightful examples. Some of the key points you can learn from this book are:

  • Understand the context
  • Choose an effective visual
  • Remove the clutter
  • Focus attention where you want it
  • Think like a designer
  • Tell a story

I thought I knew something about data visualization until I read this book.

After digesting the book, I was able to create a (somewhat) visually appealing graphic about police brutality against Black people. One of the most important lessons from the book that I applied here was to focus attention where you want it. To do this, I highlighted the line for African Americans in bright yellow, reminiscent of the BLM color, and made sure the rest of the graphic faded into the background with softer tones such as white and gray.
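
The highlighting idea can be sketched in a few lines of Python. This is not the code behind my graphic: the function names, the "gold" and "lightgray" colors, and the line widths are illustrative assumptions, and matplotlib is only imported if you actually call the plotting helper.

```python
def highlight_styles(series_names, focus, focus_color="gold", muted_color="lightgray"):
    """Return one dict of matplotlib line kwargs per series.

    The focus series gets a bright color and a thick line drawn on top;
    every other series is muted so it fades into the background.
    """
    styles = {}
    for name in series_names:
        if name == focus:
            styles[name] = {"color": focus_color, "linewidth": 3.0, "zorder": 3}
        else:
            styles[name] = {"color": muted_color, "linewidth": 1.0, "zorder": 1}
    return styles


def plot_with_focus(x, series, focus):
    """Plot each series in `series` (a dict of name -> y values) against x.

    matplotlib is imported lazily so that highlight_styles has no dependencies.
    """
    import matplotlib.pyplot as plt

    styles = highlight_styles(series.keys(), focus)
    fig, ax = plt.subplots()
    for name, y in series.items():
        ax.plot(x, y, label=name, **styles[name])
    # Remove clutter: drop the top and right borders and the legend frame.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.legend(frameon=False)
    return fig


styles = highlight_styles(["group A", "group B", "group C"], focus="group B")
print(styles["group B"])
```

Keeping the style decision in its own small function makes it easy to change which series is emphasized without touching the plotting code.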

In this post, I've covered the steps I took to learn to code from scratch. With these courses, you will have the skills to manipulate data! However, there is still a long way to go. I will cover the rest in future posts:

  • Part 2: Math, Probability and Statistics
  • Part 3: Basics of computer science
  • Part 4: Machine learning
  • Part 5: Create your first machine learning project

If you have any questions, feel free to contact me on LinkedIn. All the best and good luck!

If you enjoyed this blog post, feel free to read my other articles on machine learning:

  • How to Become a Data Analyst: Data Visualization with Google Data Studio
  • What makes a great wine...great? (Using Machine Learning and Partial Dependence Plot in Finding Good Wine)
  • Interpreting black box ML models with LIME (Understanding LIME visually by modeling breast cancer data)

[1] Dhar, V. (2013). "Data Science and Prediction". Communications of the ACM. 56 (12): 64–73. doi:10.1145/2500499. S2CID 6107147. Archived from the original on November 9, 2014. Retrieved September 2, 2015.

Author: Ms. Lucile Johns

Last Updated: 23/05/2023