26 concepts you should know — Part 1: from A to J

Python is one of the leading programming languages across the globe. It is used in many contexts, from data science, robotics, and web development to gaming and rapid prototyping. Its simple syntax makes Python programs easy to read and write, which keeps the learning curve gentle. Additionally, Python comes with ‘batteries included’: multiple libraries (standard and third-party) that greatly facilitate your work as a programmer. Although a basic programming level can be reached faster than with other programming languages, it definitely takes some time to master Python. …


Building decision trees with the Scikit-learn library

AI has tremendous potential in healthcare, and its use in this area has grown continuously over the last few years. The medical industry is using artificial intelligence to make smarter and more accurate decisions. The applications of machine learning in healthcare range widely, from disease diagnosis and identification to robotic surgery, and they often deliver results beyond human capabilities.

In this article, you will learn how to build binary classification models to detect patients with spine pathologies using decision trees. The models are constructed using Python and specifically the Scikit-learn library; however, it is important to…
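As a rough illustration of that workflow, here is a minimal sketch of a binary decision-tree classifier in Scikit-learn. The data is synthetic (generated with `make_classification`) and stands in for a patient table; the feature count, tree depth, and split ratio are assumptions chosen for the example, not values from the article.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient table: 300 rows, 6 numeric features,
# and a binary label (0 or 1). Not real medical data.
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

# Hold out 25% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A shallow tree (depth limit is an illustrative choice) to limit overfitting.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Predict the held-out labels and measure the fraction predicted correctly.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
```

Limiting `max_depth` keeps the tree small enough to inspect by eye, which is usually one of the reasons for choosing a decision tree in the first place.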


Overview of the impact of COVID-19 and social inequalities on unemployment rates in Barcelona.

The financial crisis of 2008 had a significant impact on the Spanish economy, particularly in terms of unemployment. However, from 2012 onward, Spain underwent a slow but continuous economic recovery. The coronavirus pandemic has put an end to growth that seemed unstoppable, pushing unemployment levels up again across the whole country.

This article analyzes unemployment rates across all the neighborhoods of Barcelona over the last 8 years. The study aims to answer several questions about the temporal and geographical distribution of unemployment rates. How much impact has the coronavirus crisis had in terms of unemployment? Which neighborhoods have been most…


Along with coding examples

Python is currently one of the most widely used programming languages worldwide. Its clean and simple syntax makes Python programs easy to read and write, which keeps the learning curve gentle. Moreover, Python comes with ‘batteries included’: multiple libraries (standard and third-party) that greatly facilitate your work as a programmer.

Better yet, over the past few years, Python has also become very popular in the data science and machine learning worlds, where, together with R, it is the most used language in those fast-growing areas ❤️. Although a basic programming level can be reached quite quickly, it definitely takes some time to…


According to IMDb.com information

The Academy Awards, also known as the Oscars, are the most prestigious filmmaking prizes and have been presented annually since 1929 by the Academy of Motion Picture Arts and Sciences, located in Beverly Hills (California, U.S.). The ceremony was first broadcast by radio; since 1953, however, it has been televised live and viewed by millions of spectators worldwide. The Academy is made up of around 6000 members (the complete list is confidential and not disclosed) who are or have been involved in the movie industry, and all of them are responsible for deciding who goes home with the statuette.

However, the…


5 tips to convey your message more effectively to your audience with static visualizations

Data visualization is a must-have skill for becoming a data scientist. Most of the time, we focus entirely on learning how to use visualization tools and do not stop to think about the design principles and good practices we should follow when making visualizations. Visualizations provide key insights at a glance, and they communicate an idea much more effectively than raw data. That is mainly because human brains are unable to process large amounts of data at once. However, it is not enough to provide a visualization to the audience; it has to be…


The complete guide to clean data sets — Part 3

Filtering data from a data frame is one of the most common operations when cleaning data. Pandas provides a wide range of methods for selecting data according to the position and label of the rows and columns. In addition, Pandas allows you to obtain a subset of data based on column types and to filter rows with boolean indexing.
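As a minimal sketch of those selection styles, the snippet below builds a small, made-up data frame (the column names and values are purely illustrative, not taken from the article) and selects by label with `loc`, by position with `iloc`, by column type with `select_dtypes`, and by a boolean mask.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cleo", "Dan"],
        "age": [34, 27, 45, 19],
        "height_m": [1.65, 1.80, 1.72, 1.77],
    },
    index=["a", "b", "c", "d"],
)

# Selection by label: rows "a" through "c" (both ends included), two named columns.
by_label = df.loc["a":"c", ["name", "age"]]

# Selection by position: first two rows, last column.
by_position = df.iloc[:2, -1]

# Selection by column type: keep only the numeric columns.
numeric_only = df.select_dtypes(include="number")

# Boolean indexing: keep the rows where age is above 30.
over_30 = df[df["age"] > 30]
```

Note that label slices with `loc` include the end label, whereas positional slices with `iloc` exclude the end position, as ordinary Python slices do.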

In this article, we will cover the most common operations for selecting a subset of data from a Pandas data frame: (1) selecting a single column by label, (2) selecting multiple columns by label, (3) selecting columns by data…


Data Analysis with Pandas, Plotly, and Matplotlib

This article is a journey through the history of the Bundesliga. By analyzing historical data (all final standings from 1963 to 2020), we will be able to answer many questions about the German league. Which teams have won the German league? Which teams nearly won the Bundesliga? When did Bayern’s hegemony start? Which teams receive the most penalties? … and many more! Continue reading ▶️

Introduction

Let’s start with a brief introduction for those who have never heard of the German league. 🙌

The German football league, commonly known as the Bundesliga, is the top national football league in Germany and one of the most popular…


Data Analysis 📝 + History 🌐

The Spanish football league, commonly known as La Liga, is the top national football league in Spain and one of the most popular professional sports leagues in the world. It was founded in 1929 and has been held every year since then, except for the period 1936–1939, due to the Spanish Civil War. At its foundation, it consisted of only 10 teams. Currently, it is made up of 20 teams spread quite evenly across the country, but mainly from the most developed regions: Madrid, Barcelona, and the Basque Country. The top four teams qualify for the Champions League…


The complete guide to clean data sets — Part 2

The success of a machine learning algorithm depends heavily on the quality of the data fed into the model. Real-world data is often dirty, containing outliers, missing values, wrong data types, irrelevant features, or non-standardized data. The presence of any of these will prevent the machine learning model from learning properly. For this reason, transforming raw data into a useful format is an essential stage in the machine learning process.

Outliers are objects in the data set that exhibit some abnormality and deviate significantly from the normal data. In some cases, outliers can provide useful information (e.g. in fraud detection)…
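To make that definition concrete, here is a minimal sketch of one common way to flag outliers: the interquartile range (IQR) rule. The excerpt does not say which method the article uses, so the rule itself, the helper name `iqr_outliers`, the multiplier `k=1.5`, and the sample values are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the first or third quartile."""
    # quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample: six similar readings and one that deviates strongly.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
```

Whether a flagged value should be dropped, capped, or kept depends on the problem; in fraud detection, for instance, the flagged values are exactly the ones of interest.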

Amanda Iglesias Moreno

Industrial Engineer, Software Developer, and Data Scientist. — Stuttgart/Valencia — https://www.linkedin.com/in/amanda-iglesias-moreno-55029417a/
