5 Tips to spread your message more effectively across the audience with static visualizations

Data visualization is a must-have skill for becoming a data scientist. Most of the time, we focus entirely on learning how to use visualization tools and do not stop to think about the design principles and good practices we should follow when making visualizations. Visualizations provide key insights at a glance and communicate an idea much more effectively than raw data, mainly because the human brain cannot process large amounts of data at once. However, providing a visualization to the audience is not enough; it has to be properly designed, because its design strongly affects how quickly the information reaches the viewers. …


The complete guide to clean data sets — Part 3

Filtering data from a data frame is one of the most common operations when cleaning the data. Pandas provides a wide range of methods for selecting data according to the position and label of the rows and columns. In addition, Pandas also allows you to obtain a subset of data based on column types and to filter rows with boolean indexing.

In this article, we will cover the most common operations for selecting a subset of data from a Pandas data frame: (1) selecting a single column by label, (2) selecting multiple columns by label, (3) selecting columns by data type, (4) selecting a single row by label, (5) selecting multiple rows by label, (6) selecting a single row by position, (7) selecting multiple rows by position, (8) selecting rows and columns simultaneously, (9) selecting a scalar value, and (10) selecting rows using Boolean selection. …
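The ten operations listed above map onto a handful of Pandas accessors. Below is a minimal sketch on a small, made-up data frame; the column names, index labels, and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data frame with a labelled index, for illustration only
df = pd.DataFrame(
    {
        "name": ["Anna", "Ben", "Clara"],
        "age": [28, 35, 42],
        "score": [7.5, 8.1, 6.9],
    },
    index=["a", "b", "c"],
)

col = df["name"]                                 # (1) single column by label -> Series
cols = df[["name", "age"]]                       # (2) multiple columns by label
numeric = df.select_dtypes(include="number")     # (3) columns by data type
row = df.loc["b"]                                # (4) single row by label -> Series
rows = df.loc[["a", "b"]]                        # (5) multiple rows by label
first = df.iloc[0]                               # (6) single row by position
first_two = df.iloc[0:2]                         # (7) multiple rows by position
subset = df.loc[["a", "c"], ["name", "score"]]   # (8) rows and columns at once
value = df.at["c", "age"]                        # (9) scalar value by label
older = df[df["age"] > 30]                       # (10) Boolean selection
```

Note that .loc works with labels and .iloc with integer positions; a positional slice such as df.iloc[0:2] excludes its end point, while a label slice with .loc includes it.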


Data Analysis with Pandas, Plotly, and Matplotlib

This article is a journey through the history of the Bundesliga. By analyzing historical data (all final standings from 1963 until 2020), we will answer many questions about the German league. Which teams have won the German league? Which teams nearly won the Bundesliga? When did Bayern’s hegemony start? Which teams receive the most penalties? … and many more!

Photo by Mario Klassen on Unsplash

Introduction

Here is a brief introduction for those who have never heard of the German league. 🙌

The German football league, commonly known as the Bundesliga, is the top national football league in Germany and one of the most popular professional sports leagues in the world. …


Data Analysis 📝 + History 🌐

The Spanish football league, commonly known as La Liga, is the top national football league in Spain and one of the most popular professional sports leagues in the world. It was founded in 1929 and has been held every year since then, except for the period 1936–1939 due to the Spanish Civil War. At its foundation, it consisted of only 10 teams. Currently, it is made up of 20 teams spread across the country, though mostly from the most developed regions: Madrid, Barcelona, and the Basque Country. The top four teams qualify for the Champions League, while the three lowest-placed teams (positions 18–20) are relegated to the second division. …


The complete guide to clean data sets — Part 2

Photo by Ine Carriquiry on Unsplash

The success of a machine learning algorithm depends heavily on the quality of the data fed into the model. Real-world data is often dirty, containing outliers, missing values, wrong data types, irrelevant features, or non-standardized data. Any of these can prevent the machine learning model from learning properly. For this reason, transforming raw data into a useful format is an essential stage in the machine learning process.

Outliers are observations in the data set that deviate significantly from the rest of the data. In some cases, outliers can provide useful information (e.g. in fraud detection). …
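One common way to flag outliers in a numeric column is the interquartile range (IQR) rule, sketched below on made-up values. The technique and its conventional 1.5 multiplier are assumptions here, not necessarily the method the full article uses.

```python
import pandas as pd

# Hypothetical numeric values; 95 is deliberately far from the rest
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

# First and third quartiles, and the spread between them
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# IQR rule: points more than 1.5 * IQR beyond the quartiles are flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

print(values[is_outlier])
```

Because the quartiles are barely affected by a few extreme points, this rule stays usable even when the outliers themselves would distort a mean-based threshold.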


The complete guide to clean datasets — Part 1

Photo by Jeremy Perkins on Unsplash

The success of a machine learning algorithm depends heavily on the quality of the data fed into the model. Real-world data is often dirty, containing outliers, missing values, wrong data types, irrelevant features, or non-standardized data. Any of these can prevent the machine learning model from learning properly. For this reason, transforming raw data into a useful format is an essential stage in the machine learning process. One technique you will come across many times when pre-processing data is normalization.

Data normalization is a common practice in machine learning that consists of transforming numeric columns to a common scale. In many data sets, some features take values many times larger than others, and the features with larger values tend to dominate the learning process, even though that does not make them more important for predicting the outcome. Data normalization brings features measured on different scales to the same scale. After normalization, all variables have a similar influence on the model, improving the stability and performance of the learning algorithm. …
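A minimal sketch of two widely used normalization techniques, min-max scaling and z-score standardization, written with plain Pandas arithmetic. The columns and values are hypothetical; they simply show two features on very different scales.

```python
import pandas as pd

# Hypothetical features on very different scales
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30_000, 45_000, 80_000, 120_000],
})

# Min-max scaling: maps each column to the range [0, 1]
min_max = (df - df.min()) / (df.max() - df.min())

# Z-score standardization: zero mean and unit (sample) standard deviation per column
z_score = (df - df.mean()) / df.std()

print(min_max)
print(z_score)
```

After either transformation, age and income are expressed on comparable scales, so neither dominates a distance- or gradient-based learner purely because of its units.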


Simple, cumulative, and exponential moving averages with Pandas

Photo by Austin Distel on Unsplash

The moving average is commonly used with time series to smooth out random short-term variations and to highlight other components (trend, seasonality, or cycle) present in the data. It is also known as the rolling mean and is calculated by averaging the values of the time series over a window of k periods. Moving averages are widely used in finance to identify market trends and in environmental engineering to evaluate environmental quality standards, such as the concentration of pollutants.

In this article, we briefly explain the most popular types of moving averages: (1) the simple moving average (SMA), (2) the cumulative moving average (CMA), and (3) the exponential moving average (EMA). In addition, we show how to implement them with Python. To do so, we use two data sets from Open Data Barcelona, containing rainfall and temperatures of Barcelona from 1786 until 2019. …
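The three moving averages map onto three Pandas methods: rolling() for the SMA, expanding() for the CMA, and ewm() for the EMA. The sketch below uses a short, made-up temperature series rather than the Open Data Barcelona files; the window size of 3 and the smoothing factor alpha=0.1 are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical yearly temperatures (not the Open Data Barcelona values)
temps = pd.Series([15.2, 15.8, 14.9, 16.1, 16.4, 15.7], index=range(2014, 2020))

# SMA: mean of the last 3 observations (the first 2 values are NaN)
sma = temps.rolling(window=3).mean()

# CMA: mean of all observations up to and including the current one
cma = temps.expanding().mean()

# EMA: y_0 = x_0, then y_t = (1 - alpha) * y_{t-1} + alpha * x_t
ema = temps.ewm(alpha=0.1, adjust=False).mean()

result = pd.DataFrame({"temp": temps, "SMA_3": sma, "CMA": cma, "EMA_0.1": ema})
print(result)
```

With adjust=False, ewm() applies the recursive formula in the comment, so older observations lose weight geometrically instead of dropping out of a fixed window as in the SMA.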


Munich is leading e-mobility in Germany 💚

Photo by Jannes Glas on Unsplash

Electric mobility plays an important role in guaranteeing a sustainable future around the world. Electric vehicles are becoming increasingly popular and represent an appealing alternative to the internal combustion engine. However, the success of electric mobility requires a good infrastructure for charging electric vehicles.

In this article, we analyze the charging stations located across Germany. The data can be found on the following webpage and contains the public charging points available in Germany (and reported to the Bundesnetzagentur!). …
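A typical first step in this kind of analysis is counting stations and charging points per city. The sketch below is only an assumption about how that could look: the column names ("city", "charging_points") and the rows are hypothetical placeholders, not the layout or content of the Bundesnetzagentur data.

```python
import pandas as pd

# Hypothetical extract: one row per charging station (column names are assumptions)
stations = pd.DataFrame({
    "city": ["City A", "City B", "City A", "City C", "City B", "City A"],
    "charging_points": [2, 2, 4, 1, 2, 2],
})

# Number of stations per city, sorted from most to fewest
stations_per_city = stations["city"].value_counts()

# Total number of charging points per city, in descending order
points_per_city = (
    stations.groupby("city")["charging_points"].sum().sort_values(ascending=False)
)

print(stations_per_city)
print(points_per_city)
```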


Part 2 — with usage examples


Functions are groups of statements that perform a specific task. They are especially useful for avoiding repeated code and for making your program more organized and easier to debug. Although you can always define your own functions, Python provides multiple built-in functions that are always available to use. In this article, we explain 10 of those functions in detail. Let’s get started!

1. dir

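The dir([object]) function returns an alphabetically sorted list of strings with the names of the attributes and methods of the given object. When called without an argument, it returns the names in the current local scope.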
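A short sketch of both ways of calling dir(): with an object, here a string, and without arguments inside a function, where it lists the function’s local names. The function name show_scope is hypothetical.

```python
# With an argument: names of the attributes and methods of a str object
names = dir("hello")
print("upper" in names)          # True: str objects have an upper() method
print(names == sorted(names))    # True: the list comes back sorted


def show_scope():
    """Hypothetical helper that inspects its own local scope."""
    local_value = 1
    # Without an argument: names defined in the current local scope
    return dir()


print(show_scope())              # ['local_value']
```

dir() is mainly an interactive exploration aid; to check whether a specific attribute exists in program logic, hasattr() is usually the more direct choice.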


with usage examples

The Python Standard Library contains a wide range of modules to deal with everyday programming and is included with the standard version of Python, meaning no additional installation is required. It provides modules for such tasks as interacting with the operating system, reading and writing CSV files, generating random numbers, and working with dates and time. This article describes 8 modules of the Python Standard Library that I am sure you will come across when programming in Python. Let’s get started! 🙌

Photo by Chris Ried on Unsplash

1. Zipfile

The zipfile library provides tools to easily work with zip files. It allows you to create, read, and write zip files directly in Python, without needing an external program. …
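A minimal sketch of writing and reading an archive with zipfile.ZipFile. Everything happens inside a temporary directory, and the file names (notes.txt, archive.zip) and their content are hypothetical.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a small text file to compress (hypothetical content)
    txt_path = os.path.join(tmp, "notes.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("Hello, zipfile!")

    zip_path = os.path.join(tmp, "archive.zip")

    # Write: mode "w" creates a new archive; ZIP_DEFLATED compresses the data
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(txt_path, arcname="notes.txt")

    # Read: list the members and read one back without extracting it to disk
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.namelist()
        content = zf.read("notes.txt").decode("utf-8")

print(members)   # ['notes.txt']
print(content)   # Hello, zipfile!
```

Passing arcname stores the file under a short name inside the archive instead of its full temporary path.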

About

Amanda Iglesias Moreno

Industrial Engineer, Software Developer, and Data Scientist. — Stuttgart/Valencia — https://www.linkedin.com/in/amanda-iglesias-moreno-55029417a/
