Pandas is a powerful, efficient Python library for working with "relational" or "labeled" data in an intuitive way. In this article, we will explore the basics of Pandas and how it can be used to analyze and manipulate data.

Why Use Pandas?

1. Support for Multiple File Formats

Pandas can read from and write to a wide variety of file formats, including:

  • .csv

  • .json

  • .txt

  • .xlsx
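Here is a minimal sketch of reading several of these formats. To keep it self-contained, it uses in-memory text (io.StringIO) in place of real files, and the sample data is made up. Reading .xlsx works in a similar way with pd.read_excel, but it needs an extra engine such as openpyxl, so it is left out here.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for files on disk
csv_text = "Name,Age\nAna,23\nJoão,35\n"
json_text = '[{"Name": "Ana", "Age": 23}, {"Name": "João", "Age": 35}]'
txt_text = "Name\tAge\nAna\t23\nJoão\t35\n"  # tab-separated .txt

df_csv = pd.read_csv(io.StringIO(csv_text))
# A JSON array of objects is read with one row per object
df_json = pd.read_json(io.StringIO(json_text), orient="records")
# Plain-text files are usually delimited; pass the separator explicitly
df_txt = pd.read_csv(io.StringIO(txt_text), sep="\t")

for label, frame in [("csv", df_csv), ("json", df_json), ("txt", df_txt)]:
    print(label, frame.shape, list(frame.columns))
```

All three calls produce the same two-column table. Only the reader function or the separator changes.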

2. Time Series Data Analysis

Pandas is well suited to time series data, whether the observations are ordered or unordered and whether or not they are evenly spaced.
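A minimal sketch, using hypothetical daily readings that arrive out of order. The sketch sorts the readings by date, slices them with date strings, and resamples them to weekly averages. "W" is the Pandas frequency alias for weeks ending on Sunday.

```python
import pandas as pd

# Hypothetical daily readings, deliberately out of order
readings = pd.Series(
    [12.0, 10.0, 11.0, 14.0],
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-08"]),
)

# Sorting the DatetimeIndex puts unordered data in chronological order
readings = readings.sort_index()

# Label-based slicing with date strings (both endpoints included)
first_days = readings.loc["2024-01-01":"2024-01-03"]

# Aggregate to weekly means; the missing days are simply absent
weekly = readings.resample("W").mean()
print(weekly)
```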

3. Observational and Statistical Data

Pandas is an excellent tool for observational and statistical datasets of any kind. The data does not even need to be labeled before it is loaded into a Pandas structure.
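To show how little structure Pandas requires, this minimal sketch loads hypothetical, unlabeled measurements from a plain nested list. Pandas assigns default integer labels, and you can attach meaningful ones afterwards. The column names height and weight are invented for the example.

```python
import pandas as pd

# Hypothetical measurements given as a plain nested list, with no labels
raw = [[1.5, 2.0], [2.5, 3.0], [3.5, 4.0]]

# Pandas assigns default integer labels to both rows and columns
df = pd.DataFrame(raw)
print(df.columns.tolist())  # [0, 1]

# Labels can be attached later; summary statistics work either way
df.columns = ["height", "weight"]
print(df.mean())
```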

Installing Pandas

To use Pandas, you first need to install it. You can do this using the pip package manager:

pip install pandas

Make sure that Python and pip are installed on your system before running this command.
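One quick way to confirm that the installation worked is to import the library and print its version. The exact version string depends on what pip installed.

```python
# Import Pandas and report which version is installed
import pandas as pd

print(pd.__version__)
```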

Core Pandas Objects

Pandas is built around two main objects:

  • DataFrame: A two-dimensional structure that resembles a table with rows and columns. Each column can contain a different data type.

  • Series: A one-dimensional labeled array. Every Series has an index of labels (0, 1, 2, ... by default), and each column of a DataFrame is a Series.
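A short sketch of how the two objects relate, reusing the kind of names and ages found later in this article:

```python
import pandas as pd

# A Series pairs values with an index of labels
ages = pd.Series([23, 35, 30], index=["Ana", "João", "Carlos"], name="Age")
print(ages["João"])  # 35

# Without an explicit index, Pandas uses 0, 1, 2, ...
plain = pd.Series([23, 35, 30])
print(plain.index.tolist())  # [0, 1, 2]

# Each column of a DataFrame is itself a Series
df = pd.DataFrame({"Name": ["Ana", "João"], "Age": [23, 35]})
print(isinstance(df["Age"], pd.Series))  # True
```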

Pandas Fundamentals

Pandas is built on top of NumPy, integrates with Matplotlib for plotting, and offers SQL-like operations such as filtering, grouping, and joining. Together, these make it a comprehensive and flexible tool for data manipulation. Let's look at some examples to understand how this works in practice.
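The NumPy connection is easy to see in code. In this minimal sketch, which uses an invented Score column, to_numpy exposes the underlying array. A NumPy function applied to a column works element by element and keeps the labels, and arithmetic on a whole column runs without an explicit loop. NumPy is installed automatically as a Pandas dependency.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [23, 35, 30], "Score": [7.5, 8.0, 6.5]})

# The underlying values are available as a NumPy array
arr = df.to_numpy()
print(arr.shape)  # (3, 2)

# NumPy functions apply element-wise and return a labeled Series
roots = np.sqrt(df["Age"])
print(roots)

# Vectorized arithmetic on an entire column, no loop needed
df["Age_in_months"] = df["Age"] * 12
print(df)
```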

Importing Pandas

Before getting started, you need to import the library:

import pandas as pd

Creating a DataFrame

You can create a DataFrame from a dictionary that maps each column name to a list of values:

data = {
    "Name": ["Ana", "João", "Carlos"],
    "Age": [23, 35, 30],
    "City": ["São Paulo", "Rio de Janeiro", "Belo Horizonte"]
}

df = pd.DataFrame(data)
print(df)

Output:

     Name  Age            City
0     Ana   23       São Paulo
1    João   35  Rio de Janeiro
2  Carlos   30  Belo Horizonte
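The same table can also be built row by row. This minimal sketch passes a list of dictionaries, one per row, and then inspects a few attributes that are useful for checking what was created:

```python
import pandas as pd

# The same data as above, expressed as one dictionary per row
rows = [
    {"Name": "Ana", "Age": 23, "City": "São Paulo"},
    {"Name": "João", "Age": 35, "City": "Rio de Janeiro"},
    {"Name": "Carlos", "Age": 30, "City": "Belo Horizonte"},
]
df = pd.DataFrame(rows)

print(df.shape)             # (3, 3): rows, columns
print(df.columns.tolist())  # ['Name', 'Age', 'City']
print(df.dtypes)            # one data type per column
```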

Reading Files

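To load data from a .csv file, use the following command. The result is stored in a separate variable, df_csv, so that the examples below can keep using the df created above:

df_csv = pd.read_csv("file.csv")
print(df_csv.head())  # Displays the first 5 rows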
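The command above needs a real file.csv on disk. The self-contained sketch below substitutes an in-memory string via io.StringIO, with made-up contents, since read_csv accepts any file-like object. It also shows two optional parameters: head takes a row count, and usecols loads only the listed columns.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for file.csv
csv_text = """Name,Age,City
Ana,23,São Paulo
João,35,Rio de Janeiro
Carlos,30,Belo Horizonte
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head(2))  # first 2 rows instead of the default 5

# Load only the columns you need
names_ages = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"])
print(names_ages)
```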

Basic Operations

Selecting Columns

print(df["Name"])
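A single column label returns a Series, while a list of labels returns a DataFrame. For rows, .loc selects by label and .iloc selects by integer position. This sketch rebuilds the example df so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ana", "João", "Carlos"],
    "Age": [23, 35, 30],
    "City": ["São Paulo", "Rio de Janeiro", "Belo Horizonte"],
})

names = df["Name"]             # single label -> Series
subset = df[["Name", "City"]]  # list of labels -> DataFrame

# .loc uses row/column labels; .iloc uses integer positions
joao_city = df.loc[1, "City"]
first_row = df.iloc[0]
print(subset)
print(joao_city)
```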

Filtering Data

over_25 = df[df["Age"] > 25]  # Keeps only rows where Age is greater than 25
print(over_25)
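Conditions can be combined with & (and) or | (or), and each condition needs its own parentheses. This sketch also shows the same kind of filter written with .loc, which can pick columns at the same time, and with query, which takes the condition as a string:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ana", "João", "Carlos"],
    "Age": [23, 35, 30],
    "City": ["São Paulo", "Rio de Janeiro", "Belo Horizonte"],
})

# Older than 25 AND not living in Rio de Janeiro
mask = (df["Age"] > 25) & (df["City"] != "Rio de Janeiro")
print(df[mask])

# Boolean mask for rows plus a list of columns
older = df.loc[df["Age"] > 25, ["Name", "Age"]]

# The same combined condition as a query string
via_query = df.query("Age > 25 and City != 'Rio de Janeiro'")
print(via_query)
```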

Adding a New Column

df["State"] = ["SP", "RJ", "MG"]  # A list needs exactly one value per row, in row order
print(df)
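New columns are often computed from existing ones rather than typed in. The sketch below shows three variations: a column derived from Age, a single value broadcast to every row, and assign, which returns a new DataFrame and leaves the original unchanged. The Country and Decade columns are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ana", "João", "Carlos"], "Age": [23, 35, 30]})

# Computed from an existing column (vectorized, no loop)
df["Age_next_year"] = df["Age"] + 1

# A single scalar is broadcast to every row
df["Country"] = "Brazil"

# assign() returns a new DataFrame; df itself is not modified
with_decade = df.assign(Decade=df["Age"] // 10 * 10)
print(with_decade)
```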

Descriptive Statistics

print(df.describe())
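By default, describe() summarizes only the numeric columns, which in the example df is just Age. This sketch shows how include="all" adds counts and most-frequent values for text columns, and how individual statistics are available as separate methods:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ana", "João", "Carlos"],
    "Age": [23, 35, 30],
    "City": ["São Paulo", "Rio de Janeiro", "Belo Horizonte"],
})

numeric_summary = df.describe()           # count, mean, std, min, quartiles, max
full_summary = df.describe(include="all")  # adds unique/top/freq for text columns
print(full_summary)

# Individual statistics for a single column
print(df["Age"].mean(), df["Age"].median(), df["Age"].std())
```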

Conclusion

Pandas is an essential tool for anyone working with data in Python. With its ability to efficiently manipulate, analyze, and visualize data, it has become a popular choice among data scientists, analysts, and developers. In future articles, we will explore more advanced Pandas features to take your analyses to the next level!