DataFrames are used very commonly when analyzing data, so they are a useful concept to understand! A “DataFrame is a 2-dimensional labeled data structure with columns of potentially different types” (Pandas - Intro to data structures).
General template for creating a DataFrame:
First we need to import pandas.
import pandas as pd
Then, we can construct the dataframe itself.
data = {'Column 1 Name': ['Value 1','Value 2',...],
'Column 2 Name': ['Value 1', 'Value 2', ...],
...
}
df = pd.DataFrame(data)
Simple example with output:
data = {'Name': ['Luke','Leia'], 'Age': [19, 19]}
df = pd.DataFrame(data)
df
Name Age
0 Luke 19
1 Leia 19
The first column, with the numbers starting at $0$, represents the index. If a specific index is wanted, it can be assigned when the DataFrame is created.
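As a minimal sketch of assigning an index at creation time, the `index` argument of `pd.DataFrame` accepts one label per row. The labels `'pilot'` and `'senator'` are made up for this illustration and are not part of the original example.

```python
import pandas as pd

data = {'Name': ['Luke', 'Leia'], 'Age': [19, 19]}

# Pass index= at construction to replace the default 0, 1, ... labels.
# The labels below are hypothetical, chosen only for illustration.
df_labeled = pd.DataFrame(data, index=['pilot', 'senator'])

# Rows can then be looked up by label with .loc
luke_row = df_labeled.loc['pilot']

print(df_labeled)
print(luke_row['Name'])
```

With custom labels, `.loc` looks rows up by label, while `.iloc` still selects them by position.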
Assuming the same dataframe from above, we can obtain the names from the dataframe with the following:
df.Name
0 Luke
1 Leia
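The attribute form `df.Name` is a shortcut. A sketch of the equivalent bracket form follows; brackets also work when a column name contains spaces or clashes with a DataFrame method, where the attribute form does not.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Luke', 'Leia'], 'Age': [19, 19]})

names_attr = df.Name      # attribute-style access
names_key = df['Name']    # bracket-style access, works for any column name

# Both forms return the same Series
print(names_attr.equals(names_key))

# Passing a list of column names returns a DataFrame instead of a Series
subset = df[['Name', 'Age']]
print(type(subset).__name__)
```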
Using a logical mask, we can obtain the row of the dataframe where the name is Luke. This could be done for any of the values or columns in the dataframe if you are looking for something specific.
df[df.Name=='Luke']
Name Age
0 Luke 19
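To make the mask itself visible, the sketch below stores it in a variable before using it. The variable names (`mask`, `luke_19`, `luke_age`) are illustrative only. It also shows how two conditions can be combined with `&`, and how `.loc` can filter rows and pick a column in one step.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Luke', 'Leia'], 'Age': [19, 19]})

# The comparison produces a boolean Series, one True/False per row
mask = df['Name'] == 'Luke'
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
luke = df[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
luke_19 = df[(df['Name'] == 'Luke') & (df['Age'] == 19)]

# .loc takes a row mask and a column label together
luke_age = df.loc[mask, 'Age']
print(luke_age.tolist())
```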
Next, we add a Gender column by assigning a list that holds one value for each row, here for Luke and Leia.
df['Gender'] = ['Male','Female']
df
Name Age Gender
0 Luke 19 Male
1 Leia 19 Female
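A few details of column assignment, sketched under the same example data: the list must have exactly one value per row, and a new column can also be computed from an existing one. The column name `AgeIn5Years` is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Luke', 'Leia'], 'Age': [19, 19]})
df['Gender'] = ['Male', 'Female']

# A new column can be derived from an existing one (hypothetical column name)
df['AgeIn5Years'] = df['Age'] + 5

# A list with the wrong number of values is rejected with a ValueError
length_error = None
try:
    df['Broken'] = ['only', 'three', 'values']
except ValueError as err:
    length_error = err

print(df)
print(length_error is not None)
```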
To get some basic information regarding the values in the columns of the dataframe, we can use describe(). By default it summarizes only the numeric columns, which is why only Age appears in the output.
df.describe()
Age
count 2.0
mean 19.0
std 0.0
min 19.0
25% 19.0
50% 19.0
75% 19.0
max 19.0
With additional data added to the dataframe, this analysis will become more interesting and valuable!
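If the text columns should be summarized too, `describe()` accepts an `include` argument. The following is a small sketch on the same example data, not a full tour of the options.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Luke', 'Leia'], 'Age': [19, 19]})
df['Gender'] = ['Male', 'Female']

# include='all' summarizes every column; statistics that do not apply
# to a column (e.g. the mean of Name) are filled with NaN
summary = df.describe(include='all')
print(summary)

# Individual statistics can be read with .loc[row_label, column_label]
age_mean = summary.loc['mean', 'Age']
name_unique = summary.loc['unique', 'Name']
```

For text columns the summary reports `count`, `unique`, `top`, and `freq` instead of the numeric statistics.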
DataFrames offer many more attributes and methods than the ones covered so far. Resources for those can be found in the pandas documentation. Additionally, some of these attributes and methods will be shown through the examples provided.