Pandas DataFrame.mean: Compute the Mean of DataFrame Values
The DataFrame.mean method in pandas calculates the mean (average) of numerical values in a DataFrame along a specified axis. It is useful for summarizing data and performing statistical analyses.
Syntax
The syntax for DataFrame.mean is:
DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)
Here, DataFrame refers to the pandas DataFrame on which the mean operation is applied.
Parameters
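Parameter | Description |
---|---|
axis | Specifies the axis along which to calculate the mean: 0 or 'index' (default) computes the mean of each column; 1 or 'columns' computes the mean of each row; None (pandas 2.0.0 and later) computes a single mean over the whole DataFrame. |
skipna | A boolean that determines whether to exclude NA/null values: True (default) ignores them; False includes them, so any column or row that contains a missing value yields NaN. |
numeric_only | If True, only numeric (int, float, boolean) columns are included in the calculation. Default is False. |
**kwargs | Additional keyword arguments to be passed to the function. |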
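As a small illustration of the axis parameter, the sketch below checks that the string labels 'index' and 'columns' behave like 0 and 1. The sample DataFrame is the same illustrative data used in the examples of this tutorial.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [2, 4, 6, 8],
})

# axis='index' is an alias for axis=0: one mean per column
by_column = df.mean(axis='index')

# axis='columns' is an alias for axis=1: one mean per row
by_row = df.mean(axis='columns')

print(by_column.equals(df.mean(axis=0)))
print(by_row.equals(df.mean(axis=1)))
```

Both comparisons should print True, since each alias selects the same axis as its integer form.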
Returns
Returns a Series with the mean values for the specified axis. If axis=None (available from version 2.0.0), it returns a scalar mean over the entire DataFrame.
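The difference between the two return types can be sketched as follows. This assumes pandas 2.0.0 or later for axis=None; the NumPy-based cross-check via to_numpy() is an addition for illustration and not part of the method itself.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [2, 4, 6, 8],
})

# Default axis=0: a Series indexed by the column labels
col_means = df.mean()

# axis=None (pandas >= 2.0.0): a single scalar over all values
overall = df.mean(axis=None)

# Cross-check: mean of all cells computed on the underlying array
cross_check = df.to_numpy().mean()

print(type(col_means).__name__)
print(overall, cross_check)
```

Here the scalar is the mean of all twelve cells, not the mean of the three column means, although the two coincide in this data because every column has the same number of values.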
Examples
Compute Column-wise Mean
The default behavior computes the mean for each column.
Python Program
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [2, 4, 6, 8]
})

# Compute mean for each column
column_mean = df.mean()
print(column_mean)
Output
A 25.0
B 20.0
C 5.0
dtype: float64
Compute Row-wise Mean
Setting axis=1 computes the mean for each row.
Python Program
# Compute mean for each row
row_mean = df.mean(axis=1)
print(row_mean)
Output
0 5.666667
1 13.000000
2 20.333333
3 27.666667
dtype: float64
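If the row means are wanted with two decimal places, one option is to round the resulting Series. The snippet below is a minimal sketch using Series.round on the same sample data; the variable name rounded is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [2, 4, 6, 8],
})

# Row-wise mean, then rounded to two decimal places
row_mean = df.mean(axis=1)
rounded = row_mean.round(2)

print(rounded)
```

Rounding changes the stored values, so it is best applied only at the reporting step, after any further calculations.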
Handling Missing Values
By default, skipna=True ignores missing values. Setting skipna=False includes them, so any column (or row) that contains a missing value has a mean of NaN.
Python Program
# Work on a float copy so the column can hold NaN without changing dtype
df_with_nan = df.astype(float)
df_with_nan.loc[2, 'A'] = float('nan')  # Introduce a NaN value

# Compute column-wise mean without skipping NaNs
mean_with_nan = df_with_nan.mean(skipna=False)
print(mean_with_nan)
Output
A NaN
B 20.0
C 5.0
dtype: float64
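For comparison, the sketch below computes the mean of the same kind of data with the default skipna=True and with skipna=False. It also checks that the default result for column A matches the sum of the non-missing values divided by their count. The DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df_with_nan = pd.DataFrame({
    'A': [10.0, 20.0, float('nan'), 40.0],
    'B': [5, 15, 25, 35],
    'C': [2, 4, 6, 8],
})

# Default: missing values are skipped, so A is averaged over 3 values
mean_skip = df_with_nan.mean()

# skipna=False: the NaN propagates into the mean of column A
mean_keep = df_with_nan.mean(skipna=False)

# Manual check for column A: sum and count both ignore NaN
manual_a = df_with_nan['A'].sum() / df_with_nan['A'].count()

print(mean_skip)
print(mean_keep)
print(manual_a)
```

With the default setting, column A is averaged over its three remaining values (10, 20 and 40), while columns B and C are unaffected in both cases.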
Using numeric_only=True
If a DataFrame contains non-numeric columns, setting numeric_only=True excludes them from the calculation.
Python Program
df_mixed = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': ['X', 'Y', 'Z', 'W']
})

# Compute mean only for numeric columns
numeric_mean = df_mixed.mean(numeric_only=True)
print(numeric_mean)
Output
A 25.0
B 20.0
dtype: float64
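To see why numeric_only=True matters, the sketch below calls mean() on the mixed DataFrame without it. On pandas 2.0 and later this is expected to raise a TypeError because the string column cannot be averaged; older versions may behave differently, so the code records the outcome rather than assuming it. It also shows select_dtypes as an alternative way to keep only the numeric columns.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': ['X', 'Y', 'Z', 'W'],
})

# Without numeric_only=True, the string column is part of the calculation
try:
    df_mixed.mean()
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)

# Two ways to restrict the calculation to numeric columns
via_flag = df_mixed.mean(numeric_only=True)
via_select = df_mixed.select_dtypes(include='number').mean()

print(via_flag.equals(via_select))
```

Selecting the numeric columns first is handy when several aggregations are applied to the same subset, while numeric_only=True is shorter for a single call.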
Summary
In this tutorial, we explored the DataFrame.mean method in pandas. Key takeaways include:
- mean() calculates the average of each column by default (axis=0).
- Setting axis=1 computes row-wise means.
- Missing values are ignored by default (skipna=True), but can be included with skipna=False.
- numeric_only=True ensures only numeric columns are included.
- Setting axis=None (available from pandas 2.0.0) computes the overall mean as a scalar.