cat_mode_impute

cat_mode_impute(data, columns=None, sign='?')

Performs cleaning and mode-based imputation for categorical data.

This function identifies missing values marked by `sign` (e.g. "?") and replaces them with the most frequent (mode) category among the non-missing values in that column. Imputation is performed independently for each targeted column. If `columns` is `None`, the function automatically targets every column in the DataFrame that contains at least one occurrence of `sign`. If several categories are tied for the mode, the imputed value is chosen deterministically as the lexicographically smallest category.
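
The behaviour described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `mode_impute_sketch` and the toy DataFrame are hypothetical, and the sketch assumes the input is left unmodified and a copy is returned, which the description does not state.

```python
import pandas as pd


def mode_impute_sketch(data, columns=None, sign="?"):
    """Hypothetical stand-in that mirrors the documented behaviour."""
    if columns is None:
        # Target every column containing at least one occurrence of `sign`.
        columns = [c for c in data.columns if (data[c] == sign).any()]

    result = data.copy()  # assumption: the input DataFrame is not mutated
    for col in columns:
        # Count only the observed (non-missing) categories.
        observed = result.loc[result[col] != sign, col]
        counts = observed.value_counts()
        # Tie-break: lexicographically smallest of the most frequent categories.
        tied = counts[counts == counts.max()].index
        fill = min(tied)
        # Keep values that are not `sign`; replace the rest with the mode.
        result[col] = result[col].where(result[col] != sign, fill)
    return result


toy = pd.DataFrame({
    "workclass": ["Private", "?", "State-gov", "Private", "State-gov"],
    "sex": ["Male", "Female", "Female", "Male", "Female"],
})
cleaned = mode_impute_sketch(toy)
print(cleaned["workclass"].tolist())
```

In the toy data, "Private" and "State-gov" each occur twice in `workclass`, so the tie is resolved in favour of "Private". The `sex` column contains no "?" and is therefore not targeted.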

Parameters

Name	Type	Description	Default
data	pd.DataFrame	The raw input DataFrame (e.g., Adult Census Income data).	required
columns	list of str or None	The specific columns to clean and impute. If `None`, all columns that contain the missing-value indicator `sign` are targeted.	`None`
sign	str	The specific string used in the dataset to denote missing values.	`"?"`

Returns

Name	Type	Description
	pd.DataFrame	A cleaned DataFrame in which every occurrence of `sign` in the targeted columns has been replaced by that column's mode.

Raises

Name	Type	Description
	TypeError	If `data` is not a pandas DataFrame.
	ValueError	If `sign` is not found in any of the targeted columns, or if a targeted column contains only missing values.
	KeyError	If a column in `columns` is missing from the DataFrame.
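
The error conditions in the table above might be checked along these lines. This is only a sketch: `check_inputs_sketch` is a hypothetical helper, the error messages are invented for illustration, and the order in which the checks run is an assumption.

```python
import pandas as pd


def check_inputs_sketch(data, columns=None, sign="?"):
    """Hypothetical validation mirroring the documented exceptions."""
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data must be a pandas DataFrame")

    if columns is not None:
        absent = [c for c in columns if c not in data.columns]
        if absent:
            raise KeyError(f"columns not found in DataFrame: {absent}")
        targets = list(columns)
    else:
        # Default: every column that contains `sign` at least once.
        targets = [c for c in data.columns if (data[c] == sign).any()]

    # `sign` must appear in at least one targeted column.
    if not any((data[c] == sign).any() for c in targets):
        raise ValueError(f"{sign!r} not found in any targeted column")

    # A column made up entirely of `sign` has no mode to impute from.
    for c in targets:
        if (data[c] == sign).all():
            raise ValueError(f"column {c!r} contains only missing values")

    return targets


df = pd.DataFrame({"a": ["x", "?"], "b": ["?", "?"], "c": ["y", "z"]})
print(check_inputs_sketch(df, ["a"]))
```

With this toy DataFrame, targeting `["c"]` would raise `ValueError` because "?" never appears there, and targeting `["b"]` would raise `ValueError` because the column holds nothing but "?".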

Examples

>>> cat_mode_impute(adult_census_df)