pipeline.cip_crosswalk

The pipeline.cip_crosswalk module handles transformations of CIP (Classification of Instructional Programs) codes:

CIPCodeCrosswalk(crosswalk_dir: Path = pipeline.settings.RAW_DATA_DIR / pipeline.settings.CROSSWALKS_DIRNAME)

Crosswalks CIP codes from earlier CIP editions to the 2020 CIP codes

walk(year_range: Tuple[int, int], codes: pd.Series, titles: Optional[pd.Series] = None) -> Tuple[pd.Series, pd.Series]

Map an older set of codes (from the given year range) to the newer set of codes

convert_to_cip2020(year: int, codes: Union[str, List[str], pd.Series], titles: Optional[Union[str, List[str], pd.Series]] = None) -> pd.DataFrame

Convert old CIP codes to CIP 2020 codes and titles
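The lookup behind a crosswalk can be pictured as a table join from old codes to new codes and titles. The following is only a minimal sketch under stated assumptions, not the module's implementation: the crosswalk rows, the column names (old_code, new_code, new_title, code_2020, title_2020), the helper convert_codes, and the choice to pass unmatched codes through unchanged are all hypothetical.

```python
import pandas as pd

# Hypothetical crosswalk table; these codes and titles are invented for illustration.
crosswalk = pd.DataFrame(
    {
        "old_code": ["11.1111", "22.2222"],
        "new_code": ["11.9999", "22.2222"],
        "new_title": ["Example Program A", "Example Program B"],
    }
).set_index("old_code")


def convert_codes(codes: pd.Series) -> pd.DataFrame:
    """Look up each old code in the crosswalk.

    Assumption of this sketch: codes missing from the crosswalk keep their
    original value and get a missing title.
    """
    new_codes = codes.map(crosswalk["new_code"]).fillna(codes)
    new_titles = codes.map(crosswalk["new_title"])
    return pd.DataFrame({"code_2020": new_codes, "title_2020": new_titles})


old = pd.Series(["11.1111", "22.2222", "33.3333"])
result = convert_codes(old)
print(result)
```

Here the third code has no crosswalk entry, so it is carried over as-is with no title; the real class may treat such codes differently.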

NCSESClassifier(filepath: Path = PIPELINE_ASSETS / 'ncses_stem_classification_table.csv')

Class for converting CIP codes to NCSES hierarchical classification

Read the NCSES file into an internal DataFrame to use for classification

get_titles(codes: Union[str, Iterable[str]], fill_na: bool = True) -> pd.Series

Return NCSES title strings corresponding to 2020 CIP Codes

Parameters:

Name | Type | Description | Default
codes | Union[str, List[str], Series] | CIP 2020 codes | required
fill_na | bool | Whether to fill NA values with "Unknown" | True

Returns:

Type | Description
Series | NCSES title strings
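The effect of fill_na can be illustrated with a small stand-in lookup. This is a hedged sketch rather than the class's code: the ncses_titles table, its entries, and the function get_titles_sketch are hypothetical, and indexing the result by code is an assumption of the sketch.

```python
import pandas as pd

# Hypothetical lookup table; the real one is read from the NCSES CSV file.
ncses_titles = pd.Series(
    {"11.1111": "Example Field A", "22.2222": "Example Field B"}
)


def get_titles_sketch(codes, fill_na: bool = True) -> pd.Series:
    """Return one title per code, optionally replacing missing titles with "Unknown"."""
    if isinstance(codes, str):
        codes = [codes]  # accept a single code as well as an iterable of codes
    titles = ncses_titles.reindex(list(codes))  # unknown codes become NaN
    if fill_na:
        titles = titles.fillna("Unknown")
    return titles


filled = get_titles_sketch(["22.2222", "99.9999"])
unfilled = get_titles_sketch(["22.2222", "99.9999"], fill_na=False)
```

With fill_na=True the unrecognized code is labeled "Unknown"; with fill_na=False it stays missing, which is useful when the caller wants to detect unclassified codes.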

classify(original_codes: Union[str, List[str], pd.Series], codes_2020: str | List[str] | pd.Series | None = None) -> pd.DataFrame

Classify CIP code(s) in the NCSES classification.

In all cases, prefer the classification of the CIP2020, but use the original version if the 2020 version is unclassified.

Parameters:

Name | Type | Description | Default
original_codes | Union[str, List[str], Series] | CIP code(s) | required
codes_2020 | Union[str, List[str], Series] | CIP 2020 code(s) to classify, optional | None

Returns:

Type | Description
DataFrame | Data frame indexed by CIP code with each level of NCSES classification as columns
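The "prefer CIP 2020, fall back to the original" rule can be sketched as choosing one lookup key per row before classifying. Everything below is illustrative: the two-level table, its column names, the function classify_sketch, and indexing the result by the original code are assumptions, not the documented behavior of NCSESClassifier.

```python
import pandas as pd

# Hypothetical two-level classification table keyed by CIP code.
table = pd.DataFrame(
    {
        "broad_field": ["Example Broad A", "Example Broad B"],
        "detailed_field": ["Example Detail A", "Example Detail B"],
    },
    index=["11.1111", "22.2222"],
)


def classify_sketch(original_codes, codes_2020=None) -> pd.DataFrame:
    """Classify each row by its 2020 code if that code is in the table, else by the original."""
    if isinstance(original_codes, str):
        original_codes = [original_codes]
    original_codes = list(original_codes)
    if codes_2020 is None:
        chosen = original_codes
    else:
        if isinstance(codes_2020, str):
            codes_2020 = [codes_2020]
        # Pick the 2020 code when it is classified, otherwise the original code.
        chosen = [
            new if new in table.index else old
            for old, new in zip(original_codes, codes_2020)
        ]
    result = table.reindex(chosen)  # codes found nowhere yield all-NaN rows
    result.index = pd.Index(original_codes, name="cip_code")
    return result


out = classify_sketch(
    ["11.0000", "22.2222", "33.3333"],
    ["11.1111", "99.9999", "88.8888"],
)
```

In this run the first row is classified through its 2020 code, the second falls back to its original code, and the third stays unclassified because neither code appears in the table.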

DHSClassifier(filepath: Path = PIPELINE_ASSETS / 'dhs_stem_classification_table.csv')

Read the DHS file into an internal DataFrame to use for classification

classify(codes: Union[str, List[str], pd.Series, pd.Index]) -> pd.DataFrame

Classify a set of CIP codes as belonging (True) or not belonging (False) to the DHS set of STEM CIP codes

Parameters:

Name | Type | Description | Default
codes | Union[str, List[str], Series] | CIP code(s) | required

Returns:

Type | Description
DataFrame | DataFrame indexed by input codes with one bool column indicating DHS STEM
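The shape of the result described above (input codes as the index, one boolean column) can be sketched with a set-membership test. This is an assumption-laden illustration only: the dhs_stem_codes set, the column name dhs_stem, the index name, and the function classify_dhs_sketch are hypothetical and do not come from the module.

```python
import pandas as pd

# Hypothetical set of STEM codes; the real list is read from the DHS CSV file.
dhs_stem_codes = {"11.1111", "22.2222"}


def classify_dhs_sketch(codes) -> pd.DataFrame:
    """Flag each input code as in (True) or not in (False) the STEM code set."""
    if isinstance(codes, str):
        codes = [codes]  # accept a single code as well as a collection
    index = pd.Index(list(codes), name="cip_code")
    return pd.DataFrame({"dhs_stem": index.isin(dhs_stem_codes)}, index=index)


flags = classify_dhs_sketch(["11.1111", "33.3333"])
print(flags)
```

Because the result is indexed by the input codes, it can be joined back onto a table keyed by CIP code without any reordering.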