scipeds.data.completions

CompletionsQueryEngine(db_path: Optional[Path] = SCIPEDS_CACHE_DIR / DB_NAME)

Bases: IPEDSQueryEngine

A structured interface for querying the IPEDS table and formatting the results for visualization

Parameters:

Name Type Description Default
db_path Optional[Path]

Path to pre-processed database file. Defaults to SCIPEDS_CACHE_DIR / DB_NAME.

SCIPEDS_CACHE_DIR / DB_NAME

Raises:

Type Description
FileNotFoundError

Pre-processed database file not found.
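The constructor contract above (a path argument and a FileNotFoundError when the file is absent) can be handled with an ordinary try/except. The following is a minimal sketch using a stand-in class: `SketchQueryEngine`, `DEFAULT_DB_PATH`, and `try_open` are hypothetical names introduced for illustration and are not part of scipeds. The fallback to the default path when `None` is passed is an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical default location; the real default is SCIPEDS_CACHE_DIR / DB_NAME.
DEFAULT_DB_PATH = Path("cache") / "example.duckdb"


class SketchQueryEngine:
    """Stand-in for illustration only; not the scipeds implementation."""

    def __init__(self, db_path: Optional[Path] = DEFAULT_DB_PATH):
        # Assumption: passing None falls back to the default path.
        path = Path(db_path) if db_path is not None else DEFAULT_DB_PATH
        if not path.exists():
            raise FileNotFoundError(f"Pre-processed database file not found: {path}")
        self.db_path = path


def try_open(db_path: Path) -> Optional[SketchQueryEngine]:
    """Return an engine, or None (with a message) if the file is missing."""
    try:
        return SketchQueryEngine(db_path)
    except FileNotFoundError as err:
        print(f"Could not open database: {err}")
        return None


# A path that does not exist triggers the documented error, which is caught here.
engine = try_open(Path("no_such_dir_for_sketch") / "missing.duckdb")
```

Catching the error at the call site lets a script report the missing file before any queries are attempted.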

get_df_from_query(query: str, query_params: Optional[Dict[str, Any]] = None, show_query: bool = False) -> pd.DataFrame

Return the result of the provided SQL query against the pre-processed DuckDB database as a DataFrame

Parameters:

Name Type Description Default
query str

SQL query (using DuckDB syntax)

required
query_params Optional[Dict[str, Any]]

Prepared statement variables for query. Defaults to None.

None
show_query bool

Whether to print the query and parameters before executing. Defaults to False

False

Returns:

Type Description
DataFrame

pd.DataFrame: Data returned by query
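To illustrate how `query`, `query_params`, and `show_query` fit together, here is a minimal sketch of the same pattern built on the standard-library `sqlite3` module as a stand-in for DuckDB. The helper `run_query`, the `completions` table, and its columns are hypothetical, and the sketch returns a list of tuples rather than a DataFrame. DuckDB's placeholder syntax may differ from the `:name` style used here.

```python
import sqlite3
from typing import Any, Dict, List, Optional, Tuple


def run_query(
    conn: sqlite3.Connection,
    query: str,
    query_params: Optional[Dict[str, Any]] = None,
    show_query: bool = False,
) -> List[Tuple]:
    """Run a parameterized query, optionally printing it first."""
    if show_query:
        print(f"Query:\n{query}\nParameters: {query_params}")
    # Parameters are bound by the driver, never interpolated into the SQL string.
    cursor = conn.execute(query, query_params or {})
    return cursor.fetchall()


# Toy in-memory table with made-up values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE completions (year INTEGER, unitid INTEGER, n_degrees INTEGER)")
conn.executemany(
    "INSERT INTO completions VALUES (?, ?, ?)",
    [(2019, 1, 10), (2020, 1, 12), (2020, 2, 7)],
)

rows = run_query(
    conn,
    "SELECT unitid, n_degrees FROM completions WHERE year = :year ORDER BY unitid",
    query_params={"year": 2020},
    show_query=True,
)
```

Passing values through `query_params` instead of formatting them into the SQL text avoids quoting mistakes and injection problems.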

list_tables() -> List[str]

List all tables in the DuckDB database

Returns:

Type Description
List[str]

List[str]: A list of all available tables

get_cip_table() -> pd.DataFrame

Get a table of every unique 2020 CIP code

Returns:

Type Description
DataFrame

pd.DataFrame: Data frame of CIP codes and corresponding taxonomy titles

get_institutions_table(cols: str | list[str] | None = None) -> pd.DataFrame

Get institution characteristics table, optionally with specified columns

Returns:

Type Description
DataFrame

pd.DataFrame: Data frame of institution characteristics

rollup_by_grouping(grouping: Grouping, rollup: TaxonomyRollup, query_filters: QueryFilters, by_year: bool = False, rel_rate: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame

Aggregate completions for fields within the given roll-up, grouped by the selected grouping and subject to the applied filters

Parameters:

Name Type Description Default
grouping Grouping

How to group the data

required
rollup TaxonomyRollup

Fields in taxonomy to include in aggregation

required
query_filters QueryFilters

Filters to apply prior to aggregation

required
by_year bool

Whether to group by year (True) or aggregate over all years (False). Default: False

False
rel_rate bool

Whether to calculate relative representation. If true, also adds associated variables. Default: False

False
show_query bool

Whether to print the query and parameters before executing. Default: False

False
filter_unitids list[int] | None

List of unitids to filter on. Default: None

None

Returns:

Type Description
DataFrame

pd.DataFrame: Completions within fields in the roll-up, aggregated by chosen grouping and subject to filters
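The effect of `by_year` and `filter_unitids` on the aggregation can be sketched in plain Python. This is not the scipeds implementation: the function `rollup`, the record layout, and the toy values are assumptions made for illustration, and the sketch does not attempt `rel_rate`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Made-up toy records: (year, unitid, group, field, count).
RECORDS: List[Tuple[int, int, str, str, int]] = [
    (2019, 1, "A", "math", 5),
    (2019, 2, "B", "math", 3),
    (2020, 1, "A", "math", 4),
    (2020, 1, "B", "history", 6),
    (2020, 2, "A", "physics", 2),
]


def rollup(
    records: Iterable[Tuple[int, int, str, str, int]],
    rollup_fields: Set[str],
    by_year: bool = False,
    filter_unitids: Optional[List[int]] = None,
) -> Dict[tuple, int]:
    """Sum counts for fields in the roll-up, keyed by group (and year if requested)."""
    totals: Dict[tuple, int] = defaultdict(int)
    for year, unitid, group, field, count in records:
        if field not in rollup_fields:
            continue  # field is outside the roll-up
        if filter_unitids is not None and unitid not in filter_unitids:
            continue  # institution is excluded by the unitid filter
        key = (year, group) if by_year else (group,)
        totals[key] += count
    return dict(totals)


stem = {"math", "physics"}
all_years = rollup(RECORDS, stem)                    # aggregate over all years
per_year = rollup(RECORDS, stem, by_year=True)       # one total per (year, group)
one_uni = rollup(RECORDS, stem, filter_unitids=[1])  # restrict to unitid 1
```

With `by_year=False` the year drops out of the key entirely, so each group has a single total across the whole period.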

field_totals_by_grouping(grouping: Grouping, taxonomy: FieldTaxonomy, query_filters: QueryFilters, taxonomy_values: list[str] | None = None, by_year: bool = False, rel_rate: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame

Compute aggregate counts for all fields in a given taxonomy

Parameters:

Name Type Description Default
grouping Grouping

How to group the data

required
taxonomy FieldTaxonomy

Taxonomy to aggregate over

required
query_filters QueryFilters

Pre-aggregation filters to apply to raw data

required
taxonomy_values list[str] | None

Optional list of field values to filter on. Default: None

None
by_year bool

Whether to group by year (True) or aggregate over all years (False). Default: False

False
rel_rate bool

Whether to calculate relative representation. Default: False

False
show_query bool

Whether to print the query and parameters before executing. Default: False

False
filter_unitids list[int] | None

List of unitids to filter on. Default: None

None

Returns:

Type Description
DataFrame

pd.DataFrame: Relative rates by grouping for each field in taxonomy
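The difference from a roll-up is that totals are kept separately for each field, with `taxonomy_values` optionally narrowing which fields appear. A minimal sketch follows, assuming that `None` means "keep every field"; the function `field_totals` and the toy rows are hypothetical and only illustrate the filtering idea.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Made-up toy rows: (field, group, count).
ROWS: List[Tuple[str, str, int]] = [
    ("math", "A", 5),
    ("math", "B", 3),
    ("history", "A", 4),
    ("history", "B", 6),
    ("physics", "A", 2),
]


def field_totals(
    rows: Iterable[Tuple[str, str, int]],
    taxonomy_values: Optional[List[str]] = None,
) -> Dict[Tuple[str, str], int]:
    """Sum counts per (field, group), optionally keeping only selected fields."""
    totals: Counter = Counter()
    for field, group, count in rows:
        # Assumption: None means no field filter is applied.
        if taxonomy_values is not None and field not in taxonomy_values:
            continue
        totals[(field, group)] += count
    return dict(totals)


every_field = field_totals(ROWS)
math_only = field_totals(ROWS, taxonomy_values=["math"])
```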

uni_rollup_by_grouping(grouping: Grouping, rollup: TaxonomyRollup, query_filters: QueryFilters, by_year: bool = False, rel_rate: bool = False, effect_size: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame

Get intersectional degree counts and rates within intersectional subgroups

Parameters:

Name Type Description Default
grouping Grouping

How to group the data

required
rollup TaxonomyRollup

Taxonomy to aggregate over

required
query_filters QueryFilters

Pre-aggregation filters

required
by_year bool

Whether to group by year (True) or aggregate over all years (False). Default: False

False
rel_rate bool

Whether to calculate relative representation. If true, also adds associated variables. Default: False

False
effect_size bool

Whether to compute effect size. Default: False

False
show_query bool

Whether to print the query and parameters before executing. Default: False

False
filter_unitids list[int] | None

List of unitids to filter on. Default: None

None

Returns:

Type Description
DataFrame

pd.DataFrame: Completions in fields contained within roll-up, aggregated by university UNITID and chosen grouping, subject to filters

uni_field_totals_by_grouping(grouping: Grouping, taxonomy: FieldTaxonomy, query_filters: QueryFilters, taxonomy_values: list[str] | None = None, by_year: bool = False, rel_rate: bool = False, effect_size: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame

Aggregate completions (subject to filters) for all fields within a given taxonomy at each university

Parameters:

Name Type Description Default
grouping Grouping

How to group the data

required
taxonomy FieldTaxonomy

Taxonomy to aggregate over

required
query_filters QueryFilters

Pre-aggregation filters to apply to raw data

required
taxonomy_values list[str] | None

Optional list of field values to filter on. Default: None

None
by_year bool

Whether to group by year (True) or aggregate over all years (False). Default: False

False
rel_rate bool

Whether to calculate relative representation. If true, also adds associated variables. Default: False

False
effect_size bool

Whether to compute effect size. Default: False

False
show_query bool

Whether to print the query and parameters before executing. Default: False

False
filter_unitids list[int] | None

List of unitids to filter on. Default: None

None

Returns:

Type Description
DataFrame

pd.DataFrame: Completions in each field in the taxonomy, aggregated by university UNITID and chosen grouping, subject to filters