`scipeds.data.completions`

`CompletionsQueryEngine(db_path: Optional[Path] = SCIPEDS_CACHE_DIR / DB_NAME)`

Bases: IPEDSQueryEngine

A structured way to query the IPEDS table and format the results for visualization

Parameters:

Name	Type	Description	Default
`db_path`	`Optional[Path]`	Path to pre-processed database file. Defaults to SCIPEDS_CACHE_DIR / DB_NAME.	`SCIPEDS_CACHE_DIR / DB_NAME`

Raises:

Type	Description
`FileNotFoundError`	Pre-processed database file not found.
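
Because the constructor raises `FileNotFoundError` when the pre-processed database is missing, callers may want to handle that case explicitly. The following is a minimal sketch, not part of scipeds: `open_engine` is a hypothetical helper, and the engine class is passed in as an argument so the function makes no assumption about how the database file was obtained.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def open_engine(engine_cls: Callable[..., Any], db_path: Optional[Path] = None) -> Optional[Any]:
    """Construct a query engine, or return None if the database file is missing.

    `open_engine` is a hypothetical helper (not part of scipeds). Pass
    `CompletionsQueryEngine` as `engine_cls`; leave `db_path` as None to use
    the class's own default location.
    """
    try:
        if db_path is None:
            return engine_cls()
        return engine_cls(db_path=db_path)
    except FileNotFoundError as err:
        # Documented failure mode: the pre-processed database file was not found.
        print(f"Pre-processed database not found: {err}")
        return None
```

Typical use would be `engine = open_engine(CompletionsQueryEngine)` after importing the class from `scipeds.data.completions`, followed by a check for `None` before running queries.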

`get_df_from_query(query: str, query_params: Optional[Dict[str, Any]] = None, show_query: bool = False) -> pd.DataFrame`

Return the DataFrame result of the provided SQL query against the pre-processed DuckDB database

Parameters:

Name	Type	Description	Default
`query`	`str`	SQL query (using duckdb syntax)	required
`query_params`	`Optional[Dict[str, Any]]`	Prepared statement variables for query. Defaults to None.	`None`
`show_query`	`bool`	Whether to print the query and parameters before executing. Defaults to False	`False`

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Data returned by query
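
As an illustration of `query` and `query_params` working together, here is a hedged sketch that previews the first rows of a table. It assumes the engine forwards `query_params` to DuckDB as named prepared-statement parameters (written `$name` in the SQL); that forwarding behaviour is an assumption, not something stated above. `build_preview_query` and `preview_table` are hypothetical helpers.

```python
from typing import Any, Dict, Tuple


def build_preview_query(table: str, limit: int = 5) -> Tuple[str, Dict[str, Any]]:
    """Return a (query, query_params) pair that previews the first rows of `table`.

    The table name is quoted as an identifier because identifiers cannot be
    bound as prepared-statement values; the row limit is bound as `$limit`
    (assumed DuckDB named-parameter syntax).
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    quoted = '"' + table.replace('"', '""') + '"'
    return f"SELECT * FROM {quoted} LIMIT $limit", {"limit": limit}


def preview_table(engine: Any, table: str, limit: int = 5) -> Any:
    """Run the preview query through `engine.get_df_from_query`."""
    query, params = build_preview_query(table, limit)
    # show_query=True prints the query and parameters before executing.
    return engine.get_df_from_query(query, query_params=params, show_query=True)
```

Keeping values in `query_params` rather than formatting them into the SQL string avoids quoting mistakes when the values come from user input.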

`list_tables() -> List[str]`

List all tables in the DuckDB database

Returns:

Type	Description
`List[str]`	List[str]: A list of all available tables
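
Table names are not listed on this page, so `list_tables()` is the natural way to discover them before writing a raw query. A small sketch, with `find_tables` as a hypothetical helper that filters the returned names by a keyword:

```python
from typing import Any, List


def find_tables(engine: Any, keyword: str = "") -> List[str]:
    """Return the engine's table names containing `keyword`, sorted.

    Matching is case-insensitive; an empty keyword returns every table.
    """
    needle = keyword.lower()
    return sorted(name for name in engine.list_tables() if needle in name.lower())
```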

`get_cip_table() -> pd.DataFrame`

Get a table of every unique 2020 CIP Code

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Data frame of CIP codes and corresponding taxonomy titles

`get_institutions_table(cols: str | list[str] | None = None) -> pd.DataFrame`

Get institution characteristics table, optionally with specified columns

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Data frame of institution characteristics
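
The available column names are not documented here, so one cautious pattern is to inspect the full table first and then request a subset. The sketch below assumes that calling `get_institutions_table()` without `cols` returns every column; `institutions_with_columns` is a hypothetical helper.

```python
from typing import Any, List, Sequence


def institutions_with_columns(engine: Any, wanted: Sequence[str]) -> Any:
    """Fetch the institutions table restricted to `wanted`, validating names first.

    Assumes `engine.get_institutions_table()` with no arguments returns all
    columns, so its `.columns` can be used to check the requested names.
    """
    available: List[str] = list(engine.get_institutions_table().columns)
    missing = [col for col in wanted if col not in available]
    if missing:
        raise KeyError(f"Unknown institution columns: {missing}")
    return engine.get_institutions_table(cols=list(wanted))
```

Validating up front costs one extra query but turns a possibly opaque SQL error into a clear message naming the unknown columns.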

`rollup_by_grouping(grouping: Grouping, rollup: TaxonomyRollup, query_filters: QueryFilters, by_year: bool = False, rel_rate: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame`

Aggregate completions for fields within the given roll-up, grouped by the selected grouping and subject to the applied filters

Parameters:

Name	Type	Description	Default
`grouping`	`Grouping`	How to group the data	required
`rollup`	`TaxonomyRollup`	Fields in taxonomy to include in aggregation	required
`query_filters`	`QueryFilters`	Filters to apply prior to aggregation	required
`by_year`	`bool`	Whether to group by year (True) or aggregate over all years (False). Default: False	`False`
`rel_rate`	`bool`	Whether to calculate relative representation. If true, also adds associated variables. Default: False	`False`
`show_query`	`bool`	Whether to print the query and parameters before executing. Default: False	`False`
`filter_unitids`	`list[int] | None`	List of unitids to filter on. Default: None	`None`

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Completions within fields in the roll-up, aggregated by chosen grouping and subject to filters
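
One use of `filter_unitids` is to run the same roll-up once over all institutions and once over a chosen subset. The sketch below assumes that `grouping`, `rollup`, and `query_filters` have already been constructed elsewhere (their constructors are not covered on this page); `compare_rollups` is a hypothetical helper.

```python
from typing import Any, Dict, List


def compare_rollups(
    engine: Any,
    grouping: Any,
    rollup: Any,
    query_filters: Any,
    unitids: List[int],
    by_year: bool = True,
) -> Dict[str, Any]:
    """Run `rollup_by_grouping` for all institutions and for `unitids` only."""
    shared = dict(
        grouping=grouping,
        rollup=rollup,
        query_filters=query_filters,
        by_year=by_year,
        rel_rate=True,  # also request relative representation
    )
    return {
        "all_institutions": engine.rollup_by_grouping(**shared),
        "selected_institutions": engine.rollup_by_grouping(**shared, filter_unitids=unitids),
    }
```

Sharing one keyword dictionary between the two calls keeps the results comparable, since only the institution filter differs.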

`field_totals_by_grouping(grouping: Grouping, taxonomy: FieldTaxonomy, query_filters: QueryFilters, taxonomy_values: list[str] | None = None, by_year: bool = False, rel_rate: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame`

Compute aggregate counts for all fields in a given taxonomy

Parameters:

Name	Type	Description	Default
`grouping`	`Grouping`	How to group the data	required
`taxonomy`	`FieldTaxonomy`	Taxonomy to aggregate over	required
`query_filters`	`QueryFilters`	Pre-aggregation filters to apply to raw data	required
`taxonomy_values`	`list[str] | None`	Optional list of field values to filter on. Default: None	`None`
`by_year`	`bool`	Whether to group by year (True) or aggregate over all years (False). Default: False	`False`
`rel_rate`	`bool`	Whether to calculate relative representation. Default: False	`False`
`show_query`	`bool`	Whether to print the query and parameters before executing. Default: False	`False`
`filter_unitids`	`list[int] | None`	List of unitids to filter on. Default: None	`None`

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Aggregate counts (and relative rates, when `rel_rate` is True) by grouping for each field in taxonomy

`uni_rollup_by_grouping(grouping: Grouping, rollup: TaxonomyRollup, query_filters: QueryFilters, by_year: bool = False, rel_rate: bool = False, effect_size: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame`

Get intersectional degree counts and rates within intersectional subgroups

Parameters:

Name	Type	Description	Default
`grouping`	`Grouping`	How to group the data	required
`rollup`	`TaxonomyRollup`	Taxonomy to aggregate over	required
`query_filters`	`QueryFilters`	Pre-aggregation filters	required
`by_year`	`bool`	Whether to group by year (True) or aggregate over all years (False). Default: False	`False`
`rel_rate`	`bool`	Whether to calculate relative representation. If true, also adds associated variables. Default: False	`False`
`effect_size`	`bool`	Whether to compute effect size. Default: False	`False`
`show_query`	`bool`	Whether to print the query and parameters before executing. Default: False	`False`
`filter_unitids`	`list[int] | None`	List of unitids to filter on. Default: None	`None`

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Completions in fields contained within roll-up, aggregated by university UNITID and chosen grouping, subject to filters

`uni_field_totals_by_grouping(grouping: Grouping, taxonomy: FieldTaxonomy, query_filters: QueryFilters, taxonomy_values: list[str] | None = None, by_year: bool = False, rel_rate: bool = False, effect_size: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame`

Aggregate completions (subject to filters) for all fields within a given taxonomy at each university

Parameters:

Name	Type	Description	Default
`grouping`	`Grouping`	How to group the data	required
`taxonomy`	`FieldTaxonomy`	Taxonomy to aggregate over	required
`query_filters`	`QueryFilters`	Pre-aggregation filters to apply to raw data	required
`taxonomy_values`	`list[str] | None`	Optional list of field values to filter on. Default: None	`None`
`by_year`	`bool`	Whether to group by year (True) or aggregate over all years (False). Default: False	`False`
`rel_rate`	`bool`	Whether to calculate relative representation. If true, also adds associated variables. Default: False	`False`
`effect_size`	`bool`	Whether to compute effect size. Default: False	`False`
`show_query`	`bool`	Whether to print the query and parameters before executing. Default: False	`False`
`filter_unitids`	`list[int] | None`	List of unitids to filter on. Default: None	`None`

Returns:

Type	Description
`DataFrame`	pd.DataFrame: Completions in each field in the taxonomy, aggregated by university UNITID and chosen grouping, subject to filters
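
The four aggregation methods form a two-by-two grid: roll-up versus per-field totals, and all institutions versus per-university (`uni_*`). A hedged sketch of a dispatcher that picks the method from its arguments is shown below. `aggregate` is a hypothetical helper, and the checks it performs reflect only the signatures listed on this page (`effect_size` appears only on the `uni_*` methods, `taxonomy_values` only on the field-totals methods).

```python
from typing import Any


def aggregate(
    engine: Any,
    grouping: Any,
    query_filters: Any,
    *,
    rollup: Any = None,
    taxonomy: Any = None,
    per_university: bool = False,
    **options: Any,
) -> Any:
    """Call the aggregation method matching the arguments.

    Pass exactly one of `rollup` (a TaxonomyRollup) or `taxonomy` (a
    FieldTaxonomy). Remaining keyword options (by_year, rel_rate,
    show_query, filter_unitids, ...) are forwarded unchanged.
    """
    if (rollup is None) == (taxonomy is None):
        raise ValueError("Pass exactly one of `rollup` or `taxonomy`")
    if not per_university and "effect_size" in options:
        raise ValueError("`effect_size` is only accepted by the `uni_*` methods")
    prefix = "uni_" if per_university else ""
    if rollup is not None:
        if "taxonomy_values" in options:
            raise ValueError("`taxonomy_values` is only accepted by the field-totals methods")
        method = getattr(engine, prefix + "rollup_by_grouping")
        return method(grouping=grouping, rollup=rollup, query_filters=query_filters, **options)
    method = getattr(engine, prefix + "field_totals_by_grouping")
    return method(grouping=grouping, taxonomy=taxonomy, query_filters=query_filters, **options)
```

Rejecting unsupported options before the call gives a clearer error than the `TypeError` an unexpected keyword argument would otherwise produce.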