scipeds.data.completions
CompletionsQueryEngine(db_path: Optional[Path] = SCIPEDS_CACHE_DIR / DB_NAME)
Bases: IPEDSQueryEngine
A structured way to query the IPEDS table to format data for visualization
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_path | Optional[Path] | Path to pre-processed database file. Defaults to SCIPEDS_CACHE_DIR / DB_NAME. | SCIPEDS_CACHE_DIR / DB_NAME |
Raises:
Type | Description |
---|---|
FileNotFoundError | Pre-processed database file not found. |
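Because the constructor raises FileNotFoundError when the pre-processed database is missing, callers may want to handle that case explicitly. The following is a minimal sketch, not part of the library: `open_engine` is a hypothetical helper, and the engine class is passed in as an argument so the snippet does not depend on the package being installed.

```python
from pathlib import Path
from typing import Optional


def open_engine(engine_cls, db_path: Optional[Path] = None):
    """Hypothetical helper: build an engine, or return None if the database is missing.

    engine_cls is expected to be CompletionsQueryEngine (or any class with the
    same constructor signature).
    """
    try:
        # With no explicit path, fall back to the class's own default location.
        if db_path is None:
            return engine_cls()
        return engine_cls(db_path=db_path)
    except FileNotFoundError as err:
        # Documented failure mode: the pre-processed database file was not found.
        print(f"Pre-processed database not found: {err}")
        return None


# Intended usage, assuming the package is installed and the database is cached:
#   from scipeds.data.completions import CompletionsQueryEngine
#   engine = open_engine(CompletionsQueryEngine)
```

Returning None keeps the sketch short; re-raising with a more helpful message would be an equally reasonable choice.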
get_df_from_query(query: str, query_params: Optional[Dict[str, Any]] = None, show_query: bool = False) -> pd.DataFrame
Return the result of running the provided SQL query against the pre-processed DuckDB database, as a dataframe
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | SQL query (using duckdb syntax) | required |
query_params | Dict[str, Any] | Prepared statement variables for query. Defaults to None. | None |
show_query | bool | Whether to print the query and parameters before executing. Defaults to False. | False |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Data returned by query |
list_tables() -> List[str]
List all tables in the DuckDB database
Returns:
Type | Description |
---|---|
List[str] | List[str]: A list of all available tables |
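As an illustration of how list_tables and get_df_from_query can be combined, the sketch below previews the first rows of a table. `preview_table` is a hypothetical helper, not a library function. It assumes that query_params is forwarded to DuckDB as named prepared-statement parameters (written `$name` in DuckDB SQL) and that a parameter is accepted in the LIMIT clause.

```python
def preview_table(engine, table_name: str, n_rows: int = 5):
    """Hypothetical helper: return the first n_rows rows of a table."""
    # Table names cannot be bound as prepared-statement variables, so check
    # the name against list_tables() before interpolating it into the SQL.
    available = engine.list_tables()
    if table_name not in available:
        raise ValueError(f"Unknown table {table_name!r}; available: {available}")

    # Assumption: the dict is passed through to DuckDB as named parameters.
    query = f"SELECT * FROM {table_name} LIMIT $n_rows"
    return engine.get_df_from_query(
        query,
        query_params={"n_rows": n_rows},
        show_query=True,  # print the query and parameters before executing
    )
```

Here `engine` is expected to be a CompletionsQueryEngine instance. Validating the table name first keeps user-supplied strings out of the SQL text, while values such as the row limit travel as parameters.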
get_cip_table() -> pd.DataFrame
Get a table of every unique 2020 CIP Code
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Data frame of CIP codes and corresponding taxonomy titles |
get_institutions_table(cols: str | list[str] | None = None) -> pd.DataFrame
Get institution characteristics table, optionally with specified columns
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Data frame of institution characteristics |
rollup_by_grouping(grouping: Grouping, rollup: TaxonomyRollup, query_filters: QueryFilters, by_year: bool = False, rel_rate: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame
Aggregate completions for fields within the given roll-up, grouped by the selected grouping and subject to the applied filters
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grouping | Grouping | How to group the data | required |
rollup | TaxonomyRollup | Fields in taxonomy to include in aggregation | required |
query_filters | QueryFilters | Filters to apply prior to aggregation | required |
by_year | bool | Whether to group by year (True) or aggregate over all years (False). Default: False | False |
rel_rate | bool | Whether to calculate relative representation. If true, also adds associated variables. Default: False | False |
show_query | bool | Whether to print the query and parameters before executing. Default: False | False |
filter_unitids | (list[int], Optional) | List of unitids to filter on. Default: None | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Completions within fields in the roll-up, aggregated by chosen grouping and subject to filters |
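To show how filter_unitids changes the scope of a roll-up, the sketch below runs the same aggregation twice: once over all institutions and once restricted to a list of unitids. `rollup_all_and_subset` is a hypothetical helper. The Grouping, TaxonomyRollup, and QueryFilters objects are taken as arguments because their construction is not covered in this section.

```python
def rollup_all_and_subset(engine, grouping, rollup, query_filters, unitids):
    """Hypothetical helper: yearly roll-ups for all institutions and for a subset."""
    # Keyword arguments shared by both calls; names follow the signature above.
    shared = dict(
        grouping=grouping,
        rollup=rollup,
        query_filters=query_filters,
        by_year=True,   # one row per year instead of a single total
        rel_rate=True,  # also compute relative representation
    )
    # filter_unitids is left at its default (None) for the first call.
    all_institutions = engine.rollup_by_grouping(**shared)
    # Restrict the second call to the given institutions.
    subset = engine.rollup_by_grouping(filter_unitids=list(unitids), **shared)
    return all_institutions, subset
```

Both results should be DataFrames with the same layout, which makes it straightforward to compare a group of institutions against the overall totals.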
field_totals_by_grouping(grouping: Grouping, taxonomy: FieldTaxonomy, query_filters: QueryFilters, taxonomy_values: list[str] | None = None, by_year: bool = False, rel_rate: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame
Compute aggregate counts for all fields in a given taxonomy
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grouping | Grouping | How to group the data | required |
taxonomy | FieldTaxonomy | Taxonomy to aggregate over | required |
query_filters | QueryFilters | Pre-aggregation filters to apply to raw data | required |
taxonomy_values | list[str] | Optional list of field values to filter on. Default: None | None |
by_year | bool | Whether to group by year (True) or aggregate over all years (False). Default: False | False |
rel_rate | bool | Whether to calculate relative representation. Default: False | False |
show_query | bool | Whether to print the query and parameters before executing. Default: False | False |
filter_unitids | (list[int], Optional) | List of unitids to filter on. Default: None | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Relative rates by grouping for each field in taxonomy |
uni_rollup_by_grouping(grouping: Grouping, rollup: TaxonomyRollup, query_filters: QueryFilters, by_year: bool = False, rel_rate: bool = False, effect_size: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame
Get intersectional degree counts and rates within intersectional subgroups
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grouping | Grouping | How to group the data | required |
rollup | TaxonomyRollup | Taxonomy to aggregate over | required |
query_filters | QueryFilters | Pre-aggregation filters | required |
by_year | bool | Whether to group by year (True) or aggregate over all years (False). Default: False | False |
rel_rate | bool | Whether to calculate relative representation. If true, also adds associated variables. Default: False | False |
effect_size | bool | Whether to compute effect size. Default: False | False |
show_query | bool | Whether to print the query and parameters before executing. Default: False | False |
filter_unitids | (list[int], Optional) | List of unitids to filter on. Default: None | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Completions in fields contained within roll-up, aggregated by university UNITID and chosen grouping, subject to filters |
uni_field_totals_by_grouping(grouping: Grouping, taxonomy: FieldTaxonomy, query_filters: QueryFilters, taxonomy_values: list[str] | None = None, by_year: bool = False, rel_rate: bool = False, effect_size: bool = False, show_query: bool = False, filter_unitids: list[int] | None = None) -> pd.DataFrame
Aggregate completions (subject to filters) for all fields within a given taxonomy at each university
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grouping | Grouping | How to group the data | required |
taxonomy | FieldTaxonomy | Taxonomy to aggregate over | required |
query_filters | QueryFilters | Pre-aggregation filters to apply to raw data | required |
taxonomy_values | list[str] | Optional list of field values to filter on. Default: None | None |
by_year | bool | Whether to group by year (True) or aggregate over all years (False). Default: False | False |
rel_rate | bool | Whether to calculate relative representation. If true, also adds associated variables. Default: False | False |
effect_size | bool | Whether to compute effect size. Default: False | False |
show_query | bool | Whether to print the query and parameters before executing. Default: False | False |
filter_unitids | (list[int], Optional) | List of unitids to filter on. Default: None | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Completions in each field in the taxonomy, aggregated by university UNITID and chosen grouping, subject to filters |
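The field-level and per-university methods share most of their parameters, but according to the signatures on this page only the uni_* variants accept effect_size. A minimal sketch of a wrapper that makes this difference explicit is shown below. `field_totals` and its `per_university` flag are hypothetical names introduced for illustration.

```python
def field_totals(engine, grouping, taxonomy, query_filters,
                 per_university=False, effect_size=False, **options):
    """Hypothetical wrapper choosing between the overall and per-university methods.

    Extra keyword options (taxonomy_values, by_year, rel_rate, show_query,
    filter_unitids) are forwarded unchanged to the selected method.
    """
    shared = dict(
        grouping=grouping,
        taxonomy=taxonomy,
        query_filters=query_filters,
        **options,
    )
    if per_university:
        # One set of rows per university UNITID; effect_size is accepted here.
        return engine.uni_field_totals_by_grouping(effect_size=effect_size, **shared)
    if effect_size:
        # field_totals_by_grouping has no effect_size parameter in its signature.
        raise ValueError("effect_size is only accepted by the per-university (uni_*) methods")
    return engine.field_totals_by_grouping(**shared)
```

Failing early on an unsupported flag is a design choice of this sketch; silently ignoring effect_size would hide the difference between the two methods.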