scipeds.data.engine
IPEDSQueryEngine(db_path: Optional[Path] = SCIPEDS_CACHE_DIR / DB_NAME)
A structured way to query the IPEDS table to format data for visualization
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_path | Optional[Path] | Path to the pre-processed database file. Defaults to SCIPEDS_CACHE_DIR / DB_NAME. | SCIPEDS_CACHE_DIR / DB_NAME |
Raises:
Type | Description |
---|---|
FileNotFoundError | Pre-processed database file not found. |
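The constructor behaviour described above can be sketched as follows. This is a minimal illustration, not part of scipeds: `open_engine` is a hypothetical helper, and the only facts it relies on are the documented `db_path` parameter and the documented `FileNotFoundError`.

```python
from pathlib import Path
from typing import Any, Optional


def open_engine(db_path: Optional[Path] = None) -> Optional[Any]:
    """Return an IPEDSQueryEngine, or None if the database file is missing.

    Hypothetical helper for illustration; not part of scipeds.
    """
    # Imported inside the function so the package is only resolved when called.
    from scipeds.data.engine import IPEDSQueryEngine

    try:
        if db_path is None:
            # No argument: the engine falls back to its documented default path.
            return IPEDSQueryEngine()
        return IPEDSQueryEngine(db_path=db_path)
    except FileNotFoundError:
        location = db_path if db_path is not None else "the default location"
        print(f"No pre-processed database found at {location}.")
        return None
```

Returning `None` is one possible design choice; callers that cannot continue without the database may prefer to let the exception propagate.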
get_df_from_query(query: str, query_params: Optional[Dict[str, Any]] = None, show_query: bool = False) -> pd.DataFrame
Return the result of the provided SQL query, run against the pre-processed DuckDB database, as a DataFrame
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | SQL query (using DuckDB syntax). | required |
query_params | Optional[Dict[str, Any]] | Prepared statement variables for the query. Defaults to None. | None |
show_query | bool | Whether to print the query and parameters before executing. Defaults to False. | False |
Returns:
Type | Description |
---|---|
DataFrame | Data returned by the query |
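A parameterized call might look like the sketch below. `rows_matching` is a hypothetical helper, and the table and column names are supplied by the caller rather than taken from the scipeds schema. The sketch assumes that `query_params` is forwarded to DuckDB as named prepared-statement values, so that `$value` in the SQL is bound from the `"value"` key.

```python
from typing import Any


def rows_matching(engine: Any, table: str, column: str, value: Any) -> Any:
    """Select the rows of `table` where `column` equals `value`.

    Hypothetical helper; `engine` is expected to be an IPEDSQueryEngine.
    """
    # Identifiers cannot be bound as parameters, so escape embedded quotes
    # and wrap them in double quotes instead.
    safe_table = table.replace('"', '""')
    safe_column = column.replace('"', '""')
    # Assumption: $value uses DuckDB's named-parameter syntax.
    query = f'SELECT * FROM "{safe_table}" WHERE "{safe_column}" = $value'
    # show_query=True prints the query and parameters before executing.
    return engine.get_df_from_query(
        query, query_params={"value": value}, show_query=True
    )
```

Binding the value through `query_params` rather than formatting it into the SQL string avoids quoting mistakes and injection problems.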
list_tables() -> List[str]
List all tables in the DuckDB database
Returns:
Type | Description |
---|---|
List[str] | A list of all available tables |
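One use for `list_tables()` is checking that the database contains the tables a script needs before running any queries. The helper below is a hypothetical sketch; the table names to check come from the caller.

```python
from typing import Any, Iterable, List


def missing_tables(engine: Any, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that the database does not contain.

    Hypothetical helper; `engine` is expected to be an IPEDSQueryEngine.
    """
    available = set(engine.list_tables())
    # Preserve the caller's order so any error message reads naturally.
    return [name for name in required if name not in available]
```

An empty result means every required table is present.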
get_cip_table() -> pd.DataFrame
Get a table of every unique 2020 CIP Code
Returns:
Type | Description |
---|---|
DataFrame | Data frame of CIP codes and corresponding taxonomy titles |
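The CIP table can be searched by keyword without knowing its column names in advance. The sketch below is a hypothetical helper that scans every string value in each row. It calls `reset_index()` on the assumption that the codes may be stored in the index; if they are not, the only effect is an extra integer column in the returned rows.

```python
from typing import Any, Dict, List


def search_cip(engine: Any, keyword: str) -> List[Dict[str, Any]]:
    """Return CIP rows in which any text field contains `keyword`.

    Hypothetical helper; `engine` is expected to be an IPEDSQueryEngine.
    """
    # Convert the DataFrame into a list of plain dicts, one per row.
    records = engine.get_cip_table().reset_index().to_dict(orient="records")
    needle = keyword.lower()
    return [
        row
        for row in records
        # Case-insensitive substring match over string values only.
        if any(isinstance(v, str) and needle in v.lower() for v in row.values())
    ]
```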
get_institutions_table(cols: str | list[str] | None = None) -> pd.DataFrame
Get institution characteristics table, optionally with specified columns
Returns:
Type | Description |
---|---|
DataFrame | Data frame of institution characteristics |
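Per the signature, `cols` accepts a single column name, a list of names, or None. The hypothetical helper below first fetches the table with no `cols` argument to learn which columns exist, then requests only the wanted columns that are actually present. How the method itself reacts to an unknown column name is not documented here, so the sketch avoids passing one.

```python
from typing import Any, List, Optional


def institutions_with(engine: Any, wanted: List[str]) -> Optional[Any]:
    """Return the institutions table restricted to the available `wanted` columns.

    Hypothetical helper; `engine` is expected to be an IPEDSQueryEngine.
    Returns None when none of the wanted columns exist.
    """
    # First call: no `cols` argument, used only to discover the column names.
    available = set(engine.get_institutions_table().columns)
    keep = [name for name in wanted if name in available]
    if not keep:
        return None
    # Second call: request just the columns that are known to exist.
    return engine.get_institutions_table(cols=keep)
```

The extra discovery call costs one additional table load; a script that already knows the column names can call `get_institutions_table(cols=...)` directly.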