DuckDB connection
get_db_connection(read_only=True)
Establishes a connection to the local DuckDB database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| read_only | bool | If True, opens the database in read-only mode to prevent accidental writes during analysis. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| DuckDBPyConnection | DuckDBPyConnection | The active database connection object. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the database file ( |
Source code in api/src/db/duckdb_connection.py
get_schema_info()
Generates a rich, LLM-friendly textual representation of the database schema.
This function combines static metadata descriptions with dynamic data profiling to help the AI Agent understand the dataset's structure and content.
Process:
- Reflect Schema: Queries DuckDB to get column names and types.
- Match Metadata: Aligns columns with the COLUMN_METADATA dictionary.
- Data Profiling (Dynamic): For categorical columns (VARCHAR), it executes a GROUP BY query to fetch the top 5 most frequent values. This allows the Agent to see actual examples (e.g., seeing 'Covid-19' vs 'SARS-CoV-2').
- Formatting: Compiles everything into a Markdown list string.
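
The four steps above can be sketched roughly as below. This is an illustration under stated assumptions, not the actual implementation: the function `describe_table`, the table `readings`, and the contents of `COLUMN_METADATA` are all hypothetical. The demo uses Python's built-in `sqlite3` as a stand-in so it runs without extra dependencies; a DuckDB connection exposes the same `execute(...).fetchall()` calls and also accepts `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical stand-in for the real COLUMN_METADATA dictionary.
COLUMN_METADATA = {
    "pathogen": "Name of the detected pathogen.",
    "cases": "Number of reported cases.",
}


def describe_table(con, table: str, top_n: int = 5) -> str:
    """Return a Markdown list describing each column of `table`."""
    try:
        # Step 1: reflect the schema.
        # Rows are (cid, name, type, notnull, dflt_value, pk).
        columns = con.execute(f"PRAGMA table_info('{table}')").fetchall()
    except Exception as exc:
        return f"Error reading schema: {exc}"
    if not columns:
        return f"Error reading schema: no columns found for '{table}'"

    lines = []
    for row in columns:
        name, col_type = row[1], row[2]
        # Step 2: match the column against the static metadata.
        description = COLUMN_METADATA.get(name, "No description available.")
        line = f"- **{name}** ({col_type}): {description}"
        # Step 3: profile categorical columns with a GROUP BY query.
        if col_type.upper().startswith("VARCHAR"):
            rows = con.execute(
                f'SELECT "{name}", COUNT(*) AS n FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL '
                f'GROUP BY "{name}" ORDER BY n DESC, "{name}" LIMIT {top_n}'
            ).fetchall()
            samples = ", ".join(repr(value) for value, _count in rows)
            line += f" Top values: {samples}"
        lines.append(line)
    # Step 4: compile everything into one Markdown list string.
    return "\n".join(lines)


# Small in-memory demo table (illustrative data only).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE readings (pathogen VARCHAR, cases INTEGER)")
demo.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Covid-19", 10), ("Covid-19", 7), ("SARS-CoV-2", 3)],
)
print(describe_table(demo, "readings"))
```

Seeing both 'Covid-19' and 'SARS-CoV-2' listed as sample values is what lets the Agent notice that the same concept appears under more than one label.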
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | A formatted string describing columns, types, descriptions, and sample values. Returns an error message string if the schema cannot be read. |
Source code in api/src/db/duckdb_connection.py