Improve type handling in read_sql and read_sql_table

### Problem

In _pd.read_sql_ and _pd.read_sql_table_ when the _chunksize_ parameter is set, Pandas builds a DataFrame with dtypes inferred from the data **in the chunk**. This can be a problem if an INTEGER colum contains null values in some chunks but not in others, leading the same column to be int64 in some cases and in others float64. A similar problem happens with strings. 

In ETL processes or simply when dumping large queries to disk in HDF5 format, the user currently has the burden of explicitly having to handle the type conversions of potentially many columns. 
### Solution?

Instead of guessing the type from a subset of the data,  it should be possible to obtain the type information from the database and map it to the appropriate dtypes. 

It is possible to obtain column information from  Sqlalchemy when querying a full table by inspecting its metadata, but I was unsuccessfull in findind a way to do it for a general query. 
Although I am unaware of all the possible type problems that can arise [DBAPI](https://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors) does actually enforce the cursor.description to specify whether each result column is nullable. 
Pandas could use this information (optionally) to always interpret nullable numeric columns as floats and strings as object columns. 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Uh oh!

Improve type handling in read_sql and read_sql_table #13049

Problem

Solution?

4 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Improve type handling in read_sql and read_sql_table #13049

Description

Problem

Solution?

Activity

jreback commented on May 2, 2016

jorisvandenbossche commented on May 2, 2016

rsdenijs commented on May 3, 2016

jorisvandenbossche commented on May 3, 2016

rsdenijs commented on May 3, 2016

jorisvandenbossche commented on May 3, 2016

rsdenijs commented on May 5, 2016

chananshgong commented on Dec 21, 2016

konstantinmiller commented on Nov 30, 2017

jorisvandenbossche commented on Nov 30, 2017

4 remaining items

sam-hoffman commented on Apr 29, 2020

jorisvandenbossche commented on Apr 30, 2020

aaronlutz commented on Oct 12, 2020

avinashpancham commented on Dec 30, 2020

silverdevelopper commented on Jul 26, 2022

eirnym commented on Aug 10, 2024

tobwen commented on Sep 22, 2024

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions