read_sql from SQL Server to Arrow truncates datetime #229

Closed
@t-alex-fritz

I'm working with MS SQL Server 2019. When I read a datetime column with read_sql into a pandas DataFrame, the values come back correctly, and converting that DataFrame to an Arrow table preserves them as well. However, loading directly into an Arrow table (return_type='arrow') truncates the datetime values to midnight: the column comes back as date64[ms] instead of timestamp[ns]. I'd like to drop the pandas dependency and load directly into an Arrow table. Is there a way to do this without truncating the datetimes?

Example Code:

import connectorx as cx
import pyarrow as pa
con_string = 'mssql://user:[email protected]%5CG:1439/database'
print('------- Pandas Table -------')
query = 'SELECT top 5 Datum FROM Termine WHERE Datum>getdate()'
pandas_table = cx.read_sql(con_string, query, return_type='pandas')
print(pandas_table)
print('------- Arrow table from pandas -------')
arrow_table_from_pandas = pa.Table.from_pandas(pandas_table)
print(arrow_table_from_pandas)
print('------- Arrow Table -------')
arrow_table = cx.read_sql(con_string, query, return_type='arrow')
print(arrow_table)

Example Output:

------- Pandas Table -------
                Datum
0 2022-02-06 07:30:00
1 2022-02-07 00:00:00
2 2022-02-07 00:00:00
3 2022-02-07 07:00:00
4 2022-02-07 07:30:00
------- Arrow table from pandas -------
pyarrow.Table
Datum: timestamp[ns]
----
Datum: [[2022-02-06 07:30:00.000000000,2022-02-07 00:00:00.000000000,2022-02-07 00:00:00.000000000,2022-02-07 07:00:00.000000000,2022-02-07 07:30:00.000000000]]
------- Arrow Table -------
pyarrow.Table
Datum: date64[ms]
----
Datum: [[2022-02-06,2022-02-07,2022-02-07,2022-02-07,2022-02-07]]
