-
Notifications
You must be signed in to change notification settings - Fork 39
upgrade Python to v3.10 (iss. #1406) #1426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrade Python to v3.10 (iss. #1406) #1426
Conversation
Cleaned up trailing spaces on a few lines.
Resolves vulnerability CVE-2015-20107.
Upgrade `pandas` and related modules to work with Python 3.10.
requirements.txt
Outdated
numpy==1.22.0 | ||
pandas==1.3.1 | ||
pandas==1.4.4 | ||
pangres==2.3.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like bumping numpy
is also appropriate? pangres
has had two major releases, so we may need to change API usages, but I'd look at that as well, since we're updating pandas
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this numpy
version is still compatible. And pangres
is pretty far behind, but seems like it should theoretically still be compatible. Did you test the cron?
Can you rebase this with master so we can more effectively test with the latest version of cron (UDP)? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing with UDW, I'm seeing this warning I wasn't seeing before when using pd.read_sql
:
UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
Looks like we're passing a Django connection object. I guess it's probably better to create our own SQLAlchemy engine and then close it at the end of the cron?
See these:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is happening on the views, too. We'll need to think about how best to open/close connections. Maybe we can form a MySQL connection string? Alternatively, we can use the Django ORM to pull data in the views, and pass it into DataFrame
constructors, but that's a bigger change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test the cron.
Does the warning appear in the UI or in the server log?
If it doesn't appear in the UI, is the content affected by this warning? Is it getting the wrong information?
If the UI is unaffected, no warning or content problem, what should we do? Do we go ahead with this change on its own, or do we need to address the warning before including it in the release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with a warning filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a problem with the cron.py or were you seeing it in the application log from the views.py when using the django.db.connection
? I guess I could run this and check.
I'm fine with either too, just seems like it's going to complicate this PR trying to implement something here for all the read_sql
areas since we use 2 different databases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing it in the views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be resolved by latest commits using the engine approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also bumped numpy
and pangres
, pangres
only required a small change to respond to a parameter name change
…ing-analytics into 1406-upgrade-python
3dd6da0
to
807ba64
Compare
Derive PostgreSQL connect string for data warehouse DB from Django connections, then create a single engine for all queries that will use that DB.
@jonespm, I implemented both of the quicker solutions discussed. Please let me know which you'd prefer and I'll either update this PR or create a new one: |
Was going to talk to you about this, thanks for picking this up @ssciolla I'm good with #3, looks good. I'm fine with it being new after this is merged or just committed here and squashed. Thanks! We'll probably want to improve the views in the future but we can probably wait to see if we have to rewrite them all anyway to use data marts. |
dashboard/common/db_util.py
Outdated
PORT: int | ||
|
||
|
||
def create_sqlalchemy_engine(db_params: DBParams, type: Literal['mysql', 'postgres']) -> Engine: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the type parameter isn't really needed as ENGINE will always be available in DBParams if you added it to the DBParams class. There would be a value of.
django.db.backends.postgresql
or django.db.bacakends.mysql.
. So seems like you could just drop this parameter and use
if 'mysql' in new_db_params['ENGINE']:
But I guess it isn't a big deal either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the requested change, see d22125c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the type parameter but I'm fine with this either way. Looks good!
…u#1426) * tl-its-umich-edu#1406 - spelling correction & clean-up Cleaned up trailing spaces on a few lines. * tl-its-umich-edu#1406 - upgrade to Python 3.10 Resolves vulnerability CVE-2015-20107. * tl-its-umich-edu#1406 - upgrade `pandas` for Python 3.10 Upgrade `pandas` and related modules to work with Python 3.10. * tl-its-umich-edu#1406 - fix all warehouse connections Derive PostgreSQL connect string for data warehouse DB from Django connections, then create a single engine for all queries that will use that DB. * Create utility function for creating mysql and postgres engines; apply to views' * Remove other database conn prep * Reuse create_sqlalchemy_engine in data_validation * Remove unused variable * Make a couple minor modifications to db_util * Make one read_sql call one line * Remove unused import * Update numpy, pangres; change mypy version * Remove type parameter, use Django ENGINE * Reverting change to validate_udw_vs_udp since it already created an engine Co-authored-by: Sam Sciolla <[email protected]> Co-authored-by: Code Hugger (Matthew Jones) <[email protected]>
To resolve vulnerability CVE-2015-20107, upgrade Python to v3.10. Also upgrade
pandas
related modules to work with the newer version of Python.Minor changes were also made to
start.sh
to fix a spelling mistake and clean up trailing spaces on a few lines.Closes #1406.
Test plan