Skip to content

upgrade Python to v3.10 (iss. #1406) #1426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Sep 14, 2022

Conversation

lsloan
Copy link
Member

@lsloan lsloan commented Sep 7, 2022

To resolve vulnerability CVE-2015-20107, upgrade Python to v3.10. Also upgrade pandas related modules to work with the newer version of Python.

Minor changes were also made to start.sh to fix a spelling mistake and clean up trailing spaces on a few lines.

Closes #1406.

Test plan

  • Run MyLA locally and verify that it works properly.
  • Optional: Verify that Python 3.10 is installed.
    docker exec -it student_dashboard python --version  # E.g., Python 3.10.6

Cleaned up trailing spaces on a few lines.
Upgrade `pandas` and related modules to work with Python 3.10.
@lsloan lsloan self-assigned this Sep 7, 2022
@lsloan lsloan changed the title ⚠️ DRAFT - upgrade Python to v3.10 (iss. #1406) upgrade Python to v3.10 (iss. #1406) Sep 7, 2022
@lsloan lsloan marked this pull request as ready for review September 7, 2022 03:46
@lsloan lsloan requested review from ssciolla and jonespm September 7, 2022 03:46
requirements.txt Outdated
Comment on lines 28 to 30
numpy==1.22.0
pandas==1.3.1
pandas==1.4.4
pangres==2.3.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like bumping numpy is also appropriate? pangres has had two major releases, so we may need to change API usages, but I'd look at that as well, since we're updating pandas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this numpy version is still compatible. And pangres is pretty far behind, but seems like it should theoretically still be compatible. Did you test the cron?

Can you rebase this with master so we can more effectively test with the latest version of cron (UDP)? Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When testing with UDW, I'm seeing this warning I wasn't seeing before when using pd.read_sql:

UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy

Looks like we're passing a Django connection object. I guess it's probably better to create our own SQLAlchemy engine and then close it at the end of the cron?

See these:

Copy link
Contributor

@ssciolla ssciolla Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is happening on the views, too. We'll need to think about how best to open/close connections. Maybe we can form a MySQL connection string? Alternatively, we can use the Django ORM to pull data in the views, and pass it into DataFrame constructors, but that's a bigger change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test the cron.

Does the warning appear in the UI or in the server log?

If it doesn't appear in the UI, is the content affected by this warning? Is it getting the wrong information?

If the UI is unaffected, no warning or content problem, what should we do? Do we go ahead with this change on its own, or do we need to address the warning before including it in the release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with a warning filter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a problem with the cron.py or were you seeing it in the application log from the views.py when using the django.db.connection? I guess I could run this and check.

I'm fine with either too, just seems like it's going to complicate this PR trying to implement something here for all the read_sql areas since we use 2 different databases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing it in the views.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be resolved by latest commits using the engine approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also bumped numpy and pangres, pangres only required a small change to respond to a parameter name change

@lsloan lsloan force-pushed the 1406-upgrade-python branch from 3dd6da0 to 807ba64 Compare September 13, 2022 19:46
lsloan and others added 4 commits September 13, 2022 15:53
Derive PostgreSQL connect string for data warehouse DB from Django connections, then create a single engine for all queries that will use that DB.
@ssciolla
Copy link
Contributor

@jonespm, I implemented both of the quicker solutions discussed. Please let me know which you'd prefer and I'll either update this PR or create a new one:

@jonespm
Copy link
Member

jonespm commented Sep 14, 2022

Was going to talk to you about this, thanks for picking this up @ssciolla

I'm good with #3, looks good. I'm fine with it being new after this is merged or just committed here and squashed.

Thanks! We'll probably want to improve the views in the future but we can probably wait to see if we have to rewrite them all anyway to use data marts.

PORT: int


def create_sqlalchemy_engine(db_params: DBParams, type: Literal['mysql', 'postgres']) -> Engine:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the type parameter isn't really needed as ENGINE will always be available in DBParams if you added it to the DBParams class. There would be a value of.

django.db.backends.postgresql or django.db.bacakends.mysql.. So seems like you could just drop this parameter and use

if 'mysql' in new_db_params['ENGINE']:

But I guess it isn't a big deal either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the requested change, see d22125c

Copy link
Member

@jonespm jonespm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the type parameter but I'm fine with this either way. Looks good!

@ssciolla ssciolla merged commit aa091fc into tl-its-umich-edu:master Sep 14, 2022
jonespm added a commit to jonespm/student-dashboard-django that referenced this pull request Sep 20, 2022
…u#1426)

* tl-its-umich-edu#1406 - spelling correction & clean-up

Cleaned up trailing spaces on a few lines.

* tl-its-umich-edu#1406 - upgrade to Python 3.10

Resolves vulnerability CVE-2015-20107.

* tl-its-umich-edu#1406 - upgrade `pandas` for Python 3.10

Upgrade `pandas` and related modules to work with Python 3.10.

* tl-its-umich-edu#1406 - fix all warehouse connections

Derive PostgreSQL connect string for data warehouse DB from Django connections, then create a single engine for all queries that will use that DB.

* Create utility function for creating mysql and postgres engines; apply to views'

* Remove other database conn prep

* Reuse create_sqlalchemy_engine in data_validation

* Remove unused variable

* Make a couple minor modifications to db_util

* Make one read_sql call one line

* Remove unused import

* Update numpy, pangres; change mypy version

* Remove type parameter, use Django ENGINE

* Reverting change to validate_udw_vs_udp since it already created an
engine

Co-authored-by: Sam Sciolla <[email protected]>
Co-authored-by: Code Hugger (Matthew Jones) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vulnerability: CVE-2015-20107, Python
3 participants