Skip to content

Files

Latest commit

e5abefb · Mar 1, 2024

History

History
147 lines (93 loc) · 6.43 KB

gsoc2024.md

File metadata and controls

147 lines (93 loc) · 6.43 KB

GSoC 2024 project ideas for Open Science Labs

The Open Science Labs (OSL) community is thrilled to announce our participation in the Google Summer of Code 2024, under NumFOCUS. We warmly welcome and encourage individuals from all around the globe to apply to GSoC with Open Science Labs.

About Open Science Labs

Open Science Labs is a global community dedicated to creating an open space for teaching, learning, and sharing information about open science and computational tools. Our community develops tools that address real-world problems and collaborates with other projects and workgroups to improve technology and create international opportunities for our community.

Although our focus may seem broad, we initially prioritize supporting Research Software Engineers (RSEs) who often face computational challenges in their work. We recognize that many colleagues in scientific fields may not be familiar with programming languages, computational libraries, version control systems, databases, DevOps, and other computational tools. Therefore, we aim to provide a safe and open space for individuals to learn and share their knowledge. Our community is open to everyone, including students, professors, and industry professionals, as we believe that the challenges and technologies used in these fields are frequently similar, if not the same (in many cases).

Organization

Open Science Labs is primarily organized by its steering council, collaborators, and interns. You can access the current list of our active members on our website: https://opensciencelabs.org/about/team/.

To ensure a welcoming and safe environment for all members, OSL has established a Code of Conduct, which you can find here: https://opensciencelabs.org/about/coc/.

If you're interested in learning more about our organization and governance, please visit our website: https://opensciencelabs.org/about/governance/.


Project Idea 1: Developing SciCookie further

Abstract

Scicookie is under active development and the applicants can pick up issues from the GitHub issue tracker (after discussing with the mentors) to work on them. This project can include:

  • Fixing the truncated input in CLI.
  • Implement depends_on properly, a way to have conditionally questions
  • Updating pre-commit configurations.
  • Adding new documentation engines.

License

BSD 3 Clause: https://github.com/osl-incubator/scicookie/blob/main/LICENSE

Code of Conduct

https://github.com/osl-incubator/scicookie/blob/main/CODE_OF_CONDUCT.md

Tasks

Expected Outcomes

  • Implement the assigned tasks
  • Add and update documentation for the new implementation
  • Add proper tests for the new implementation
  • Create a blog post for each new implementation

Details

  • Prerequisites:
    • Python
    • Object-oriented programming (OOP)
    • YAML
  • Expected Time: 350 hours
  • Potential Mentor(s): Saransh Chopra, Ivan Ogasawara, Anavelyz Perez, Yurely Camacho, Ever Vino

References


Project Idea 2: Add C++ backend for all the trees present in binary_trees.py, Porting stack.py to C++ backend

Abstract

PyDataStructs project aims to be a Python package for various data structures and algorithms (including their parallel implementations). We are also working on providing C++ backend via Python C-API for high performance use cases. It is a partner project under Open Science Labs (see, https://opensciencelabs.org/affiliations/partnership/partners/).

Why PyDataStructs?

  • Single package for all your data structures and algorithms
  • Consistent and Clean Interface - The APIs we have provided are consistent with each other, clean, and easy to use. We make sure of that before adding any new data structure or algorithm.
  • Well Tested - We thoroughly test our code before making any new addition to PyDataStructs. 99 percent lines of our code have already been tested by us.

Current state

Expected outcomes

Python tests running with Backend.CPP as the backend.

Details

Students are requested to discuss before applying. Discussing via email (gdp.1807@gmail.com and anutosh.bhat.21@gmail.com) would be the best.

  • Prerequisites:
    • Experience with Python, C++
    • Ability to read and implement algorithms
    • Knowledge of Python C-API is not necessary but would be an additional benefit.
  • Expected time: 350 hours
  • Potential mentor(s): Gagandeep Singh, Anutosh Bhat, Ivan Ogasawara, Ever Vino

References


Project Idea 3: Verify the reproducibility of an experiment

Abstract

Implement an algorithm to compare the provenance from two (or more) trials (i.e., executions of an experiment) to check their reproducibility. The provenance stored in the relational (sqlite) database by noWorkflow 2 contains intermediate variable values from a trial. These values could be compared to check how much or where executions deviate from each other.

Current state

It currently has some methods to explicitly tag variables of different trials and methods to compare them. It would be nice to have a way to compare the whole trial and estimate how much a trial deviate from another.

Tasks

  • Compare trials of the same script
  • Estimate how much on trial deviate from another
  • Consider different scripts and execution flows
  • Indicate which parts of the scripts are not reproducible

Expected outcomes

  • Each task has a different outcome

Details

  • Prerequisites:
    • Python
    • SQL or SQLAlchemy ORM
  • Expected time: 350 hours
  • Potential mentor(s): João Felipe Pimentel, Ivan Ogasawara, Ever Vino, Anavelyz Perez, Yurely Camacho