The Open Science Labs (OSL) community is thrilled to announce our participation in the Google Summer of Code 2024, under NumFOCUS. We warmly welcome and encourage individuals from all around the globe to apply to GSoC with Open Science Labs.
Open Science Labs is a global community dedicated to creating an open space for teaching, learning, and sharing information about open science and computational tools. Our community develops tools that address real-world problems and collaborates with other projects and workgroups to improve technology and create international opportunities for our community.
Although our focus may seem broad, we initially prioritize supporting Research Software Engineers (RSEs) who often face computational challenges in their work. We recognize that many colleagues in scientific fields may not be familiar with programming languages, computational libraries, version control systems, databases, DevOps, and other computational tools. Therefore, we aim to provide a safe and open space for individuals to learn and share their knowledge. Our community is open to everyone, including students, professors, and industry professionals, as we believe that the challenges and technologies used in these fields are frequently similar, if not the same (in many cases).
Open Science Labs is primarily organized by its steering council, collaborators, and interns. You can access the current list of our active members on our website: https://opensciencelabs.org/about/team/.
To ensure a welcoming and safe environment for all members, OSL has established a Code of Conduct, which you can find here: https://opensciencelabs.org/about/coc/.
If you're interested in learning more about our organization and governance, please visit our website: https://opensciencelabs.org/about/governance/.
Scicookie is under active development and the applicants can pick up issues from the GitHub issue tracker (after discussing with the mentors) to work on them. This project can include:
- Fixing the truncated input in CLI.
- Implement depends_on properly, a way to have conditionally questions
- Updating pre-commit configurations.
- Adding new documentation engines.
BSD 3 Clause: https://github.com/osl-incubator/scicookie/blob/main/LICENSE
https://github.com/osl-incubator/scicookie/blob/main/CODE_OF_CONDUCT.md
- osl-incubator/scicookie#194
- osl-incubator/scicookie#195
- osl-incubator/scicookie#180
- osl-incubator/scicookie#178
- osl-incubator/scicookie#186
- osl-incubator/scicookie#143
- osl-incubator/scicookie#172
- Implement the assigned tasks
- Add and update documentation for the new implementation
- Add proper tests for the new implementation
- Create a blog post for each new implementation
- Prerequisites:
- Python
- Object-oriented programming (OOP)
- YAML
- Expected Time: 350 hours
- Potential Mentor(s): Saransh Chopra, Ivan Ogasawara, Anavelyz Perez, Yurely Camacho, Ever Vino
Project Idea 2: Add C++ backend for all the trees present in binary_trees.py, Porting stack.py to C++ backend
PyDataStructs project aims to be a Python package for various data structures and algorithms (including their parallel implementations). We are also working on providing C++ backend via Python C-API for high performance use cases. It is a partner project under Open Science Labs (see, https://opensciencelabs.org/affiliations/partnership/partners/).
Why PyDataStructs?
- Single package for all your data structures and algorithms
- Consistent and Clean Interface - The APIs we have provided are consistent with each other, clean, and easy to use. We make sure of that before adding any new data structure or algorithm.
- Well Tested - We thoroughly test our code before making any new addition to PyDataStructs. 99 percent lines of our code have already been tested by us.
- The first release i.e., PyDataStructs 0.0.1 is available at https://pydatastructs.readthedocs.io/en/0.0.1/index.html.
- Open issues on Github - https://github.com/codezonediitj/pydatastructs/issues.
- We have participated previously in programs like KWoC (organised by IIT Kharagpur), GSSoC.
Python tests running with Backend.CPP as the backend.
Students are requested to discuss before applying. Discussing via email (gdp.1807@gmail.com and anutosh.bhat.21@gmail.com) would be the best.
- Prerequisites:
- Experience with Python, C++
- Ability to read and implement algorithms
- Knowledge of Python C-API is not necessary but would be an additional benefit.
- Expected time: 350 hours
- Potential mentor(s): Gagandeep Singh, Anutosh Bhat, Ivan Ogasawara, Ever Vino
- https://pydatastructs.readthedocs.io/en/latest/
- https://github.com/codezonediitj/pydatastructs
Implement an algorithm to compare the provenance from two (or more) trials (i.e., executions of an experiment) to check their reproducibility. The provenance stored in the relational (sqlite) database by noWorkflow 2 contains intermediate variable values from a trial. These values could be compared to check how much or where executions deviate from each other.
It currently has some methods to explicitly tag variables of different trials and methods to compare them. It would be nice to have a way to compare the whole trial and estimate how much a trial deviate from another.
- Compare trials of the same script
- Estimate how much on trial deviate from another
- Consider different scripts and execution flows
- Indicate which parts of the scripts are not reproducible
- Each task has a different outcome
- Prerequisites:
- Python
- SQL or SQLAlchemy ORM
- Expected time: 350 hours
- Potential mentor(s): João Felipe Pimentel, Ivan Ogasawara, Ever Vino, Anavelyz Perez, Yurely Camacho