{{Draft}}
<!-- [[Category:private]] -->
;title: OSS Training Course
= OSS =
OSS Training Materials
{{Can I use your material}}
= Introduction/Outline =
* Code management, versioning, and licensing
* Automation and code quality (best practices)
** API documentation (Swagger, Sphinx, Docusaurus, etc.)
= Main Keys/Concerns =
From Andre's notes/suggestions (=
= Open Source Software in Research: Strategy, Practice, and Impact =
Open Source Software (OSS) as a strategic, technical, and scientific asset for research institutes, illustrated with some real-world examples.
----
== 1 — The Strategic Role of OSS in a Research Institute ==
Open Source Software is foundational to modern research practice.
Benefits:
* Visibility of research outputs
* Increased scientific impact
* Collaboration across institutions
* Transparency and reproducibility
* Long-term sustainability
OSS functions as '''shared research infrastructure'''.
----
== 2 — Real-World Example: CERN ==
CERN treats software as a first-class research output.
Practices:
* Public repositories for core software (e.g., ROOT, Geant4)
* Strong open source licensing culture
* Long-term maintenance beyond individual projects
Impact:
* Software reused globally in physics and beyond
* Industrial and academic adoption
* Software cited alongside publications
Key lesson:
Large-scale research infrastructures depend on open software.
----
== 3 — OSS and Publicly Funded Research ==
Open software aligns with public funding principles.
Key drivers:
* Open science mandates
* Reproducibility requirements
* Accountability to taxpayers
OSS ensures research results are:
* Verifiable
* Reusable
* Preserved beyond project lifetimes
----
== 4 — Real-World Example: European Commission & EOSC ==
The European Open Science Cloud (EOSC) promotes OSS.
Practices:
* Preference for open licenses
* FAIR principles applied to software
* Software recognized as a research output
Impact:
* Policy-level support for open software
* Alignment across national research infrastructures
Key lesson:
OSS is increasingly embedded in research policy.
----
== 5 — Licensing Choices: Why They Matter ==
Licenses define what reuse is legally permitted.
Without a license:
* Default copyright applies, so others have no legal right to reuse, modify, or redistribute the code
* Collaboration is legally blocked
Licensing must be intentional and documented.
----
== 6 — Real-World Example: NumPy & SciPy ==
NumPy and SciPy originated in academic research.
License choice:
* BSD (permissive)
Outcomes:
* Massive industrial and academic adoption
* Integration into commercial products
* Long-term sustainability via a broad community
Key lesson:
Permissive licenses can maximize scientific reach.
----
== 7 — Permissive vs. Copyleft Licenses ==
Two main license families are common in research.
Permissive:
* MIT, BSD, Apache 2.0
* Fewer restrictions
* High reuse potential
Copyleft:
* GPL, LGPL
* Ensures openness of derivatives
* May limit industrial integration
----
== 8 — Real-World Example: GNU Scientific Software ==
GNU scientific tools use copyleft licenses.
License choice:
* GPL
Outcomes:
* Guaranteed openness of derivatives
* Strong alignment with free software principles
* Smaller but ideologically aligned ecosystem
Key lesson:
Copyleft prioritizes openness over adoption scale.
----
== 9 — Minimum Best Practices for Publishing Research Software ==
Research software should meet baseline standards.
Required:
* Public repository
* Clear license
* Documentation
* Versioning
* Citation metadata
Quality enables reuse.
----
== 10 — Real-World Example: EMBL-EBI ==
EMBL-EBI publishes bioinformatics software openly.
Practices:
* Standardized repositories
* Clear documentation
* Explicit versioning and releases
Impact:
* Tools reused globally in life sciences
* Software cited in publications
* Long-lived community tools
Key lesson:
Consistency scales reuse.
----
== 11 — Documentation as a Research Output ==
Documentation supports reproducibility.
Minimum documentation:
* Purpose and scope
* Installation
* Usage examples
* Limitations
Good documentation is an investment, not overhead.
----
== 12 — Versioning, Releases, and Citation ==
Stable versions enable scientific referencing.
Best practices:
* Semantic Versioning
* Git tags
* DOI assignment via Zenodo
* CITATION.cff file
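To make the CITATION.cff item more concrete, the sketch below renders a minimal citation file as text. It is only an illustration: <code>render_citation_cff</code> is a hypothetical helper, the title, author, version, and date are placeholders, and the keys are assumed to follow the Citation File Format 1.2.0 schema (<code>cff-version</code>, <code>message</code>, <code>title</code>, <code>authors</code>), which should be checked before relying on the output.

```python
import json


def render_citation_cff(title, authors, version, date_released, doi=None):
    """Render a minimal CITATION.cff document as text (illustrative helper).

    `authors` is a list of (given_names, family_names) tuples.
    Values are quoted with json.dumps, which produces strings that are
    also valid YAML double-quoted scalars.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {json.dumps(title)}",
        f"version: {json.dumps(version)}",
        f"date-released: {json.dumps(date_released)}",
    ]
    if doi:
        # The DOI is optional; add it once the archive has minted one.
        lines.append(f"doi: {json.dumps(doi)}")
    lines.append("authors:")
    for given, family in authors:
        lines.append(f"  - given-names: {json.dumps(given)}")
        lines.append(f"    family-names: {json.dumps(family)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for demonstration only.
    print(render_citation_cff("Demo Tool", [("Ada", "Example")], "1.2.3", "2026-01-15"))
```

Generating the file from the same version string used for the Git tag keeps the citation metadata and the release in step.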
----
== 13 — Real-World Example: Zenodo + GitHub ==
Many institutes integrate GitHub with Zenodo.
Practices:
* DOI minted for each release
* Software cited like a paper
* Version-specific references
Used by:
* CERN
* Universities
* EU-funded projects
Key lesson:
Infrastructure exists — use it.
----
== 14 — Governance When Opening Internal Code ==
Open code requires explicit governance.
Key questions:
* Who reviews changes?
* Who releases software?
* Who resolves disputes?
Governance should be lightweight but explicit.
----
== 15 — Real-World Example: Apache Software Foundation ==
ASF provides a mature governance model.
Practices:
* Merit-based contributor model
* Clear maintainer roles
* Transparent decision-making
Impact:
* Sustainable projects
* Low institutional dependency
* Long-term continuity
Key lesson:
Governance enables longevity.
----
== 16 — Managing External Contributions ==
External contributions need structure.
Best practices:
* Pull Requests only
* Mandatory reviews
* CI enforcement
* Code of Conduct
These practices protect both contributors and institutions.
----
== 17 — Positioning OSS as Scientific Impact ==
Software impact is measurable.
Indicators:
* Citations (DOIs)
* External contributors
* Downstream reuse
* Inclusion in workflows or infrastructures
----
== 18 — Real-World Example: Research Software as Impact ==
Examples:
* R language ecosystem (originated in academia)
* scikit-learn (academic origins, global adoption)
* Astropy (community-governed astronomy software)
Recognized impact:
* Thousands of citations
* Used in publications across disciplines
Key lesson:
OSS can outlive individual projects.
----
== 19 — Technical Repository Management ==
Engineering practices support trust.
Minimum requirements:
* Stable main branch
* PR-based workflow
* Automated tests
* CI pipelines
* Release tagging
* Dependency management
----
== 20 — Real-World Example: NASA Open Source ==
NASA publishes and maintains OSS.
Practices:
* Mandatory open repositories
* Automated CI
* Clear contribution rules
Impact:
* External reuse
* Industry collaboration
* Increased transparency
Key lesson:
Technical discipline enables openness at scale.
----
== 21 — Key Takeaways ==
* OSS is strategic research infrastructure
* Licensing shapes reuse and impact
* Minimum quality standards are essential
* Governance enables safe collaboration
* Software impact is measurable and reportable
* Automation sustains quality over time
= Extended Details =
Unfold it with the '''Expand''' button on the very right side below
<div class="mw-collapsible mw-collapsed">
= Modern Software Development Practices (Python & JavaScript) =
Best practices for managing, testing, and documenting software projects built with '''Python''' and '''JavaScript'''.
== Code Management, Versioning, and Licensing ==
* CI pipelines enabled
* Issue and PR templates
= Open Source Collaboration and Release Management =
This section defines contribution workflows, security policies, community standards, and automated releases.
== Pull Request Templates ==
Pull Request templates help reviewers and contributors align on expectations.
=== Pull Request Template (General) ===
<pre>
## Description
Brief summary of the changes introduced by this PR.

## Related Issue
Closes #<issue-number>

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Refactoring
- [ ] CI / tooling

## How Has This Been Tested?
Describe the tests that you ran.

## Checklist
- [ ] Code follows project style guidelines
- [ ] Tests added or updated
- [ ] Documentation updated (if applicable)
- [ ] CI pipeline passes
</pre>
Best practices:
* Require PR templates for all contributions
* Enforce reviews via branch protection
* Keep PRs small and focused
== Security Policy (SECURITY.md) ==
Open source projects should clearly define how to report vulnerabilities.
=== Example SECURITY.md ===
<pre>
# Security Policy

## Supported Versions
Only the latest major version is actively supported with security updates.

## Reporting a Vulnerability
If you discover a security vulnerability, please do NOT open a public issue.
Instead, report it by emailing:
security@project-domain.example

Please include:
- A description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested remediation (if available)

We aim to respond within 72 hours.
</pre>
Best practices:
* Never discuss vulnerabilities publicly before a fix
* Acknowledge reporters responsibly
* Publish security advisories after resolution
== Code of Conduct (CODE_OF_CONDUCT.md) ==
A Code of Conduct creates a safe and welcoming community.
=== Example CODE_OF_CONDUCT.md ===
<pre>
# Code of Conduct

## Our Pledge
We are committed to providing a respectful and inclusive environment for everyone.

## Expected Behavior
- Be respectful and considerate
- Use welcoming and inclusive language
- Accept constructive criticism
- Focus on what is best for the community

## Unacceptable Behavior
- Harassment or discrimination
- Trolling or personal attacks
- Publishing private information

## Enforcement
Project maintainers are responsible for enforcing this code of conduct.

## Reporting
Report incidents to:
conduct@project-domain.example
</pre>
Recommendation:
* Use the Contributor Covenant as a base
* Enforce consistently and transparently
== Release Automation (semantic-release) ==
Automated releases reduce human error and ensure consistency.
=== What semantic-release Does ===
* Determines the next version from commit messages
* Generates changelog entries
* Creates Git tags and releases
* Publishes artifacts automatically
=== Commit Requirements ===
semantic-release derives the version bump from commit messages written in the '''Conventional Commits''' style:
* feat: introduces a new feature (MINOR)
* fix: bug fix (PATCH)
* feat!: or a BREAKING CHANGE: footer marks a breaking change (MAJOR)
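The mapping from commit types to version bumps can be sketched in a few lines of Python. This is a simplified illustration of the rule set listed above, not semantic-release's actual analyzer: the names <code>bump_level</code> and <code>next_version</code> are hypothetical, and pre-release versions and special handling of 0.x versions are ignored.

```python
import re

# Header shape: type, optional "(scope)", optional "!" marker, then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: .+")

# Ranking used to keep the highest bump seen so far.
ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(messages):
    """Return "major", "minor", "patch", or None for a list of commit messages."""
    level = None
    for message in messages:
        header = message.splitlines()[0] if message else ""
        match = HEADER.match(header)
        if not match:
            continue  # commits that do not follow the convention are ignored
        if match.group("bang") or "BREAKING CHANGE:" in message:
            candidate = "major"
        elif match.group("type") == "feat":
            candidate = "minor"
        elif match.group("type") == "fix":
            candidate = "patch"
        else:
            candidate = None  # docs, chore, test, ... do not trigger a release
        if ORDER[candidate] > ORDER[level]:
            level = candidate
    return level


def next_version(current, messages):
    """Apply the bump to a MAJOR.MINOR.PATCH string; unchanged if nothing to release."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

For example, a history containing one <code>fix:</code> and one <code>feat:</code> commit since 1.4.2 yields 1.5.0, because the highest-ranked change decides the bump.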
=== Example semantic-release Configuration ===
<syntaxhighlight lang="json">
{
  "branches": ["main"],
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/changelog",
    "@semantic-release/github"
  ]
}
</syntaxhighlight>
=== GitHub Actions: Automated Release ===
<syntaxhighlight lang="yaml">
name: Release
on:
  push:
    branches:
      - main
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm ci
      - run: npx semantic-release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
</syntaxhighlight>
=== Python + semantic-release Notes ===
* semantic-release manages versions and tags
* Python packages should:
** Read version from git tags
** Or inject version during build (setuptools_scm)
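When the version is injected at build time, the package can read it back from its installed metadata instead of hard-coding a string. The sketch below uses the standard library's <code>importlib.metadata</code>; the helper name <code>get_version</code>, the distribution name <code>my-package</code>, and the fallback string are placeholders chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(distribution_name, fallback="0.0.0+unknown"):
    """Return the installed version of a distribution, or a fallback.

    The fallback covers the case where the code runs from a source
    checkout that has not been installed.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return fallback


if __name__ == "__main__":
    # "my-package" is a placeholder distribution name.
    print(get_version("my-package"))
```

A package would typically expose the result as <code>__version__</code> in its <code>__init__.py</code>, so the Git tag stays the single source of truth.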
== Open Source Release Best Practices ==
* Use automated releases
* Never manually edit versions
* Always release from main branch
* Keep CHANGELOG.md generated automatically
* Tag every release
== Final Open Source Readiness Checklist ==
* README.md
* CONTRIBUTING.md
* CHANGELOG.md
* LICENSE
* CODE_OF_CONDUCT.md
* SECURITY.md
* Issue templates
* Pull Request templates
* CI pipelines per service
* Automated releases enabled
= Advanced Open Source Project Setup =
This section completes the open source framework with publishing automation, governance, labeling standards, and repository structure.
== Automated Package Publishing ==
Automated publishing ensures consistent, repeatable releases.
=== PyPI Publishing (Python) ===
Best practice:
* Publish only from tagged releases
* Use CI for trusted publishing
==== GitHub Actions: Publish to PyPI ====
<syntaxhighlight lang="yaml">
name: Publish Python Package
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read   # allow checkout
      id-token: write  # required for PyPI Trusted Publishing (OIDC)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build
      - run: python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
</syntaxhighlight>
Requirements:
* pyproject.toml configured
* Trusted Publishing enabled in PyPI
=== npm Publishing (JavaScript) ===
Best practice:
* Use semantic-release
* Publish only from main branch
==== GitHub Actions: Publish to npm ====
<syntaxhighlight lang="yaml">
name: Publish npm Package
on:
  push:
    branches:
      - main
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          registry-url: https://registry.npmjs.org
      - run: npm ci
      - run: npx semantic-release
        env:
          NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
</syntaxhighlight>
== GitHub Labels Taxonomy ==
A consistent label system improves triage and contributor onboarding.
=== Type Labels ===
* bug
* enhancement
* documentation
* refactor
* security
* question
=== Priority Labels ===
* priority: critical
* priority: high
* priority: medium
* priority: low
=== Status Labels ===
* status: triage
* status: blocked
* status: in progress
* status: ready for review
=== Scope / Stack Labels ===
* python
* javascript
* frontend
* backend
* api
* ci
=== Community Labels ===
* good first issue
* help wanted
* breaking change
== Maintainers and Governance Model ==
Clear governance improves trust and sustainability.
=== Roles ===
* '''Maintainers'''
** Own project direction
** Review and merge PRs
** Manage releases
* '''Contributors'''
** Submit issues and PRs
** Improve code and documentation
=== Decision Making ===
* Decisions are made publicly in issues or PRs
* Maintainers aim for consensus
* Maintainer vote is final when consensus cannot be reached
=== Becoming a Maintainer ===
* Consistent high-quality contributions
* Community engagement
* Invitation by existing maintainers
=== Governance File ===
Recommended file:
* GOVERNANCE.md
== Complete Open Source Starter Repository Structure ==
Recommended structure for a Python + JavaScript open source project:
<syntaxhighlight lang="text">
project-root/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py
│   │   ├── api/
│   │   └── services/
│   ├── tests/
│   ├── pyproject.toml
│   └── README.md
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   └── services/
│   ├── tests/
│   ├── package.json
│   └── README.md
│
├── docs/
│   ├── api/
│   ├── guides/
│   └── README.md
│
├── .github/
│   ├── workflows/
│   │   ├── backend-ci.yml
│   │   ├── frontend-ci.yml
│   │   └── release.yml
│   ├── ISSUE_TEMPLATE/
│   └── PULL_REQUEST_TEMPLATE.md
│
├── .releaserc.json
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── GOVERNANCE.md
├── LICENSE
├── README.md
└── SECURITY.md
</syntaxhighlight>
The semantic-release configuration lives in .releaserc.json, one of the file names the tool looks for; a file named semantic-release.json would not be picked up automatically.
== Open Source Maturity Checklist ==
* Automated CI per service
* Automated releases
* PyPI and npm publishing
* Clear contribution workflow
* Governance defined
* Labels and templates configured
* Security policy documented
* Code of conduct enforced
== Final Notes ==
Well-maintained open source projects prioritize:
* Automation over manual work
* Transparency over private decisions
* Documentation over tribal knowledge
* Community over individual ownership
</div>
= Concerns =
Failure modes that hurt trust, adoption, and scientific credibility of open-source research software:
# '''Unmaintained dependencies''' leading to security vulnerabilities
# '''Hardcoded credentials''' or exposed configuration secrets
# '''Outdated documentation''' or broken installation procedures
# '''Inactive issue trackers''' or unresolved pull requests
# Lack of clarity regarding '''maintenance status'''
= Avoiding Those Common Failures in OSS =
Frequent risks in open source projects and how to prevent them through policy, process, and tooling.
----
== 1 — Unmaintained Dependencies and Security Vulnerabilities ==
Risk:
* Dependencies become unmaintained or insecure
* Transitive dependencies introduce vulnerabilities
* Security risks propagate silently
How to avoid:
* Pin dependency versions (requirements.txt, package-lock.json)
* Use automated dependency scanning tools
** Dependabot
** Renovate
* Monitor security advisories (CVE databases)
* Remove unused dependencies regularly
* Prefer well-maintained, widely used libraries
Institutional practice:
* Assign dependency ownership
* Schedule periodic dependency reviews
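A periodic dependency review can start with a small check that every entry in requirements.txt is pinned. The sketch below is a rough illustration under simplifying assumptions: <code>unpinned_requirements</code> is a hypothetical helper, only exact <code>==</code> pins count as pinned, and pip option lines (such as <code>-r other.txt</code>) are skipped rather than followed.

```python
import re

# A pinned line looks like "name==version", optionally with extras such as name[extra].
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    sample = "requests==2.31.0\nnumpy>=1.24\nflask\n"
    for entry in unpinned_requirements(sample):
        print("not pinned:", entry)
```

Run in CI, a non-empty result could fail the build, which turns the pinning rule into an enforced policy rather than a convention.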
----
== 2 — Hardcoded Credentials and Exposed Secrets ==
Risk:
* API keys or passwords committed to repositories
* Accidental leaks via configuration files
* Permanent exposure due to Git history
How to avoid:
* Never store secrets in source code
* Use environment variables for configuration
* Add files that may hold secrets (for example .env) to .gitignore
* Use automated secret scanning tools
** GitHub Secret Scanning
** TruffleHog
* Rotate secrets immediately if exposed
Institutional practice:
* Define a secrets management policy
* Educate researchers on secure configuration
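Reading secrets from environment variables can be as simple as the sketch below. The helper <code>require_secret</code>, the exception class, and the variable name <code>API_TOKEN</code> are illustrative assumptions; the point is that the code fails loudly when a secret is missing instead of falling back to a hard-coded default.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name, environ=None):
    """Return the value of environment variable `name`, or raise if unset or empty.

    `environ` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        # Report only the variable name, never any secret value.
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        token = require_secret("API_TOKEN")  # placeholder variable name
        print("token loaded, length:", len(token))
    except MissingSecretError as error:
        print(error)
```

Because the error message contains only the variable name, it is safe to show in CI logs.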
----
== 3 — Outdated Documentation and Broken Installation ==
Risk:
* Users cannot install or run the software
* Research results are not reproducible
* Loss of user trust
How to avoid:
* Treat documentation as part of the release
* Test installation steps in CI
* Keep a minimal “Quick Start” section
* Archive deprecated instructions clearly
* Assign documentation ownership
Institutional practice:
* Require documentation updates for every release
* Include docs review in PR process
----
== 4 — Inactive Issues and Unresolved Pull Requests ==
Risk:
* Contributors feel ignored
* Community engagement declines
* Project appears abandoned
How to avoid:
* Define response-time expectations
* Use labels: triage, help wanted, blocked
* Close stale issues transparently
* Acknowledge all contributions, even if rejected
* Use automation for stale issue management
Institutional practice:
* Allocate time for issue triage
* Track maintainer workload explicitly
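The core of stale-issue automation is a date comparison, sketched below. The issue records are plain dictionaries with hypothetical keys (<code>number</code>, <code>state</code>, <code>updated_at</code>), and the 60-day threshold is an arbitrary example; a real setup would fetch this data from the issue tracker or use an existing stale-issue bot.

```python
from datetime import datetime, timedelta, timezone


def find_stale(issues, now, max_idle_days=60):
    """Return the numbers of open issues not updated within `max_idle_days`.

    Each issue is a dict with "number", "state", and a timezone-aware
    "updated_at" datetime (an assumed shape for this sketch).
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [
        issue["number"]
        for issue in issues
        if issue["state"] == "open" and issue["updated_at"] < cutoff
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    issues = [
        {"number": 1, "state": "open", "updated_at": now - timedelta(days=90)},
        {"number": 2, "state": "open", "updated_at": now - timedelta(days=5)},
    ]
    print("stale issues:", find_stale(issues, now))
```

Labelling the stale issues and commenting before closing them keeps the process transparent for contributors.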
----
== 5 — Lack of Clarity About Maintenance Status ==
Risk:
* Users do not know if the software is reliable
* Unclear expectations lead to frustration
* Hidden abandonment damages institutional credibility
How to avoid:
* Explicitly state maintenance status in README
* Use standard lifecycle labels:
** Active
** Maintenance
** Deprecated
** Archived
* Document support scope and response expectations
* Provide end-of-life notices when applicable
Institutional practice:
* Require lifecycle statements for all public repositories
* Archive inactive repositories explicitly
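An institution that requires lifecycle statements could check for them automatically. The sketch below assumes a convention that is not part of the material above: the README contains a line of the form <code>Status: Active</code> using one of the four lifecycle labels. The function name <code>maintenance_status</code> is likewise hypothetical.

```python
import re

LIFECYCLE = ("Active", "Maintenance", "Deprecated", "Archived")

# Assumed convention: a README line such as "Status: Active".
STATUS_LINE = re.compile(r"^\s*Status\s*:\s*(\w+)", re.IGNORECASE | re.MULTILINE)


def maintenance_status(readme_text):
    """Return the lifecycle label declared in the README, or None if absent or invalid."""
    match = STATUS_LINE.search(readme_text)
    if not match:
        return None
    value = match.group(1).capitalize()
    return value if value in LIFECYCLE else None


if __name__ == "__main__":
    readme = "# Demo Tool\n\nStatus: Maintenance\n\nA short description.\n"
    print(maintenance_status(readme))
```

A repository audit could run this over every public README and list the projects that return <code>None</code>.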
----
== 6 — Maintenance Transparency Best Practices ==
Recommended signals:
* Last release date
* CI status badge
* Maintainer contact or team
* Roadmap or milestones
* CONTRIBUTING.md and GOVERNANCE.md
Transparency builds trust, even with limited resources.
----
== 7 — Key Takeaways ==
* Dependency hygiene is a security requirement
* Secrets management is non-negotiable
* Documentation enables reproducibility
* Community engagement requires active processes
* Maintenance status must be explicit
Latest revision as of 09:11, 23 February 2026
THIS IS A DRAFT
This text may not be complete.
- title: OSS Training Course
- author: Lukasz Sokolowski
OSS
OSS Training Materials
Copyright Notice
Copyright © 2004-2026 by NobleProg Limited. All rights reserved.
This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.
Introduction/Outline
- Code management, versioning, and licensing
- Automation and code quality (best practices)
  - Continuous Integration (CI) on GitHub/GitLab
  - Automated testing (unit, integration, end-to-end)
  - Changelog (Keep a Changelog, Conventional Commits)
- Issue management and roadmap
  - Best practices in issue creation (templates, labels, milestones)
- Documentation
  - Effective README: objectives, installation, usage, contributions
  - Contributing Guide (CONTRIBUTING.md)
  - API documentation (Swagger, Sphinx, Docusaurus, etc.)
Extended Details
Unfold it with the Expand button on the very right side below
Modern Software Development Practices (Python & JavaScript)
Best practices for managing, testing, and documenting software projects built with Python and JavaScript.
Code Management, Versioning, and Licensing
- Use Git for source control
- Branching strategy:
  - main – stable production code
  - feature branches – new development
- Use Semantic Versioning (MAJOR.MINOR.PATCH)
- Add a LICENSE file (MIT or Apache 2.0 commonly used)
- Protect main branches with:
  - Pull / Merge Request reviews
  - Mandatory CI checks
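The MAJOR.MINOR.PATCH scheme is easy to handle programmatically once a version string is parsed into numbers. The sketch below is a minimal illustration: <code>parse_version</code> and <code>is_breaking_upgrade</code> are hypothetical helpers, an optional leading "v" (as commonly used in Git tags) is accepted, and pre-release or build metadata suffixes are deliberately not supported.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Three dot-separated non-negative integers without leading zeros, optional "v" prefix.
SEMVER = re.compile(r"^v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_version(text):
    """Parse "1.2.3" or "v1.2.3" into a Version tuple; raise ValueError otherwise."""
    match = SEMVER.match(text.strip())
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(group) for group in match.groups()))


def is_breaking_upgrade(old, new):
    """Return True when moving from `old` to `new` raises the MAJOR component."""
    return parse_version(new).major > parse_version(old).major
```

Because <code>Version</code> is a tuple, versions compare numerically, so 1.10.0 sorts after 1.9.3, which plain string comparison would get wrong.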
Automation and Code Quality (Python & JS)
Python
- Linters: flake8, pylint
- Formatter: black
- Import sorting: isort
- Type checking: mypy
JavaScript
- Linter: ESLint
- Formatter: Prettier
- Type checking: TypeScript (recommended)
Best practices:
- Run linters and formatters automatically
- Keep functions small and readable
- Follow PEP 8 (Python) and standard JS style guides
Continuous Integration (CI)
CI pipelines automatically validate code on each push or pull request.
Example: GitHub Actions
<syntaxhighlight lang="yaml">
name: CI
on:
  pull_request:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Python dependencies
        run: |
          pip install -r requirements.txt
      - name: Lint Python
        run: |
          flake8 .
          black --check .
      - name: Run Python tests
        run: pytest
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
      - name: Install JS dependencies
        run: npm ci
      - name: Lint JS
        run: npm run lint
      - name: Run JS tests
        run: npm test
</syntaxhighlight>
Example: GitLab CI
stages:
  - lint
  - test
python_lint:
  stage: lint
  image: python:3.11
  script:
    - pip install flake8 black
    - flake8 .
    - black --check .
python_test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - pytest
js_lint:
  stage: lint
  image: node:20
  script:
    - npm ci
    - npm run lint
js_test:
  stage: test
  image: node:20
  script:
    - npm ci
    - npm test
Benefits:
- Early detection of issues
- Enforced quality standards
- Reliable and repeatable builds
Automated Testing
Python
- Frameworks: pytest, unittest
- Tools:
  - pytest-cov (coverage)
  - requests-mock / responses (API mocking)
JavaScript
- Unit & integration: Jest, Vitest
- End-to-end (E2E): Cypress, Playwright
Best practices:
- Run tests automatically in CI
- Test behavior, not implementation details
- Keep test execution fast
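To illustrate "test behavior, not implementation details", here is a small sketch in pytest style. The slugify function is a hypothetical example written for this purpose; the tests assert only on inputs and outputs, so the function body can be rewritten without touching them.

```python
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug."""
    # Replace every non-alphanumeric character with a space, then split on whitespace.
    words = "".join(ch if ch.isalnum() else " " for ch in title).split()
    return "-".join(word.lower() for word in words)


# pytest collects functions named test_*; plain assert statements are enough.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Open Source Software") == "open-source-software"


def test_slugify_drops_punctuation():
    assert slugify("CI: fast, repeatable!") == "ci-fast-repeatable"
```

Neither test mentions how the slug is built (character loop, regular expression, or library call), which is what keeps them stable during refactoring.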
Changelog and Commit Standards
- Maintain CHANGELOG.md
- Follow Keep a Changelog structure:
  - Added
  - Changed
  - Fixed
  - Deprecated
Conventional Commits
- feat: new feature
- fix: bug fix
- docs: documentation
- test: tests
- chore: maintenance
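A commit header in this style can be checked mechanically, for instance in a commit hook or CI step. The sketch below is an assumption-laden simplification: it only accepts the five types listed above, and the names parse_header and CommitHeader are invented for illustration.

```python
import re
from typing import NamedTuple, Optional

# Types listed above; real projects often allow more (e.g. refactor, ci).
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "chore")

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"           # commit type
    r"(?:\((?P<scope>[^)]+)\))?"   # optional (scope)
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<subject>.+)$"          # colon, space, subject
)


class CommitHeader(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    subject: str


def parse_header(line: str) -> Optional[CommitHeader]:
    """Return the parsed header, or None if the line does not conform."""
    match = HEADER_RE.match(line)
    if match is None or match["type"] not in ALLOWED_TYPES:
        return None
    return CommitHeader(
        match["type"],
        match["scope"],
        match["breaking"] is not None,
        match["subject"],
    )
```

A header such as "feat(api): add export endpoint" parses into type, scope, and subject, while a free-form message like "Updated stuff" is rejected.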
Issue Management and Roadmap
- Use issues to track bugs, features, and technical debt
- Organize work using milestones and boards
- Reference issues in commits and merge requests
Best Practices in Issue Creation
- Use issue templates (bug / feature)
- Apply labels:
  - python
  - javascript
  - bug
  - enhancement
  - documentation
- Always include clear reproduction steps for bugs
Documentation
Effective README
A strong README.md includes:
- Project overview
- Python / Node.js requirements
- Installation steps
- Usage examples
- Testing instructions
- License
Contributing Guide (CONTRIBUTING.md)
Should define:
- Environment setup
- Coding standards
- Commit conventions
- Pull Request workflow
API Documentation
Python
- Sphinx – documentation from docstrings
- FastAPI – automatic OpenAPI / Swagger
- MkDocs – lightweight docs
JavaScript
- Swagger / OpenAPI – REST APIs
- JSDoc – inline documentation
- Docusaurus – documentation portals
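For the Python side, documentation generated from docstrings starts with the docstring itself. The following sketch uses the reStructuredText field-list style (:param:, :returns:, :raises:) that Sphinx's autodoc extension can render; the mean function is a hypothetical example, not code from any project discussed here.

```python
def mean(values: list[float]) -> float:
    """Return the arithmetic mean of ``values``.

    :param values: Non-empty list of numbers.
    :returns: Sum of the values divided by their count.
    :raises ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```

Keeping parameter and error descriptions next to the code makes it more likely that they are updated in the same pull request as the behavior they describe.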
Recommended Project Structure
Example structure for a combined Python + JavaScript repository:
project-root/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py
│   │   ├── api/
│   │   └── services/
│   ├── tests/
│   ├── requirements.txt
│   └── pyproject.toml
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   └── services/
│   ├── tests/
│   ├── package.json
│   └── package-lock.json
│
├── docs/
│   ├── api/
│   └── guides/
│
├── .github/workflows/ (GitHub Actions) or .gitlab-ci.yml (GitLab CI)
│
├── CHANGELOG.md
├── CONTRIBUTING.md
├── README.md
└── LICENSE
Key Takeaways
- CI enforces quality for Python and JavaScript
- Automated testing reduces regressions
- Clear structure improves maintainability
- Documentation is part of the codebase
Open Source Best Practices (Python & JavaScript)
This section extends the project guidelines with patterns commonly used in successful open source projects.
Separate CI Pipelines per Service
In multi-service or monorepo projects, each service should have an independent CI pipeline.
Benefits:
- Faster CI execution
- Clear ownership per service
- Reduced coupling between frontend and backend
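The idea behind per-service pipelines is that a change only triggers the pipelines of the services it touches. As a rough sketch of that selection logic, assuming a hypothetical layout with one top-level directory per service (backend and frontend here), and not reproducing how any CI system implements its path filters:

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Hypothetical layout: one top-level directory per service.
SERVICE_DIRS = {"backend", "frontend"}


def affected_services(changed_files: Iterable[str]) -> Set[str]:
    """Return the services whose directory contains at least one changed file."""
    affected = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Only paths inside a service directory count; root-level files are ignored.
        if len(parts) > 1 and parts[0] in SERVICE_DIRS:
            affected.add(parts[0])
    return affected
```

With this rule, a commit that touches backend/app/main.py and README.md would select only the backend pipeline, which is the behavior the path filters in a per-service CI configuration aim for.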
GitHub Actions (Per Service)
Each service has its own workflow file.
.github/workflows/
├── backend-ci.yml
└── frontend-ci.yml
Example: Backend CI
name: Backend CI
on:
  push:
    paths:
      - "backend/**"
  pull_request:
    paths:
      - "backend/**"
jobs:
  backend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r backend/requirements.txt
      - run: flake8 backend
      - run: pytest backend/tests
Example: Frontend CI
name: Frontend CI
on:
  push:
    paths:
      - "frontend/**"
  pull_request:
    paths:
      - "frontend/**"
jobs:
  frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: cd frontend && npm ci
      - run: cd frontend && npm run lint
      - run: cd frontend && npm test
GitLab CI (Per Service)
backend:
  stage: test
  rules:
    - changes:
        - backend/**/*
  image: python:3.11
  script:
    - pip install -r backend/requirements.txt
    - pytest backend/tests
Monorepo vs Multirepo
Choosing the right repository strategy is critical for scalability.
| Aspect | Monorepo | Multirepo |
|---|---|---|
| Code location | Single repository | One repository per service |
| CI complexity | Higher | Lower |
| Dependency sharing | Easy | Requires versioning |
| Access control | Unified | Granular |
| Tooling | Requires advanced CI | Simpler |
| Open source friendliness | Good for small teams | Best for large ecosystems |
Recommendations:
- Monorepo – small teams, tight coupling, shared releases
- Multirepo – independent services, different release cycles, large communities
Issue Templates (Wiki Format)
Clear issue templates improve collaboration and contributor experience.
Bug Report
== Description ==
A clear and concise description of the bug.

== Steps to Reproduce ==
# Step 1
# Step 2
# Step 3

== Expected Behavior ==
What you expected to happen.

== Actual Behavior ==
What actually happened.

== Environment ==
* OS:
* Python / Node.js version:
* Browser (if applicable):

== Additional Context ==
Logs, screenshots, or links.
Feature Request
== Summary ==
Short description of the requested feature.

== Motivation ==
Why is this feature needed?

== Proposed Solution ==
Describe the preferred solution.

== Alternatives ==
Other approaches considered.

== Additional Context ==
Links, mockups, or references.
Documentation Issue
== Documentation Section ==
Which page or file needs improvement?

== Problem ==
What is unclear, missing, or incorrect?

== Suggested Improvement ==
Proposed text or structure.
Open Source Project Best Practices
These practices help attract and retain contributors.
Governance and Transparency
- Define maintainers and roles
- Use public roadmaps
- Make decisions in issues and PRs
Contribution Experience
- Clear README and CONTRIBUTING.md
- Friendly issue templates
- Label beginner issues (e.g. good first issue)
Licensing and Legal
- Always include a LICENSE file
- Ensure dependencies are license-compatible
- Avoid committing secrets or credentials
Community Standards
- Add a Code of Conduct (e.g. Contributor Covenant)
- Enforce respectful communication
- Moderate discussions consistently
Release Management
- Use semantic versioning
- Maintain a changelog
- Tag releases
- Automate releases where possible
Security
- Provide a SECURITY.md
- Define responsible disclosure process
- Keep dependencies up to date
Open Source Checklist
- README.md
- CONTRIBUTING.md
- CHANGELOG.md
- LICENSE
- CODE_OF_CONDUCT.md
- SECURITY.md
- CI pipelines enabled
- Issue and PR templates
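The file-based items of this checklist lend themselves to an automated check. The sketch below is one possible approach, not a standard tool: the function name missing_files is invented here, and it covers only the six files, not the CI or template items.

```python
from pathlib import Path
from typing import List

# File-based items from the checklist above.
REQUIRED_FILES = [
    "README.md",
    "CONTRIBUTING.md",
    "CHANGELOG.md",
    "LICENSE",
    "CODE_OF_CONDUCT.md",
    "SECURITY.md",
]


def missing_files(repo_root: str) -> List[str]:
    """Return the checklist files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

A CI step could call missing_files(".") from the repository root and fail the build when the returned list is not empty.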
Open Source Collaboration and Release Management
This section defines contribution workflows, security policies, community standards, and automated releases.
Pull Request Templates
Pull Request templates help reviewers and contributors align on expectations.
Pull Request Template (General)
## Description
Brief summary of the changes introduced by this PR.

## Related Issue
Closes #<issue-number>

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Refactoring
- [ ] CI / tooling

## How Has This Been Tested?
Describe the tests that you ran.

## Checklist
- [ ] Code follows project style guidelines
- [ ] Tests added or updated
- [ ] Documentation updated (if applicable)
- [ ] CI pipeline passes
Best practices:
- Require PR templates for all contributions
- Enforce reviews via branch protection
- Keep PRs small and focused
Security Policy (SECURITY.md)
Open source projects should clearly define how to report vulnerabilities.
Example SECURITY.md
# Security Policy

## Supported Versions
Only the latest major version is actively supported with security updates.

## Reporting a Vulnerability
If you discover a security vulnerability, please do NOT open a public issue.
Instead, report it by emailing: security@project-domain.example

Please include:
- A description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested remediation (if available)

We aim to respond within 72 hours.
Best practices:
- Never discuss vulnerabilities publicly before a fix
- Acknowledge reporters responsibly
- Publish security advisories after resolution
Code of Conduct (CODE_OF_CONDUCT.md)
A Code of Conduct creates a safe and welcoming community.
Example CODE_OF_CONDUCT.md
# Code of Conduct

## Our Pledge
We are committed to providing a respectful and inclusive environment for everyone.

## Expected Behavior
- Be respectful and considerate
- Use welcoming and inclusive language
- Accept constructive criticism
- Focus on what is best for the community

## Unacceptable Behavior
- Harassment or discrimination
- Trolling or personal attacks
- Publishing private information

## Enforcement
Project maintainers are responsible for enforcing this code of conduct.

## Reporting
Report incidents to: conduct@project-domain.example
Recommendation:
- Use the Contributor Covenant as a base
- Enforce consistently and transparently
Release Automation (semantic-release)
Automated releases reduce human error and ensure consistency.
What semantic-release Does
- Determines next version from commit messages
- Generates changelog entries
- Creates Git tags and releases
- Publishes artifacts automatically
Commit Requirements
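semantic-release derives the release type from Conventional Commits-style messages (its default analyzer preset follows the closely related Angular convention):
- feat: introduces a new feature (MINOR)
- fix: bug fix (PATCH)
- feat!: or a BREAKING CHANGE: footer (MAJOR); support for the ! shorthand depends on the analyzer preset in use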
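The mapping from commit messages to a release type can be modeled in a few lines. This is a simplified Python model of the rule described above, not semantic-release's actual implementation (which is a JavaScript plugin); the function name release_type and the ranking table are assumptions made for the sketch.

```python
import re
from typing import Iterable, Optional

HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]+\))?(?P<bang>!)?: ")
RANK = {"patch": 1, "minor": 2, "major": 3}


def release_type(messages: Iterable[str]) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None when no release is needed."""
    best = None
    for message in messages:
        match = HEADER_RE.match(message)
        if match is None:
            continue  # non-conforming commits do not trigger a release
        if match["bang"] or "BREAKING CHANGE:" in message:
            level = "major"
        elif match["type"] == "feat":
            level = "minor"
        elif match["type"] == "fix":
            level = "patch"
        else:
            continue  # docs, test, chore, ... leave the version unchanged
        # The highest-ranked change since the last release wins.
        if best is None or RANK[level] > RANK[best]:
            best = level
    return best
```

The key design point is that the highest-impact commit decides the bump: a single breaking change among many fixes still yields a major release, and a batch of docs-only commits yields no release at all.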
Example semantic-release Configuration
{
  "branches": ["main"],
  "plugins": [
    "@semantic-release/commit-analyzer",
    "@semantic-release/release-notes-generator",
    "@semantic-release/changelog",
    "@semantic-release/github"
  ]
}
GitHub Actions: Automated Release
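name: Release
on:
  push:
    branches:
      - main
permissions:
  contents: write        # create tags and GitHub releases
  issues: write          # comment on released issues
  pull-requests: write   # comment on released pull requests
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # semantic-release needs the full history and tags
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm ci
      - run: npx semantic-release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}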
Python + semantic-release Notes
- semantic-release manages versions and tags
- Python packages should not hardcode the version; instead:
  - Read the version from Git tags
  - Or inject it during the build (e.g. with setuptools_scm, which derives it from the latest tag)
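When the version is injected at build time, runtime code can read it from the installed package metadata instead of a hardcoded constant. A minimal sketch using the standard library's importlib.metadata follows; the fallback string and the function name package_version are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed distribution's version, or a fallback if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Typical when running from a source checkout that was never installed.
        return fallback
```

A package could expose this as __version__ = package_version("my-package"), where "my-package" stands in for the real distribution name.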
Open Source Release Best Practices
- Use automated releases
- Never manually edit versions
- Always release from main branch
- Keep CHANGELOG.md generated automatically
- Tag every release
Final Open Source Readiness Checklist
- README.md
- CONTRIBUTING.md
- CHANGELOG.md
- LICENSE
- CODE_OF_CONDUCT.md
- SECURITY.md
- Issue templates
- Pull Request templates
- CI pipelines per service
- Automated releases enabled
Advanced Open Source Project Setup
This section completes the open source framework with publishing automation, governance, labeling standards, and repository structure.
Automated Package Publishing
Automated publishing ensures consistent, repeatable releases.
PyPI Publishing (Python)
Best practice:
- Publish only from tagged releases
- Use CI for trusted publishing
GitHub Actions: Publish to PyPI
name: Publish Python Package
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read    # check out the repository
      id-token: write   # required for PyPI Trusted Publishing (OIDC)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build
      - run: python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
Requirements:
- pyproject.toml configured
- Trusted Publishing enabled in PyPI
npm Publishing (JavaScript)
Best practice:
- Use semantic-release
- Publish only from main branch
GitHub Actions: Publish to npm
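name: Publish npm Package
on:
  push:
    branches:
      - main
permissions:
  contents: write        # create tags and GitHub releases
  issues: write          # comment on released issues
  pull-requests: write   # comment on released pull requests
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # semantic-release needs the full history and tags
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          registry-url: https://registry.npmjs.org
      - run: npm ci
      # Publishing to npm requires @semantic-release/npm in the plugins list
      - run: npx semantic-release
        env:
          NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}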
GitHub Labels Taxonomy
A consistent label system improves triage and contributor onboarding.
Type Labels
- bug
- enhancement
- documentation
- refactor
- security
- question
Priority Labels
- priority: critical
- priority: high
- priority: medium
- priority: low
Status Labels
- status: triage
- status: blocked
- status: in progress
- status: ready for review
Scope / Stack Labels
- python
- javascript
- frontend
- backend
- api
- ci
Community Labels
- good first issue
- help wanted
- breaking change
Maintainers and Governance Model
Clear governance improves trust and sustainability.
Roles
- Maintainers
  - Own project direction
  - Review and merge PRs
  - Manage releases
- Contributors
  - Submit issues and PRs
  - Improve code and documentation
Decision Making
- Decisions are made publicly in issues or PRs
- Maintainers aim for consensus
- Maintainer vote is final when consensus cannot be reached
Becoming a Maintainer
- Consistent high-quality contributions
- Community engagement
- Invitation by existing maintainers
Governance File
Recommended file:
- GOVERNANCE.md
Complete Open Source Starter Repository Structure
Recommended structure for a Python + JavaScript open source project:
project-root/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py
│   │   ├── api/
│   │   └── services/
│   ├── tests/
│   ├── pyproject.toml
│   └── README.md
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   └── services/
│   ├── tests/
│   ├── package.json
│   └── README.md
│
├── docs/
│   ├── api/
│   ├── guides/
│   └── README.md
│
├── .github/
│   ├── workflows/
│   │   ├── backend-ci.yml
│   │   ├── frontend-ci.yml
│   │   └── release.yml
│   ├── ISSUE_TEMPLATE/
│   └── PULL_REQUEST_TEMPLATE.md
│
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── GOVERNANCE.md
├── LICENSE
├── README.md
├── SECURITY.md
└── .releaserc.json (semantic-release configuration)
Open Source Maturity Checklist
- Automated CI per service
- Automated releases
- PyPI and npm publishing
- Clear contribution workflow
- Governance defined
- Labels and templates configured
- Security policy documented
- Code of conduct enforced
Final Notes
Well-maintained open source projects prioritize:
- Automation over manual work
- Transparency over private decisions
- Documentation over tribal knowledge
- Community over individual ownership
Concerns
Failure modes that hurt the trust, adoption, and scientific credibility of open-source research software:
- Unmaintained dependencies leading to security vulnerabilities
- Hardcoded credentials or exposed configuration secrets
- Outdated documentation or broken installation procedures
- Inactive issue trackers or unresolved pull requests
- Lack of clarity regarding maintenance status
Avoiding These Common Failures in OSS
Frequent risks in open source projects and how to prevent them through policy, process, and tooling.
1 — Unmaintained Dependencies and Security Vulnerabilities
Risk:
- Dependencies become unmaintained or insecure
- Transitive dependencies introduce vulnerabilities
- Security risks propagate silently
How to avoid:
- Pin dependency versions (requirements.txt, package-lock.json)
- Use automated dependency scanning tools
  - Dependabot
  - Renovate
- Monitor security advisories (CVE databases)
- Remove unused dependencies regularly
- Prefer well-maintained, widely used libraries
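Pinning can be verified automatically. The sketch below flags requirements.txt entries that lack an exact == pin; it is a deliberately simplified check (the function name is invented, and URL or editable requirements are simply skipped or not handled), not a replacement for the scanning tools listed above.

```python
from typing import Iterable, List


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        requirement = line.split(";", 1)[0]  # ignore environment markers
        if "==" not in requirement:
            unpinned.append(line)
    return unpinned
```

Run against the lines of a requirements file, an entry such as "flask>=2.0" would be reported, while "requests==2.31.0" would pass.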
Institutional practice:
- Assign dependency ownership
- Schedule periodic dependency reviews
2 — Hardcoded Credentials and Exposed Secrets
Risk:
- API keys or passwords committed to repositories
- Accidental leaks via configuration files
- Permanent exposure due to Git history
How to avoid:
- Never store secrets in source code
- Use environment variables for configuration
- Add files that hold secrets (e.g. .env, key files) to .gitignore
- Use automated secret scanning tools
  - GitHub Secret Scanning
  - TruffleHog
- Rotate secrets immediately if exposed
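Reading secrets from environment variables can look like the following sketch. The names require_secret and MissingSecretError, and the API_TOKEN variable in the usage note, are hypothetical; the point is that the code fails loudly rather than falling back to a value stored in the source.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the named secret, failing loudly instead of using a default."""
    value = env.get(name, "")
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

Application code would call require_secret("API_TOKEN") at startup; passing a plain dictionary as env keeps the function easy to test without touching the real environment.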
Institutional practice:
- Define a secrets management policy
- Educate researchers on secure configuration
3 — Outdated Documentation and Broken Installation
Risk:
- Users cannot install or run the software
- Research results are not reproducible
- Loss of user trust
How to avoid:
- Treat documentation as part of the release
- Test installation steps in CI
- Keep a minimal “Quick Start” section
- Archive deprecated instructions clearly
- Assign documentation ownership
Institutional practice:
- Require documentation updates for every release
- Include docs review in PR process
4 — Inactive Issues and Unresolved Pull Requests
Risk:
- Contributors feel ignored
- Community engagement declines
- Project appears abandoned
How to avoid:
- Define response-time expectations
- Use labels: triage, help wanted, blocked
- Close stale issues transparently
- Acknowledge all contributions, even if rejected
- Use automation for stale issue management
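Stale-issue automation usually comes down to comparing the last activity date with a threshold. As a sketch of that rule only: the 90-day default and the function name stale_issues are assumptions, and fetching the dates from an issue tracker is left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_issues(
    last_activity: Dict[int, datetime],
    now: datetime,
    max_idle_days: int = 90,  # hypothetical threshold; set per project policy
) -> List[int]:
    """Return issue numbers with no activity for more than max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(number for number, seen in last_activity.items() if seen < cutoff)
```

Whatever the threshold, the resulting list is best used to label and comment on issues first, so that closing them later stays transparent to contributors.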
Institutional practice:
- Allocate time for issue triage
- Track maintainer workload explicitly
5 — Lack of Clarity About Maintenance Status
Risk:
- Users do not know if the software is reliable
- Unclear expectations lead to frustration
- Hidden abandonment damages institutional credibility
How to avoid:
- Explicitly state maintenance status in README
- Use standard lifecycle labels:
  - Active
  - Maintenance
  - Deprecated
  - Archived
- Document support scope and response expectations
- Provide end-of-life notices when applicable
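A lifecycle statement is easier to enforce when it is machine-readable. The sketch below assumes a hypothetical README convention, a line of the form "Status: Maintenance", and checks it against the four labels above; neither the convention nor the names Lifecycle and declared_status come from an existing standard.

```python
import re
from enum import Enum
from typing import Optional


class Lifecycle(Enum):
    ACTIVE = "Active"
    MAINTENANCE = "Maintenance"
    DEPRECATED = "Deprecated"
    ARCHIVED = "Archived"


# Hypothetical convention: a README line such as "Status: Maintenance".
STATUS_RE = re.compile(r"^Status:[ \t]*(\w+)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def declared_status(readme_text: str) -> Optional[Lifecycle]:
    """Return the lifecycle status declared in the README, or None if absent or unrecognized."""
    match = STATUS_RE.search(readme_text)
    if match is None:
        return None
    try:
        return Lifecycle(match.group(1).capitalize())
    except ValueError:
        return None
```

An institution-wide check could run declared_status over every public repository's README and report those that return None.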
Institutional practice:
- Require lifecycle statements for all public repositories
- Archive inactive repositories explicitly
6 — Maintenance Transparency Best Practices
Recommended signals:
- Last release date
- CI status badge
- Maintainer contact or team
- Roadmap or milestones
- CONTRIBUTING.md and GOVERNANCE.md
Transparency builds trust, even with limited resources.
7 — Key Takeaways
- Dependency hygiene is a security requirement
- Secrets management is non-negotiable
- Documentation enables reproducibility
- Community engagement requires active processes
- Maintenance status must be explicit