Research Data Management Plan - Theoretical Chemistry Group
1. Data Summary
Types of data generated
- Input/output files from quantum chemistry software (e.g., Q-Chem, VASP, ORCA, Quantum ESPRESSO)
- Custom simulation scripts (Python, Bash, etc.)
- Molecular structure files (e.g., `.xyz`, `.mol`, `.pdb`)
- Raw and processed data from computational experiments (e.g., energy profiles, optimization paths)
- Figures, plots, and analytical summaries
- Lab notebooks (electronic, e.g., Jupyter Notebooks)
Purpose of data collection
To study the electronic structure and properties of molecular systems, develop or test computational methods, and generate reproducible theoretical predictions.
Reusability of data
Most datasets will be reusable, especially input files and benchmark results. Metadata and documentation will be provided to facilitate reuse.
2. FAIR Data Principles
Findable
- Datasets will be indexed and deposited in searchable repositories (e.g., Zenodo).
- Metadata will follow standards such as the Chemical Markup Language (CML) and include project title, molecule identifiers (InChI/SMILES), methods used, etc.
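As an illustration of the metadata fields listed above, the following minimal sketch assembles a record and refuses to write it if a field is missing. It assumes a plain JSON sidecar file rather than CML itself, and the field names (`project_title`, `inchi`, `smiles`, `method`, `software`) and the function `build_metadata` are hypothetical choices for this example, not a group standard.

```python
import json

# Hypothetical set of required fields; adapt to the group's actual schema.
REQUIRED_FIELDS = ("project_title", "inchi", "smiles", "method", "software")


def build_metadata(**fields):
    """Return a JSON string for one dataset, or raise if a required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return json.dumps(fields, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only (water, with a placeholder project title).
    print(build_metadata(
        project_title="Example project",
        inchi="InChI=1S/H2O/h1H2",
        smiles="O",
        method="B3LYP",
        software="ORCA",
    ))
```

Keeping the check in one small function makes it easy to run before every deposit, so that incomplete records are caught before they reach a repository.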
Accessible
- Public datasets will be deposited with open access licensing (e.g., CC-BY 4.0).
- Controlled-access options will be used if data includes embargoed or unpublished results.
Interoperable
- Data will be saved in open formats (e.g., `.xyz`, `.json`, `.csv`, `.log`) when possible.
- Standard naming conventions and ontologies (e.g., IUPAC, CHEMINF) will be used.
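To show what saving in an open format can look like in practice, here is a minimal sketch that writes and reads the common `.xyz` layout (atom count, comment line, then one `symbol x y z` line per atom). The function names `write_xyz` and `read_xyz` are hypothetical, and the sketch assumes Cartesian coordinates with six decimal places are sufficient.

```python
def write_xyz(atoms, comment=""):
    """Serialize a list of (symbol, x, y, z) tuples to XYZ-format text."""
    lines = [str(len(atoms)), comment]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


def read_xyz(text):
    """Parse XYZ-format text into (comment, list of (symbol, x, y, z))."""
    lines = text.splitlines()
    count = int(lines[0])
    atoms = []
    for line in lines[2:2 + count]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return lines[1], atoms


if __name__ == "__main__":
    # Illustrative geometry only; not a result from this project.
    water = [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469), ("H", 0.0, -0.757, -0.469)]
    print(write_xyz(water, comment="water, illustrative geometry"))
```

A round trip through both functions is a cheap way to confirm that an exported structure file is readable without the original software.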
Reusable
- All data will include documentation on generation methods, software versioning, and computational details.
- Code and scripts will include inline documentation and READMEs.
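The software-versioning requirement above could be supported by a small provenance record stored next to each result. The sketch below is one possible approach, assuming the analysis runs in Python; `capture_provenance` and its output keys are hypothetical names. It records only the Python environment, so versions of external programs such as Q-Chem or VASP would still need to be taken from their own output files.

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def capture_provenance(packages):
    """Collect interpreter, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing.
            versions[name] = None
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    import json
    print(json.dumps(capture_provenance(["numpy", "scipy"]), indent=2))
```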
3. Data Storage and Backup
Active data storage
- University-managed servers or secure cloud storage (e.g., Leibniz Rechenzentrum HPC clusters with automated backups)
- Version control using Git (e.g., GitHub, GitLab)
Backup policy
- Daily backups on institutional infrastructure
- Weekly redundant offsite backup snapshots
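Backups are only useful if they can be verified. As a minimal sketch of one way to do this, the code below builds a SHA-256 manifest of a directory and compares it with a manifest of the backup copy. The function names are hypothetical, and the sketch assumes both copies are reachable as ordinary directories; it does not reflect how the institutional backup system works internally.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large output files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(original, backup):
    """List files that are missing from the backup or differ from the original."""
    return sorted(
        name for name, digest in original.items() if backup.get(name) != digest
    )
```

A typical use would be `changed_files(build_manifest(project_dir), build_manifest(backup_dir))`, where an empty list indicates that every file in the project directory has an identical copy in the backup.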
Security
- Access-controlled environments
- Two-factor authentication for remote access
4. Data Sharing and Publication
- Data will be made available upon publication in peer-reviewed journals.
- Datasets will be uploaded to domain-specific or general-purpose repositories:
  - Quantum chemistry inputs/outputs: MolSSI QCArchive
  - General research data: Zenodo, Figshare
- Persistent identifiers (DOIs) will be assigned.
- Scripts and notebooks will be published via GitHub with Zenodo linkage for DOI minting.
5. Roles and Responsibilities
Principal Investigator: Overall responsibility for data strategy, integrity, and compliance with funding agency mandates.
Postdocs and PhD students: Responsible for maintaining clean and reproducible project folders, backups, and documentation.
6. Ethics and Legal Compliance
- No human or sensitive data involved.
- Software licenses (e.g., Q-Chem, VASP) will be used according to institutional agreements.
- Independently developed code will be released under open-source licenses (e.g., MIT, GPL) where possible.
7. Data Retention and Archiving
- Raw and processed data will be stored for at least 10 years post-publication.
- Archived datasets will be stored in non-proprietary formats with full documentation.
- Archived data will be reviewed periodically and migrated to current formats to avoid obsolescence.
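A periodic format review could start from a simple audit that flags archived files whose extension is not on an agreed list of open formats. The sketch below assumes such a list exists; `OPEN_SUFFIXES` reuses the example formats named in this plan and `needs_migration` is a hypothetical helper, so both would need to be adapted to the group's actual policy.

```python
from pathlib import PurePath

# Assumed allow-list, taken from the example open formats in this plan.
OPEN_SUFFIXES = {".xyz", ".json", ".csv", ".log"}


def needs_migration(filenames, open_suffixes=OPEN_SUFFIXES):
    """Return the file names whose extension is not in the allow-list."""
    return sorted(
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in open_suffixes
    )


if __name__ == "__main__":
    # Hypothetical archive listing.
    print(needs_migration(["opt/geom.xyz", "results.XLSX", "run/out.log"]))
```

The comparison is case-insensitive on the extension, so `results.XLSX` is flagged in the same way as `results.xlsx`.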
8. Resources and Costs
- Storage and backup resources will be provided by the university or national HPC services.
- Data publication costs (e.g., for open repositories) will be included in the project budget or covered by the university's contracts with publishers.
- No additional staff can currently be hired solely for data management in our research group.