Research data collected without a documented methodology cannot be reproduced, defended, or published. Data that cannot be reproduced is not evidence — it is anecdote with a laboratory attachment.
The reproducibility crisis in research — the finding that a significant proportion of published studies cannot be replicated — is substantially a data collection quality problem. Studies that used undocumented or inconsistently applied data collection protocols, instruments that were not validated, samples that were not representative, or data that was selectively recorded produce results that look like findings but are not reproducible by independent researchers. In corporate R&D, the consequences are different but no less severe: regulatory submissions rejected for insufficient data integrity, clinical trials invalidated by protocol deviations, and product development decisions made on unreliable data. A structured R&D data collection process addresses all of this from the planning stage: defining what data is needed and why before collection begins, validating collection instruments, documenting every collection condition and deviation, applying GLP principles where applicable, and storing data in alignment with FAIR principles so it remains findable, accessible, interoperable, and reusable throughout the research lifecycle. This free checklist gives research teams, R&D managers, and data scientists a structured framework for the full data collection lifecycle.
FAIR Data — the International Standard for Research Data Management
F
Findable
Data and metadata are assigned a globally unique and persistent identifier. Data is described with rich metadata and registered in a searchable resource. Metadata clearly and explicitly includes the identifier of the data it describes.
A
Accessible
Data is retrievable by its identifier using a standardised, open, free, and universally implementable protocol. Metadata is accessible even when the data itself is not (for example, for data that is confidential or commercially sensitive).
I
Interoperable
Data uses a formal, accessible, shared, and broadly applicable language for knowledge representation. Vocabulary follows FAIR principles. Data includes qualified references to other data and uses community standards.
R
Reusable
Data is richly described with accurate and relevant attributes, released with a clear data usage licence, associated with detailed provenance, and meets domain-relevant community standards for completeness and quality.
FAIR data principles are now required by the EU’s Horizon Europe programme (the world’s largest R&D funding programme at €95.5 billion), by the UK Research Councils, NIH, and most major global funders. Data management plans (DMPs) that describe how FAIR principles will be met are typically required at proposal stage.
The R&D Data Collection Process Checklist
Six phases covering the full data collection lifecycle — from planning and instrument development through execution, validation, and FAIR-compliant storage.
Phase 1
Data Collection Plan & Methodology
The most common data collection failure mode is collecting data without knowing exactly what question it is intended to answer. Every data element collected should be traceable to a specific research question or analytical need.
Define the research questions to be answered — each data type to be collected mapped to the specific research question it will address
Select the data collection methodology — qualitative (interviews, focus groups, observation), quantitative (surveys, experiments, sensor data), or mixed methods; justified by the research question
Define the sampling strategy — random, stratified, purposive, or convenience? What sample size is required for adequate statistical power?
Develop the data collection instruments — survey questionnaire, interview guide, observation protocol, measurement procedure, or data extraction form
Prepare the data management plan (DMP) — how data will be stored, backed up, shared, and archived; required by most major funders
Obtain ethics approval before any human subjects data collection begins — ethics approval is a prerequisite, not a parallel activity
Phase 2
Instrument Development & Validation
Develop the data collection instrument — clear, unambiguous items; pilot-tested before full deployment
Validate the instrument — face validity (expert review), content validity (covers the domain), and pilot testing with a small representative sample
Calibrate measurement equipment — all instruments calibrated against traceable standards before use; calibration records maintained
Train data collectors — everyone follows the same protocol; inter-rater reliability tested for qualitative or coded data
Conduct a pilot study — small-scale trial of the full data collection protocol; identifies procedural problems before the main study; instrument refined if needed
Phase 3
Recruitment & Sampling (Where Applicable)
Define inclusion/exclusion criteria — precisely; for human subjects: consistent with ethics approval; for laboratory samples: consistent with the experimental protocol
Recruit participants or prepare samples — per the sampling strategy; informed consent obtained before data collection for all human subjects
Document the recruitment process — who was recruited, when, how; any deviations from the protocol recorded
Confirm sample representativeness — is the recruited sample representative of the target population? Any selection bias acknowledged and documented
Phase 4
Data Collection Execution
Every deviation from the pre-specified data collection protocol must be documented at the time of occurrence — not reconstructed retrospectively. The GLP principle of contemporaneous documentation is the most important single practice in laboratory data collection.
Follow the protocol precisely — no mid-collection protocol changes without prior ethics review (for human subjects) or documented deviation (for non-clinical studies)
Document every collection event — date, time, location, collector identity, conditions (temperature, equipment settings), participant or sample identifier
Record raw data contemporaneously — at the time of collection; not from memory; original and unaltered
Document all deviations — any departure from the protocol noted immediately; signed and dated; reason explained; impact assessed
Label all samples and data files — with unique identifiers traceable to the collection record; no orphan data
Phase 5
Data Validation & Quality Control
Check for completeness — is all planned data collected? Any missing data points documented with the reason (participant withdrawal, instrument failure, excluded by protocol)
Check for data entry errors — double-entry or independent verification for critical data
Apply range and consistency checks — values outside plausible ranges identified and investigated; not automatically deleted
Handle outliers transparently — per a pre-specified outlier policy; not post-hoc exclusion of inconvenient data
Lock the dataset — after validation; the locked dataset is the analytical dataset; subsequent changes are versioned and documented
Phase 6
Data Storage, Security & FAIR Compliance
Store data securely — on institutional or approved storage systems; minimum 3-2-1 backup rule (3 copies, 2 different media, 1 offsite); encrypted for personal data
Assign unique persistent identifiers to datasets — (e.g. DOI via a data repository); supporting FAIR Findability
Document data provenance metadata — who collected, when, how, with what instrument, under what conditions; supporting FAIR Reusability
Apply an appropriate access licence — open access where permitted; restricted access for personal or commercially sensitive data with clear conditions
Define the data retention period — per funder requirements (typically minimum 5–10 years post-publication), institutional policy, and regulatory requirements
This checklist is available as a free, runnable template in CheckFlow — with deviation documentation tasks built into the execution phase and a complete data collection record created as the process runs.
Good Laboratory Practice — the Data Integrity Standard for Regulated Research
Good Laboratory Practice (GLP) is a quality management framework for planning, performing, monitoring, recording, archiving, and reporting non-clinical health and environmental safety studies. Developed by the OECD, GLP compliance is required by regulatory agencies including the FDA and EMA for safety data submitted in support of marketing applications for pharmaceuticals, pesticides, chemicals, and other regulated products. The core GLP principles that are most relevant to any data collection process are: a study plan (protocol) defined and approved before the study starts; all raw data recorded contemporaneously and signed; all instruments calibrated and maintenance documented; every deviation from the protocol documented at the time of occurrence; all study data and records archived in secure, retrievable storage for the required period; and a Quality Assurance Unit independent from the study team responsible for monitoring GLP compliance throughout the study.
Even in non-regulated research, the GLP principle of contemporaneous documentation — writing down what was done and observed at the time it happened, not from memory hours or days later — is the most important single practice for data integrity. Retrospectively reconstructed records are not original records, and in the event of a dispute about the data, the contemporaneous record prevails.
Why Run R&D Data Collection in CheckFlow?
1
Data collection executed consistently across all collectors
Research data collected by multiple team members using slightly different approaches produces systematic variation indistinguishable from the phenomenon being studied. CheckFlow’s data collection protocol presents every collector with the same structured steps, in the same order, with the same documentation requirements — regardless of who is collecting on any given day.
Every deviation captured at the time of occurrence
The deviation noted in the analysis report six months after collection and reconstructed from memory is the deviation that reviewers question. CheckFlow’s execution phase includes a deviation documentation task as a required step in the collection workflow — the contemporaneous record that GLP requires and that peer review demands.
A data collection record that supports FAIR compliance
Major funders require data management plans and FAIR-compliant data handling. CheckFlow creates a structured data collection record — who collected what, when, with what instrument, under what conditions — that serves as the provenance metadata supporting Reusability, the most demanding of the FAIR principles.
R&D data collection plans require ethics approval before human subjects data collection can begin. CheckFlow’s R&D Ethics Compliance Checklist covers the ethics review and IRB submission process. See the R&D Ethics Compliance Checklist →
Data collection plans are a core component of research proposals. CheckFlow’s Research Proposal Submission Checklist covers the full proposal development process including the methodology section. See the Research Proposal Submission Checklist →
What should an R&D data collection process checklist include?
+
An R&D data collection process checklist covers six phases: data collection planning (research question mapping, methodology selection, sampling strategy, instrument development, data management plan, ethics approval confirmation), instrument development and validation (validation, equipment calibration, collector training, pilot study), participant/sample recruitment (inclusion/exclusion criteria, informed consent, recruitment documentation), data collection execution (protocol adherence, contemporaneous documentation, raw data recording, deviation documentation, sample labelling), data validation and quality control (completeness check, data entry verification, range checks, outlier policy, dataset locking), and data storage and FAIR compliance (secure storage with backup, persistent identifiers, provenance metadata, access licence, retention period).
What are FAIR data principles and why do they matter?
+
FAIR stands for Findable, Accessible, Interoperable, and Reusable — principles for research data management published in Scientific Data in 2016 and since adopted as a standard by major research funders worldwide. Findable means data is described with sufficient metadata and assigned a persistent unique identifier so it can be located. Accessible means data and metadata can be retrieved using open, standardised protocols. Interoperable means data uses standard vocabularies and formats that allow it to be combined with other datasets. Reusable means data is richly described with provenance and a clear usage licence so others can understand, validate, and build upon it. EU Horizon Europe, the UK Research Councils, the NIH, and most major research funders now require FAIR data management plans.
What is Good Laboratory Practice (GLP)?
+
Good Laboratory Practice (GLP) is a quality system framework for planning, performing, monitoring, recording, archiving, and reporting non-clinical health and environmental safety studies. Developed by the OECD, GLP is required by regulatory agencies including the FDA, EMA, and their international equivalents for safety data submitted in support of marketing applications for pharmaceuticals, pesticides, chemicals, and other regulated products. Core GLP principles relevant to any laboratory include: study plan (protocol) defined before study start; raw data recorded contemporaneously; all deviations documented; data traceability to instruments and personnel; and archiving of raw data and study documentation.
Why is a pilot study important in R&D data collection?
+
A pilot study is a small-scale trial of the full data collection protocol before the main study begins. Its purposes are: to identify and fix procedural problems that are not apparent when designing the protocol on paper (ambiguous instructions, equipment issues, participant comprehension problems), to validate the instruments against the intended measurement, to assess the feasibility of the data collection timeline, and to provide data for refining sample size calculations. Discoveries made in a pilot study are cheap; the same discoveries made mid-way through the main study are expensive.
Is CheckFlow free for this template?
+
You can start a free 14-day trial with no credit card required, giving you full access to all features including this template. The Business plan is $10 per user per month after the trial. Full details at checkflow.io/pricing.
Collect Research Data That Is Reproducible, Defensible, and FAIR
Free trial — no credit card required.
Do you like cookies? 🍪 We use cookies to ensure you get the best experience on our website. Learn more