About Us

The Yale Data Coordinating Center (YDCC) is a partnership between the Yale Center for Analytical Sciences (YCAS), Emergency Medicine, the Yale Program on Aging (POA) and the Yale Center for Medical Informatics (YCMI). It is composed of faculty from the School of Medicine and School of Public Health with expertise in biostatistics, epidemiology, clinical trials and informatics, along with a highly trained technical staff skilled in systems programming, data management, data analysis and statistical programming. Our group has experience designing and conducting single- and multicenter clinical trials, longitudinal cohort studies, case-control studies and other observational studies.

The YDCC offers comprehensive coordination activities including:

Hypothesis development and definition of study objectives

The transformation of research questions into statistically testable hypotheses is an important first step, providing guidance for the design of the study.  Our group is experienced in working with investigators to translate their scientific questions into clearly stated aims with hypotheses that can be measured and tested.

Selection of study outcome measures

The outcome measure is an event or change measure that is the target for treatment and addresses the study objective. Outcome measures should be clinically relevant, easy to observe and reliable. The selection of an outcome often requires a balance between cost and precision.  Our researchers are skilled in summarizing and weighing the advantages and disadvantages of various outcomes.

Determination of the assessment schedule

While selection of the primary, secondary and exploratory outcomes is an essential step, when and how often these outcomes are assessed is just as critical. Our experts collaborate with investigators to determine the most efficient and effective assessment schedule (i.e. the optimal number and timing of outcome assessments), weighing statistical efficiency against practical (e.g. participant burden) and economic constraints.

Determination of the statistical design

Trained and experienced YDCC researchers collaborate with investigators to determine the optimal statistical design, including standard designs (e.g. parallel group, factorial, cluster, crossover, noninferiority, two-stage) as well as newer types of designs such as SMART and MOST. Our researchers provide sample size/power estimation for a variety of designs, outcomes and hypotheses; develop randomization plans and treatment allocation procedures; outline procedures for bias/variance control; and create statistical analysis plans (SAPs), including methods for interim monitoring of the data and reporting to Data and Safety Monitoring Boards (DSMBs). Our faculty and staff are experienced with several free (e.g. R) and commercially available (e.g. PASS, SAS) software packages for the estimation of sample size and are also equipped to perform simulations. We have conducted sample size/power estimation for:

  • Continuous/Discrete/Categorical/Censored Outcomes
  • Superiority/Non-Inferiority/Equivalence Hypotheses
  • Parallel Group/Crossover/Factorial/Cluster Randomized Trials
  • Case-Control/Matched Case-Control/Cohort Studies
  • Hierarchical and Repeated Measures Studies
  • Fixed/Sequential Designs
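As a simple illustration of the kind of calculation involved (a sketch only, not the YDCC's actual tooling), the standard normal-approximation sample size for a two-sided, two-sample comparison of means can be computed with the Python standard library:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means (standardized effect size d)."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile corresponding to desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return ceil(n)

# A medium standardized effect (d = 0.5) at 80% power and alpha = 0.05
# requires about 63 participants per group under this approximation.
print(n_per_group(0.5))
```

Exact calculations based on the noncentral t distribution (as produced by packages such as PASS) give slightly larger numbers; the normal approximation is useful for quick planning.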

Development of case report forms (CRFs)

Well-designed case report forms are critical to the conduct of a study, with typical studies utilizing dozens of forms. As such, they should be developed by personnel with experience in form construction and familiarity with methods for data collection and processing. Our data management experts understand that appropriate form layout and item construction can help avoid confusion among study personnel and lead to more accurate and complete data collection.

Design, implementation and maintenance of clinical data management systems

The YDCC offers several solutions for trial management, data collection and storage, including local and web-based systems for centralized or remote entry (e.g. entry by study participants). Our programmers also offer custom solutions to transfer data from existing ancillary sources. We provide training of study personnel and technical support for the data management system throughout the trial.

Data entry

Our data entry specialists are trained in the data collection process specific to each study and follow detailed protocols to avoid entry errors and identify data errors.  

Data quality control

Data intake systems are designed to encourage timely and continuous data flow for quality control. Our data management experts work with investigators to develop a tailored approach to quality control that may include automated system edit checks at data entry, standard and custom reports for data edit checks and a process for edit queries, resolution and tracking through audit trails.
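To make the idea of automated edit checks concrete, a minimal Python sketch follows; the field names and plausibility ranges are hypothetical, chosen only for illustration:

```python
# Hypothetical edit-check rules: each rule tests one field and yields
# a query message when the value fails the check (True when it passes).
RULES = {
    "age": lambda v: 18 <= v <= 90 or "age outside eligible range 18-90",
    "systolic_bp": lambda v: 70 <= v <= 250 or "systolic BP outside plausible range",
}

def edit_checks(record: dict) -> list[str]:
    """Return a list of data queries for one CRF record."""
    queries = []
    for field, rule in RULES.items():
        if field not in record:
            queries.append(f"{field}: missing value")
            continue
        result = rule(record[field])
        if result is not True:
            queries.append(f"{field}: {result}")
    return queries

# An out-of-range age generates a query for resolution by site staff.
print(edit_checks({"age": 17, "systolic_bp": 120}))
```

In a production system, queries like these would be logged with an audit trail and tracked through resolution, as the paragraph above describes.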

Data security

The YDCC offers investigators a secure environment to store their data, with general precautions and safeguards to protect against loss and unauthorized use. Systems meet the requirements of Title 21 of the Code of Federal Regulations (CFR) Part 11.

Preparation of reports for data monitoring committees

Our biostatisticians work with data monitoring committees to produce a tailored set of analyses, tables, listings, figures and timelines required for efficient monitoring of the safety and conduct of the study. Reports may include summaries of accrual, screening failures, participant disposition, baseline characteristics, adverse events, abnormal labs, study treatment compliance, provider treatment fidelity, completeness of data collection, and interim efficacy and futility analyses.

Statistical analysis and reporting of results

The choice of analytic methods depends on the statistical hypothesis (superiority, non-inferiority, equivalence), the type of outcome (continuous, discrete, categorical, censored), the structure of the data (e.g. repeated measures, clustering), elements of the design (e.g. crossover, factorial, stratification, matching) and the presence of missing data. YDCC biostatisticians are experienced in state-of-the-art analytic methods including survival analysis (Kaplan-Meier, Cox regression, parametric survival analysis, survival analysis with recurrent events, competing risks analysis, frailty models), methods for repeated measures and hierarchical data (GEE, mixed models, latent growth curve), structural equation modeling with latent variables, analysis with missing data (exploration of missing data patterns, sensitivity analysis), marginal structural models, and longitudinal analysis. We have an extensive array of software for conducting state-of-the-art analyses and trial designs, including SAS, STATA, R and EAST.

We are also highly experienced in preparing manuscripts for publication that summarize trial findings. Our expertise includes writing the statistical methods section, preparing tables and high-quality figures, interpreting results and editing manuscripts for statistical content.
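As one small illustration of the survival methods mentioned above, the Kaplan-Meier estimator multiplies, at each observed event time, the fraction of at-risk participants who did not have the event. A minimal Python sketch, using made-up follow-up data rather than study data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each participant
    events -- 1 if the event occurred, 0 if censored at that time
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        deaths = removed = 0
        # group all participants with this follow-up time
        while idx < len(data) and data[idx][0] == t:
            deaths += data[idx][1]
            removed += 1
            idx += 1
        if deaths:
            surv *= 1 - deaths / at_risk   # step down at event times only
            steps.append((t, surv))
        at_risk -= removed                 # censored subjects leave the risk set
    return steps

# Four participants: events at times 1, 2 and 4; one censored at time 3.
print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))
```

In practice our analyses rely on validated implementations (e.g. in SAS or R) that also provide variance estimates and confidence bands; this sketch only shows the core product-limit idea.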


James Dziura, MPH, PhD

Yale Center for Analytical Sciences 

Yale School of Public Health 

300 George Street Suite 555 

New Haven, CT 06510