| Title: | Conditional Logistic Regression |
|---|---|
| Description: | Performs inference for Bayesian conditional logistic regression with informative priors built from the concordant pair data. We include many options to build the priors. And we include many options during the inference step for estimation, testing and confidence set creation. For details, see Kapelner and Tennenbaum (2026) "Improved Conditional Logistic Regression using Information in Concordant Pairs with Software" <doi:10.48550/arXiv.2602.08212>. |
| Authors: | Adam Kapelner [aut, cre]
|
| Maintainer: | Adam Kapelner <[email protected]> |
| License: | GPL-3 |
| Version: | 1.1.1 |
| Built: | 2026-05-24 05:59:30 UTC |
| Source: | https://github.com/tennenbaumj/bclogit_package_and_paper_repo |
bclogit
Fits a conditional logistic regression model to incidence data. Allows for use of the concordant pairs in the fitting.
Jacob Tennenbaum \email{[email protected]}
Adam Kapelner \email{[email protected]}
Jacob Tennenbaum and Adam Kapelner (2026). "Improved Conditional Logistic Regression using Information in Concordant Pairs with Software." arXiv preprint arXiv:2602.08212.
Useful links:
https://github.com/Tennenbaum-J/bclogit_package_and_paper_repo
Report bugs at https://github.com/Tennenbaum-J/bclogit_package_and_paper_repo/issues
This function fits a Bayesian conditional logistic regression model, incorporating information from concordant pairs to improve estimation.
## S3 method for class 'formula' bclogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, concordant_method = "GLM", prior_type = "Naive", chains = 4, return_raw_stan_output = FALSE, prior_variance_treatment = 100, stan_refresh = 0, ... ) bclogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, concordant_method = "GLM", prior_type = "Naive", chains = 4, return_raw_stan_output = FALSE, prior_variance_treatment = 100, stan_refresh = 0, ... ) ## Default S3 method: bclogit( formula = NULL, data = NULL, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, concordant_method = "GLM", prior_type = "Naive", chains = 4, return_raw_stan_output = FALSE, prior_variance_treatment = 100, stan_refresh = 0, ..., y = NULL, X = NULL, treatment_name = NULL, call = NULL )## S3 method for class 'formula' bclogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, concordant_method = "GLM", prior_type = "Naive", chains = 4, return_raw_stan_output = FALSE, prior_variance_treatment = 100, stan_refresh = 0, ... ) bclogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, concordant_method = "GLM", prior_type = "Naive", chains = 4, return_raw_stan_output = FALSE, prior_variance_treatment = 100, stan_refresh = 0, ... ) ## Default S3 method: bclogit( formula = NULL, data = NULL, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, concordant_method = "GLM", prior_type = "Naive", chains = 4, return_raw_stan_output = FALSE, prior_variance_treatment = 100, stan_refresh = 0, ..., y = NULL, X = NULL, treatment_name = NULL, call = NULL )
formula |
For the formula method, a symbolic description of the model to be fitted. |
data |
A data.frame, data.table, or model.matrix containing the variables (optional for formula method). |
treatment |
Optional vector specifying the treatment variable (required for default method, or can be specified in formula method). |
strata |
Vector specifying the strata (matched pairs). |
subset |
An optional vector specifying a subset of observations. |
na.action |
A function which indicates what should happen when the data contain NAs. |
concordant_method |
The method to use for fitting the concordant pairs and reservoir. Options are "GLM", "GEE", and "GLMM". |
prior_type |
The type of prior to use for the discordant pairs. Options are "Naive", "G prior", "PMP", and "Hybrid". |
chains |
Number of chains for Stan sampling. Default is 4. |
return_raw_stan_output |
Logical; if |
prior_variance_treatment |
Prior variance for the treatment coefficient in the
covariance matrix |
stan_refresh |
How often Stan reports sampling progress (in iterations). Default is 0 (silent). Set to a positive integer (e.g., 1 or 100) to see progress. |
... |
Additional arguments passed to |
y |
For the default method, a binary (0,1) vector containing the response of each subject. |
X |
A data.frame, data.table, or model.matrix containing the variables. |
treatment_name |
Optional string name for the treatment variable. |
call |
Optional call object to store in the result. |
An object of class "bclogit".
A list of class bclogit containing:
coefficients |
Estimated coefficients (posterior means). |
var |
Variance-covariance matrix of coefficients. |
model |
The fitted Stan model object. |
posterior_samples |
Raw posterior samples as a 3D array (iterations x chains x parameters) from |
concordant_model |
The fitted model object for the concordant pairs/reservoir (GLM/GEE/GLMM). |
matched_data |
The processed matched pairs data from the premodeling step. |
prior_info |
A list with elements |
call |
The function call. |
terms |
The model terms. |
num_discordant |
Number of discordant pairs used. |
num_concordant |
Number of concordant pairs/reservoir entries used. |
A list of class "bclogit" containing:
coefficients |
Estimated coefficients (posterior means). |
var |
Variance-covariance matrix of the coefficients (posterior covariance). |
model |
The fitted Stan model object for the discordant pairs. |
posterior_samples |
Raw posterior samples as a 3D array
(iterations x chains x parameters) from |
concordant_model |
The fitted model object for the concordant pairs (GLM, GEE, or GLMM). |
matched_data |
The processed matched pairs data from the C++ pre-modeling step. |
prior_info |
A list with elements |
call |
The function call. |
terms |
The model terms. |
xlevels |
Factor level information (always |
n |
Total number of observations. |
num_discordant |
Number of discordant pairs used for fitting. |
num_concordant |
Number of concordant pairs used for the prior. |
X_model_matrix_col_names |
Column names of the covariate model matrix. |
treatment_name |
Name of the treatment variable. |
bclogit(formula): Formula method
bclogit(default): Default method for matrix/data input.
When concordant_method = "GEE", the function applies a layered set of
safeguards to ensure the prior covariance Sigma_con passed to Stan is
valid (positive definite, symmetric, and free of NA values).
GEE's default covariance is the sandwich (robust) estimator, which can
produce degenerate (e.g. negative) variances when the number of independent
clusters is small. The code instead uses geese$vbeta.naiv, the
model-based covariance that assumes the working correlation structure is
exactly correct. It is less efficient asymptotically but always
well-structured for small samples.
After extracting the covariance, a Cholesky decomposition is attempted
via tryCatch(chol(Sigma_con), ...). This is the canonical test for
positive definiteness: it succeeds if and only if the matrix is PD. A
failed decomposition sets an is_pd = FALSE flag without crashing
the call.
If the GEE naive covariance or the GLMM fixed-effect vcov is still
not positive definite (GLMM failures are common with boundary variance
estimates or convergence issues), a standard
glm(..., family = binomial) is re-fitted on the same concordant
data and its Fisher-information-based vcov is used instead. That
matrix is always PD for a non-degenerate design matrix, providing a
graceful degradation from GEE/GLMM to GLM.
Sigma_con <- (Sigma_con + t(Sigma_con)) / 2 is applied after
the GLM fallback re-extraction and again unconditionally before the matrix
is passed to Stan. Name-indexed covariance matching can introduce tiny
floating-point asymmetries; averaging with the transpose enforces exact
symmetry as required by Stan's Cholesky-based samplers.
Aliased model terms (collinear columns in the concordant data) cause
coef() and vcov() to return NA. These are caught
before Stan sees them: NA entries in the prior mean vector default
to 0 (a neutral prior), and any NA in the covariance matrix
triggers replacement of the entire matrix with a wide diagonal
(diag(100, p)), an uninformative but valid prior.
Multiple failure paths (too few concordant pairs, persistent non-PD
matrix, NA-contaminated covariance) fall back to
diag(100, p), which is positive definite by construction and
encodes a broad, independent prior that lets the discordant-pair
likelihood dominate.
After extracting the concordant-model covariance, the entire first row
and column (corresponding to the treatment coefficient) are zeroed and
the diagonal entry is set to prior_variance_treatment (default
100). This decouples the treatment prior from the nuisance-covariate
prior: concordant pairs inform covariate shrinkage but are not allowed to
shrink the treatment coefficient, which remains independently diffuse.
Zeroing the off-diagonals also eliminates any spurious prior correlation
between treatment and covariates induced by the concordant-model fit.
summary.bclogit, confint.bclogit,
vcov.bclogit, coef.bclogit
# Example usage data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + CIGPDAY + BMI + HEARTRTE, data = fhs, treatment = PERIOD, strata = RANDID) summary(fit)# Example usage data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + CIGPDAY + BMI + HEARTRTE, data = fhs, treatment = PERIOD, strata = RANDID) summary(fit)
Fits a conditional logistic regression model for matched pairs using
the discordant-pair GLM trick. This is a fast frequentist alternative
to bclogit.
## Default S3 method: clogit( formula = NULL, data = NULL, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, do_inference_on_var = "all", ..., y = NULL, X = NULL, treatment_name = NULL, call = NULL ) ## S3 method for class 'formula' clogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, do_inference_on_var = "all", ... ) clogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, do_inference_on_var = "all", ... )## Default S3 method: clogit( formula = NULL, data = NULL, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, do_inference_on_var = "all", ..., y = NULL, X = NULL, treatment_name = NULL, call = NULL ) ## S3 method for class 'formula' clogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, do_inference_on_var = "all", ... ) clogit( formula, data, treatment = NULL, strata = NULL, subset = NULL, na.action = NULL, do_inference_on_var = "all", ... )
formula |
For the formula method, a symbolic description of the model. |
data |
A data frame containing the variables (for formula method). |
treatment |
Vector specifying the treatment variable. |
strata |
Vector specifying the strata (matched pairs). |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
A function which indicates what should happen when the data contain NAs. |
do_inference_on_var |
Which variable(s) to compute standard errors for.
|
... |
Additional arguments passed to
|
y |
For the default method, a binary (0,1) response vector. |
X |
A data.frame, data.table, or model.matrix containing the variables. |
treatment_name |
Optional string name for the treatment variable. |
call |
Optional call object to store in the result. |
A list of class "clogit_bclogit" containing:
coefficients |
Estimated coefficients (posterior means). |
var |
Variance-covariance matrix of the coefficients (diagonal, built from standard errors).
|
flr_model |
The fitted fast logistic regression model object returned by
|
call |
The function call. |
terms |
The model terms. |
n |
Total number of observations. |
num_discordant |
Number of discordant pairs used for fitting. |
num_concordant |
Number of concordant pairs. |
X_model_matrix_col_names |
Column names of the covariate model matrix. |
treatment_name |
Name of the treatment variable. |
se |
Standard errors of the coefficients. |
z |
Z-statistics for each coefficient. |
pval |
Approximate p-values for each coefficient. |
do_inference_on_var |
The value of the |
An object of class "clogit_bclogit".
An object of class "clogit_bclogit".
clogit(default): Default method for matrix/data input.
clogit(formula): Formula method
bclogit, summary.clogit_bclogit,
coef.clogit_bclogit, vcov.clogit_bclogit
data("fhs") fit <- clogit(PREVHYP ~ TOTCHOL + CIGPDAY + BMI + HEARTRTE, data = fhs, treatment = PERIOD, strata = RANDID) summary(fit) n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) # Inference on treatment only (faster): fit2 <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata, do_inference_on_var = 1) n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1 + x2, data = dat, treatment = treatment, strata = strata) summary(fit) coef(fit) vcov(fit)data("fhs") fit <- clogit(PREVHYP ~ TOTCHOL + CIGPDAY + BMI + HEARTRTE, data = fhs, treatment = PERIOD, strata = RANDID) summary(fit) n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) # Inference on treatment only (faster): fit2 <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata, do_inference_on_var = 1) n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1 + x2, data = dat, treatment = treatment, strata = strata) summary(fit) coef(fit) vcov(fit)
Extract coefficients from a bclogit model
## S3 method for class 'bclogit' coef(object, ...)## S3 method for class 'bclogit' coef(object, ...)
object |
A |
... |
Additional arguments. |
Named numeric vector of posterior mean coefficients.
data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) coef(fit)data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) coef(fit)
Extract coefficients from a clogit_bclogit model
## S3 method for class 'clogit_bclogit' coef(object, ...)## S3 method for class 'clogit_bclogit' coef(object, ...)
object |
A |
... |
Additional arguments. |
Numeric vector of coefficients.
n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) coef(fit)n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) coef(fit)
Computes Bayesian credible intervals for the model parameters.
## S3 method for class 'bclogit' confint(object, parm, level = 0.95, type = c("HPD_one", "CR", "HPD_many"), ...)## S3 method for class 'bclogit' confint(object, parm, level = 0.95, type = c("HPD_one", "CR", "HPD_many"), ...)
object |
A |
parm |
A specification of which parameters to be given credible intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
The confidence level required (default 0.95). |
type |
Type of interval to compute: "HPD_one" (default unimodal HPD interval via coda), "CR" (equal-tailed credible region), "HPD_many" (multimodal HPD interval via ggdist). |
... |
Additional arguments. |
A matrix with columns lower and upper.
For "HPD_many", a parameter may appear on multiple rows when the interval is disjoint.
The matrix has a Probability attribute.
A subset of the Framingham Heart Study data.
fhsfhs
A data frame with 5944 rows and 39 variables:
Unique identification number for each participant. (This is the strata for the matched pairs).
Examination Cycle where 0 = baseline, 1 = endpoint (the treatment variable)
Participant sex (1 = Male, 2 = Female).
Total serum cholesterol (mg/dL).
Age at exam (years).
Systolic blood pressure (mmHg).
Diastolic blood pressure (mmHg).
Current smoking status (0 = No, 1 = Yes).
Number of cigarettes smoked per day.
Body Mass Index (kg/m^2).
Diabetes status (0 = No, 1 = Yes).
Use of Anti-hypertensive medication (0 = No, 1 = Yes).
Heart rate (beats/minute).
Fast blood glucose (mg/dL).
Education level.
Prevalence of Coronary Heart Disease.
Prevalence of Angina Pectoris.
Prevalence of Myocardial Infarction.
Prevalence of Stroke.
Prevalence of Hypertension.
Number of days since baseline exam.
High Density Lipoprotein Cholesterol.
Low Density Lipoprotein Cholesterol.
Death status.
Angina Pectoris status.
Hospitalized Myocardial Infarction.
Myocardial Infarction or Fatal Coronary Heart Disease.
Any Coronary Heart Disease event.
Stroke status.
Cardiovascular Disease status.
Hypertension status.
Time to Angina Pectoris.
Time to Myocardial Infarction.
Time to Myocardial Infarction or Fatal CHD.
Time to Any CHD.
Time to Stroke.
Time to Cardiovascular Disease.
Time to Death.
Time to Hypertension.
This dataset was constructed by running the following code:
pacman::p_load(riskCommunicator, data.table)
data("framingham")
D = data.table(framingham)
D = D[!is.na(CIGPDAY)] #we drop missing data in covariates
D = D[!is.na(BMI)]
D = D[!is.na(HEARTRTE)]
D = D[!is.na(TOTCHOL)]
Dba = D[PERIOD %in% c(1,3)] #we drop intermediate periods so we have matched pairs
Dba[PERIOD == 1, PERIOD := 0]
Dba[PERIOD == 3, PERIOD := 1]
Dba[, num_periods_per_id := .N, by = RANDID]
Dba = Dba[num_periods_per_id == 2] #we drop intermediate periods so we have matched pairs
Dba[, num_periods_per_id := NULL]
https://biolincc.nhlbi.nih.gov/teaching/
Extract model formula
## S3 method for class 'bclogit' formula(x, ...)## S3 method for class 'bclogit' formula(x, ...)
x |
A |
... |
Additional arguments. |
The formula used in the model, or NULL when the model was
fitted via the default (matrix) interface.
data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) formula(fit)data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) formula(fit)
Print summary of a bclogit model
## S3 method for class 'summary.bclogit' print(x, digits = max(3, getOption("digits") - 3), ...)## S3 method for class 'summary.bclogit' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
A |
digits |
Number of significant digits to print. |
... |
Additional arguments. |
Invisibly returns x.
summary.bclogit, confint.bclogit
data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) print(summary(fit))data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) print(summary(fit))
Print summary of a clogit_bclogit model
## S3 method for class 'summary.clogit_bclogit' print(x, digits = max(3, getOption("digits") - 3), ...)## S3 method for class 'summary.clogit_bclogit' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
A |
digits |
Number of significant digits to print. |
... |
Additional arguments. |
Invisibly returns x.
n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) print(summary(fit))n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(1:(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) print(summary(fit))
Summarize a bclogit model
## S3 method for class 'bclogit' summary(object, level = 0.95, inference_method = "HPD_one", ...)## S3 method for class 'bclogit' summary(object, level = 0.95, inference_method = "HPD_one", ...)
object |
A |
level |
Confidence level for credible intervals (default 0.95). |
inference_method |
Method used for both the displayed confidence set bounds and the
p-value (computed via bisection over alpha). Options are:
|
... |
Additional arguments (not used). |
A list of class "summary.bclogit" containing:
call |
The original function call. |
coefficients |
A matrix with one row per parameter and columns for the posterior mean
estimate, posterior median estimate, standard error, lower and upper credible interval bounds,
optionally |
num_discordant |
Number of discordant pairs used for fitting. |
num_concordant |
Number of concordant pairs used for the prior. |
level |
The credible interval level used. |
inference_method |
The inference method used for interval and p-value computation. |
prior_info |
A list with elements |
treatment_name |
Name of the treatment variable. |
print.summary.bclogit, confint.bclogit,
coef.bclogit, vcov.bclogit
Summarize a clogit_bclogit model
## S3 method for class 'clogit_bclogit' summary(object, ...)## S3 method for class 'clogit_bclogit' summary(object, ...)
object |
A |
... |
Additional arguments (not used). |
A list of class "summary.clogit_bclogit" containing:
call |
The original function call. |
coefficients |
A matrix with one row per parameter and columns |
num_discordant |
Number of discordant pairs used for fitting. |
num_concordant |
Number of concordant pairs. |
n |
Total number of observations. |
treatment_name |
Name of the treatment variable. |
do_inference_on_var |
The value of the |
clogit, print.summary.clogit_bclogit,
coef.clogit_bclogit, vcov.clogit_bclogit
n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(seq_len(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) summary(fit)n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(seq_len(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) summary(fit)
Extract variance-covariance matrix from a bclogit model
## S3 method for class 'bclogit' vcov(object, ...)## S3 method for class 'bclogit' vcov(object, ...)
object |
A |
... |
Additional arguments. |
A square matrix of the posterior covariance of the coefficients, derived from the MCMC samples.
data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) vcov(fit)data("fhs") fit <- bclogit(PREVHYP ~ TOTCHOL + BMI, data = fhs, treatment = PERIOD, strata = RANDID) vcov(fit)
Extract variance-covariance matrix from a clogit_bclogit model
## S3 method for class 'clogit_bclogit' vcov(object, ...)## S3 method for class 'clogit_bclogit' vcov(object, ...)
object |
A |
... |
Additional arguments. |
A diagonal matrix of the estimated coefficient variances (squared standard
errors), or NULL when do_inference_on_var is not "all".
coef.clogit_bclogit, summary.clogit_bclogit
n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(seq_len(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) vcov(fit)n <- 200 dat <- data.frame( y = rbinom(n, 1, 0.5), x1 = rnorm(n), treatment = rep(c(0, 1), n / 2), strata = rep(seq_len(n / 2), each = 2) ) fit <- clogit(y ~ x1, data = dat, treatment = treatment, strata = strata) vcov(fit)