Title: | Generalized Linear Models with Truncated Lasso Penalty |
---|---|
Description: | Extremely efficient procedures for fitting regularization paths with l0, l1, and truncated lasso penalties for linear regression and logistic regression models. This version is a complete rewrite of the previous version, which was mainly based on R. The new core algorithms are written in C++ and highly optimized. |
Authors: | Chunlin Li [aut, cph] , Yu Yang [aut, cre, cph] , Chong Wu [aut, cph] , Xiaotong Shen [ths, cph], Wei Pan [ths, cph] |
Maintainer: | Yu Yang <[email protected]> |
License: | GPL-3 |
Version: | 2.0.2 |
Built: | 2024-11-02 05:28:48 UTC |
Source: | https://github.com/cran/glmtlp |
A data set simulated for illustrating logistic regression models. Generated by
gen.binomial.data(n = 200, p = 20, seed = 2021).
data(bin_data)
A list with three elements: design matrix X, response y,
and the true coefficient vector beta.
data("bin_data")
cv.fit <- cv.glmtlp(bin_data$X, bin_data$y, family = "binomial", penalty = "l1")
plot(cv.fit)
Performs k-fold cross-validation for l0, l1, or TLP-penalized regression models
over a grid of values for the regularization parameter lambda
(if penalty = "l1" or penalty = "tlp") or kappa (if penalty = "l0").
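The three penalty types differ in how they charge for coefficient size. The following is a minimal sketch, assuming the truncated lasso penalty takes the form min(|b| / tau, 1) from Shen, Pan & Zhu (2012); the function names and the threshold name `tau` are illustrative and are not part of the package interface.

```r
# Illustrative penalty functions (hypothetical names, not glmtlp functions).
l0_penalty  <- function(beta) sum(beta != 0)           # counts nonzero coefficients
l1_penalty  <- function(beta) sum(abs(beta))           # grows linearly with size
tlp_penalty <- function(beta, tau) {
  # Behaves like a scaled l1 penalty below tau and is capped at 1 above it.
  sum(pmin(abs(beta) / tau, 1))
}

beta <- c(0, 0.05, -0.5, 2)
l0_penalty(beta)
l1_penalty(beta)
tlp_penalty(beta, tau = 0.1)
```

As tau shrinks toward zero, the truncated penalty approaches the l0 count, which is why it can serve as a computational surrogate for it.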
cv.glmtlp(X, y, ..., seed = NULL, nfolds = 10, obs.fold = NULL, ncores = 1)
X |
input matrix, of dimension |
y |
response, of length nobs, as in |
... |
Other arguments that can be passed to |
seed |
the seed for reproduction purposes |
nfolds |
number of folds; default is 10. The smallest value allowable
is |
obs.fold |
an optional vector of values between 1 and |
ncores |
number of cores utilized; default is 1. If greater than 1,
then |
The function calls glmtlp nfolds + 1 times: the first call obtains the
lambda or kappa sequence, and the remaining calls compute
the fit with each of the folds omitted. The cross-validation error is based
on deviance. The error is accumulated over the
folds, and the average error and its standard deviation are computed.
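The aggregation step can be sketched in base R as follows. This is only an illustration under assumed inputs: the matrix `fold_err` holds made-up per-fold errors (one row per fold, one column per penalty value), and the standard-error formula is the usual sd / sqrt(nfolds), which may differ from what the package computes internally.

```r
set.seed(1)
nfolds  <- 5
nlambda <- 4
# Hypothetical per-fold validation errors: rows are folds, columns are lambdas.
fold_err <- matrix(runif(nfolds * nlambda), nrow = nfolds, ncol = nlambda)

cv_mean <- colMeans(fold_err)                     # average error per lambda
cv_se   <- apply(fold_err, 2, sd) / sqrt(nfolds)  # standard error of that average
idx_min <- which.min(cv_mean)                     # index with the smallest CV error
```

The index `idx_min` plays the role that idx.min plays in the returned object: it points at the penalty value with the smallest average error.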
When family = "binomial", the fold assignment (if not provided by
the user) is generated in a stratified manner, so that the ratio of 0/1 outcomes
is the same in each fold.
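One way to build such a stratified assignment is to shuffle the observations within each outcome class and deal them out to the folds in turn. The helper below, `stratified_folds`, is a hypothetical base-R sketch of that idea and is not the package's own routine.

```r
# Hypothetical helper: assign fold ids so each class is spread evenly over folds.
stratified_folds <- function(y, nfolds) {
  fold <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle the indices of this class, then deal them out to folds in turn.
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  fold
}

set.seed(2021)
y <- rep(c(0, 1), times = c(60, 40))
fold <- stratified_folds(y, nfolds = 10)
table(fold, y)  # each fold receives 6 zeros and 4 ones
```

A vector built this way has values between 1 and nfolds, which is the shape the obs.fold argument describes.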
An object of class "cv.glmtlp" is returned, which is a list
with the ingredients of the cross-validation fit.
call |
the function call |
cv.mean |
The mean cross-validated error - a vector of length
|
cv.se |
estimate of standard error of |
fit |
a fitted glmtlp object for the full data. |
idx.min |
the index of the |
kappa |
the values of |
kappa.min |
the value of |
lambda |
the values of |
lambda.min |
value of |
null.dev |
null deviance of the model. |
obs.fold |
the fold id for each observation used in the CV. |
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang [email protected]
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
glmtlp and plot, predict, and coef methods for "cv.glmtlp" objects.
# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1", seed = 2021)

# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0, 1), 100, replace = TRUE)
cv.fit <- cv.glmtlp(X, y, family = "binomial", penalty = "l1", seed = 2021)
A data set simulated for illustrating linear regression models. Generated by
gen.gaussian.data(n = 200, p = 20, seed = 2021).
data(gau_data)
A list with five elements: design matrix X, response y,
correlation structure of the covariates Sigma, true coefficient vector beta,
and the noise level sigma.
data("gau_data")
cv.fit <- cv.glmtlp(gau_data$X, gau_data$y, family = "gaussian", penalty = "tlp")
plot(cv.fit)
Simulate a data set with binary response following the logistic regression model.
gen.binomial.data(n, p, rho = 0, kappa = 5, beta.type = 1, seed = 2021)
n |
Sample size. |
p |
Number of covariates. |
rho |
The parameter defining the AR(1) correlation matrix. |
kappa |
The number of nonzero coefficients. |
beta.type |
Numeric indicator for choosing the beta type. For
|
seed |
The seed for reproducibility. Default is 2021. |
A list containing the simulated data.
X |
the covariate matrix, of dimension |
y |
the response, of length |
beta |
the true coefficients, of length |
bin_data <- gen.binomial.data(n = 200, p = 20, seed = 2021)
head(bin_data$X)
head(bin_data$y)
head(bin_data$beta)
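For readers who want to see what such a simulation involves, here is a base-R sketch of drawing covariates with an AR(1) correlation matrix and a binary response from a logistic model. It is an illustration only: the coefficient layout (first kappa entries equal to 1) is a made-up choice, and the package's beta.type options may place and scale the nonzero coefficients differently.

```r
set.seed(2021)
n <- 200; p <- 20; rho <- 0.5; kappa <- 5

# AR(1) correlation matrix: entry (i, j) equals rho^|i - j|.
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))

# Draw rows of X from N(0, Sigma) using the upper-triangular Cholesky factor.
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Hypothetical coefficient layout: the first kappa entries are nonzero.
beta <- c(rep(1, kappa), rep(0, p - kappa))

# Logistic model: P(y = 1 | x) = plogis(x' beta).
y <- rbinom(n, size = 1, prob = plogis(drop(X %*% beta)))
```

With rho = 0 the matrix Sigma reduces to the identity, so the covariates are independent, which matches the default in the usage above.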
Simulate a data set with Gaussian response following the linear regression model.
gen.gaussian.data(n, p, rho = 0, kappa = 5, beta.type = 1, snr = 1, seed = 2021)
n |
Sample size. |
p |
Number of covariates. |
rho |
The parameter defining the AR(1) correlation matrix. |
kappa |
The number of nonzero coefficients. |
beta.type |
Numeric indicator for choosing the beta type. For
|
snr |
Signal-to-noise ratio. Default is 1. |
seed |
The seed for reproducibility. Default is 2021. |
A list containing the simulated data.
X |
the covariate matrix, of dimension |
y |
the response, of length |
beta |
the true coefficients, of length |
sigma |
the standard error of the noise. |
gau_data <- gen.gaussian.data(n = 200, p = 20, seed = 2021)
head(gau_data$X)
head(gau_data$y)
head(gau_data$beta)
gau_data$sigma
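The relationship between a signal-to-noise ratio and the noise level can be sketched in base R. This assumes the common definition snr = var(signal) / sigma^2, computed here from the sample variance of the linear predictor; the package may define snr differently (for example through the population covariance), and the coefficient vector below is a made-up example.

```r
set.seed(2021)
n <- 200; p <- 20; kappa <- 5; snr <- 1
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, kappa), rep(0, p - kappa))  # hypothetical coefficients

signal <- drop(X %*% beta)
# Assumed definition: snr = var(signal) / sigma^2, so solve for sigma.
sigma <- sqrt(var(signal) / snr)
y <- signal + rnorm(n, sd = sigma)
```

Under this definition, a larger snr yields a smaller sigma and therefore a response that tracks the linear predictor more closely.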
Plots the cross-validation curve, and the upper and lower standard deviation
curves, as a function of the lambda or kappa values.
## S3 method for class 'cv.glmtlp'
plot(x, vertical.line = TRUE, ...)
x |
Fitted |
vertical.line |
Logical. Whether to include a vertical line marking the index that gives the smallest CV error. |
... |
Additional arguments. |
The generated plot is a ggplot object, so users can
customize it using ggplot2 syntax.
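Because the result is a ggplot object, further layers can be added with `+`. The sketch below assumes both glmtlp and ggplot2 are installed and skips the work otherwise; the theme and title are arbitrary choices made for illustration.

```r
p <- NULL
if (requireNamespace("glmtlp", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(glmtlp)
  X <- matrix(rnorm(100 * 20), 100, 20)
  y <- rnorm(100)
  cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1", seed = 2021)
  # Add ggplot2 layers to the plot object returned by the plot method.
  p <- plot(cv.fit) +
    ggplot2::theme_minimal() +
    ggplot2::labs(title = "Cross-validation curve")
}
```

Printing `p` (when it is not NULL) draws the customized figure.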
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "tlp")
plot(cv.fit)
plot(cv.fit, vertical.line = FALSE)

cv.fit2 <- cv.glmtlp(X, y, family = "gaussian", penalty = "l0")
plot(cv.fit2)
plot(cv.fit2, vertical.line = FALSE)

data("gau_data")
cv.fit <- cv.glmtlp(gau_data$X, gau_data$y, family = "gaussian", penalty = "tlp")
plot(cv.fit)

data("bin_data")
cv.fit <- cv.glmtlp(bin_data$X, bin_data$y, family = "binomial", penalty = "l1")
plot(cv.fit)
Generates a solution path plot for a fitted "glmtlp" object.
## S3 method for class 'glmtlp'
plot(
  x,
  xvar = c("lambda", "kappa", "deviance", "l1_norm", "log_lambda"),
  xlab = iname,
  ylab = "Coefficients",
  title = "Solution Path",
  label = FALSE,
  label.size = 3,
  ...
)
x |
Fitted |
xvar |
The x-axis variable to plot against, including |
xlab |
The x-axis label of the plot, default is |
ylab |
The y-axis label of the plot, default is "Coefficients". |
title |
The main title of the plot, default is "Solution Path". |
label |
Logical, whether to attach labels to the non-zero
coefficients, default is |
label.size |
The text size of the labels, default is 3. |
... |
Additional arguments. |
The generated plot is a ggplot object, so users can
customize it using ggplot2 syntax.
A ggplot object.
print, predict, coef and plot methods, and the cv.glmtlp function.
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")
plot(fit, xvar = "lambda")
plot(fit, xvar = "log_lambda")
plot(fit, xvar = "l1_norm")
plot(fit, xvar = "log_lambda", label = TRUE)

fit2 <- glmtlp(X, y, family = "gaussian", penalty = "l0")
plot(fit2, xvar = "kappa", label = TRUE)
Makes predictions for a cross-validated glmtlp model, using
the stored "glmtlp" object, and the optimal value chosen for lambda.
## S3 method for class 'cv.glmtlp'
predict(
  object,
  X,
  type = c("link", "response", "class", "coefficients", "numnzs", "varnzs"),
  lambda = NULL,
  kappa = NULL,
  which = object$idx.min,
  ...
)

## S3 method for class 'cv.glmtlp'
coef(object, lambda = NULL, kappa = NULL, which = object$idx.min, ...)
object |
Fitted |
X |
Matrix of new values for |
type |
Type of prediction to be made. For |
lambda |
Value of the penalty parameter |
kappa |
Value of the penalty parameter |
which |
Index of the penalty parameter |
... |
Additional arguments. |
The object returned depends on type.
print, predict, coef and plot methods, and the cv.glmtlp function.
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1")
predict(cv.fit, X = X[1:5, ])
coef(cv.fit)
predict(cv.fit, X = X[1:5, ], lambda = 0.1)
Predicts fitted values, logits, coefficients and more from a fitted
glmtlp object.
## S3 method for class 'glmtlp'
predict(
  object,
  X,
  type = c("link", "response", "class", "coefficients", "numnz", "varnz"),
  lambda = NULL,
  kappa = NULL,
  which = 1:(ifelse(object$penalty == "l0", length(object$kappa), length(object$lambda))),
  ...
)

## S3 method for class 'glmtlp'
coef(
  object,
  lambda = NULL,
  kappa = NULL,
  which = 1:(ifelse(object$penalty == "l0", length(object$kappa), length(object$lambda))),
  drop = TRUE,
  ...
)
object |
Fitted |
X |
Matrix of new values for |
type |
Type of prediction to be made. For |
lambda |
Value of the penalty parameter |
kappa |
Value of the penalty parameter |
which |
Index of the penalty parameter |
... |
Additional arguments. |
drop |
Logical; whether to drop dimensions of length 1 from the result. |
coef(...) is equivalent to predict(type = "coefficients", ...).
The object returned depends on type.
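The descriptions of the individual type values are cut off in this copy of the manual, so the following base-R sketch only illustrates how such quantities are usually related in a logistic model. It assumes the standard GLM conventions (link is the linear predictor, response is the fitted probability, class thresholds the probability at 0.5) and reads "numnz" and "varnz" as the count and the indices of nonzero coefficients; the coefficient values are made up.

```r
set.seed(1)
X_new <- matrix(rnorm(5 * 3), 5, 3)
intercept <- -0.2
beta <- c(1.5, 0, -2)                          # hypothetical fitted coefficients

link     <- drop(intercept + X_new %*% beta)   # linear predictor (logit scale)
response <- plogis(link)                       # fitted probabilities
cls      <- as.integer(response > 0.5)         # predicted 0/1 labels
numnz    <- sum(beta != 0)                     # number of nonzero coefficients
varnz    <- which(beta != 0)                   # indices of nonzero coefficients
```

Thresholding the probability at 0.5 is the same as checking whether the linear predictor is positive.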
print, predict, coef and plot methods, and the cv.glmtlp function.
# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")
predict(fit, X = X[1:5, ])
coef(fit)
predict(fit, X = X[1:5, ], lambda = 0.1)

# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0, 1), 100, replace = TRUE)
fit <- glmtlp(X, y, family = "binomial", penalty = "l1")
coef(fit)
predict(fit, X = X[1:5, ], type = "response")
predict(fit, X = X[1:5, ], type = "response", lambda = 0.01)
predict(fit, X = X[1:5, ], type = "class", lambda = 0.01)
predict(fit, X = X[1:5, ], type = "numnz", lambda = 0.01)
Generate lambda sequence.
setup_lambda(X, y, weights, lambda.min.ratio, nlambda)
X |
Input matrix, of dimension |
y |
Response variable, of length |
weights |
Observation weights. |
lambda.min.ratio |
The smallest value for |
nlambda |
The number of |
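A lambda sequence of this kind is often built as a log-spaced grid running from a data-derived maximum down to lambda.min.ratio times that maximum. The base-R sketch below follows that common convention for an l1-penalized least-squares objective of the form RSS / (2n) + lambda * ||beta||_1 with unit weights; it is an assumption about the general approach, not a reproduction of what setup_lambda computes.

```r
set.seed(1)
n <- 100; p <- 20
X <- scale(matrix(rnorm(n * p), n, p))  # centered and scaled covariates
y <- rnorm(n)
lambda.min.ratio <- 0.01
nlambda <- 10

# Assumed convention: the largest lambda is the smallest value at which every
# coefficient of the l1-penalized least-squares fit (with intercept) is zero.
lambda_max <- max(abs(crossprod(X, y - mean(y)))) / n

# Log-spaced grid from lambda_max down to lambda_max * lambda.min.ratio.
lambda <- exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
                  length.out = nlambda))
```

Spacing the grid on the log scale places more candidate values near the small-lambda end, where the fitted model changes most quickly.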