Package 'glmtlp' reference manual

Title:	Generalized Linear Models with Truncated Lasso Penalty
Description:	Extremely efficient procedures for fitting regularization paths with l0, l1, and truncated lasso penalties for linear regression and logistic regression models. This version is a complete rewrite of the previous version, which was mainly based on R. The new core algorithms are written in C++ and highly optimized.
Authors:	Chunlin Li [aut, cph], Yu Yang [aut, cre, cph], Chong Wu [aut, cph], Xiaotong Shen [ths, cph], Wei Pan [ths, cph]
Maintainer:	Yu Yang <[email protected]>
License:	GPL-3
Version:	2.0.2
Built:	2025-04-01 05:33:04 UTC
Source:	https://github.com/cran/glmtlp

A simulated binomial data set.

Description

A data set simulated for illustrating logistic regression models. Generated by gen.binomial.data(n = 200, p = 20, seed = 2021).

Usage

data(bin_data)

Format

A list with three elements: design matrix X, response y, and the true coefficient vector beta.

X: design matrix
y: response
beta: the true coefficient vector

Examples

data("bin_data")
cv.fit <- cv.glmtlp(bin_data$X, bin_data$y, family = "binomial", penalty = "l1")
plot(cv.fit)

Cross-validation for glmtlp

Description

Performs k-fold cross-validation for l0, l1, or TLP-penalized regression models over a grid of values for the regularization parameter lambda (if penalty="l1" or penalty="tlp") or kappa (if penalty="l0").

Usage

cv.glmtlp(X, y, ..., seed = NULL, nfolds = 10, obs.fold = NULL, ncores = 1)

Arguments

`X`	input matrix, of dimension `nobs` x `nvars`, as in `glmtlp`.
`y`	response, of length nobs, as in `glmtlp`.
`...`	Other arguments that can be passed to `glmtlp`.
`seed`	the random seed, for reproducibility.
`nfolds`	number of folds; default is 10. The smallest allowable value is `nfolds=3`.
`obs.fold`	an optional vector of values between 1 and `nfolds` identifying which fold each observation is in. If supplied, `nfolds` can be missing.
`ncores`	number of cores to use; default is 1. If greater than 1, the folds are fitted in parallel with `foreach` (using a `doParallel` backend); if equal to 1, the folds are fitted in an ordinary for loop. Users do not need to register a parallel cluster themselves.

Details

The function calls glmtlp nfolds+1 times: the first call obtains the lambda or kappa sequence, and the remaining calls compute the fit with each of the folds omitted in turn. The cross-validation error is based on deviance. The error is accumulated over the folds, and the average error and its standard deviation are computed.

When family = "binomial", the fold assignment (if not provided by the user) is generated in a stratified manner, so that the ratio of 0/1 outcomes is the same in each fold.
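
The stratified assignment described above can be illustrated with a small base-R sketch. The helper name `make_stratified_folds` is hypothetical and is not part of glmtlp; the package's internal procedure may differ in detail.

```r
# Hypothetical helper (not part of glmtlp): assign fold ids separately
# within each outcome class so that every fold has a similar 0/1 ratio.
make_stratified_folds <- function(y, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fold <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Cycle through 1..nfolds, then shuffle the ids within the class.
    ids <- rep_len(seq_len(nfolds), length(idx))
    fold[idx] <- ids[sample.int(length(idx))]
  }
  fold
}

y <- rep(c(0, 1), times = c(60, 40))
obs.fold <- make_stratified_folds(y, nfolds = 5, seed = 2021)
table(y, obs.fold)  # each fold holds 12 zeros and 8 ones
```

A vector built this way could then be supplied through the `obs.fold` argument of `cv.glmtlp` when a custom fold assignment is wanted.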

Value

An object of class "cv.glmtlp" is returned, which is a list with the ingredients of the cross-validation fit.

`call`	the function call
`cv.mean`	The mean cross-validated error - a vector of length `length(kappa)` if `penalty = "l0"` and `length(lambda)` otherwise.
`cv.se`	estimate of standard error of `cv.mean`.
`fit`	a fitted glmtlp object for the full data.
`idx.min`	the index of the `lambda` or `kappa` sequence that corresponds to the smallest cv mean error.
`kappa`	the values of `kappa` used in the fits, available when `penalty = 'l0'`.
`kappa.min`	the value of `kappa` that gives the minimum `cv.mean`, available when `penalty = 'l0'`.
`lambda`	the values of `lambda` used in the fits.
`lambda.min`	value of `lambda` that gives minimum `cv.mean`, available when penalty is 'l1' or 'tlp'.
`null.dev`	null deviance of the model.
`obs.fold`	the fold id for each observation used in the CV.

Author(s)

Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang [email protected]

References

Shen, X., Pan, W., & Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013). On constrained and regularized high-dimensional regression. Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021). Inference for a Large Directed Graphical Model with Interventions. arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014). A coordinate majorization descent algorithm for l1 penalized learning. Journal of Statistical Computation and Simulation, 84(1), 84-95.
The GitHub repositories of two R packages: ncvreg and glmnet.

Examples


# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1", seed=2021)

# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0,1), 100, replace = TRUE)
cv.fit <- cv.glmtlp(X, y, family = "binomial", penalty = "l1", seed=2021)

A simulated gaussian data set.

Description

A data set simulated for illustrating linear regression models. Generated by gen.gaussian.data(n = 200, p = 20, seed = 2021).

Usage

data(gau_data)

Format

A list with five elements: design matrix X, response y, correlation structure of the covariates Sigma, true coefficient vector beta, and the noise level sigma.

X: design matrix
y: response
Sigma: correlation structure of the covariates
beta: the true coefficient vector
sigma: the noise level

Examples

data("gau_data")
cv.fit <- cv.glmtlp(gau_data$X, gau_data$y, family = "gaussian", penalty = "tlp")
plot(cv.fit)

Simulate a binomial data set

Description

Simulate a data set with binary response following the logistic regression model.

Usage

gen.binomial.data(n, p, rho = 0, kappa = 5, beta.type = 1, seed = 2021)

Arguments

`n`	Sample size.
`p`	Number of covariates.
`rho`	The parameter defining the AR(1) correlation matrix.
`kappa`	The number of nonzero coefficients.
`beta.type`	Numeric indicator for the type of the true coefficient vector. For `beta.type = 1`, the true coefficient vector has `kappa` components equal to 1, at positions spread roughly evenly between 1 and `p`. For `beta.type = 2`, the first `kappa` values are 1, and the rest are 0. For `beta.type = 3`, the first `kappa` values are equally spaced values from 10 to 0.5, and the rest are 0. For `beta.type = 4`, the first `kappa` values are the first `kappa` values in c(-10, -6, -2, 2, 6, 10), and the rest are 0. For `beta.type = 5`, the first `kappa` values are 1, and the rest decay exponentially to 0 with base 0.5.
`seed`	The seed for reproducibility. Default is 2021.
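
To make the roles of `rho`, `kappa`, and `beta.type` more concrete, here is a minimal base-R sketch. It assumes that the AR(1) correlation matrix has entries rho^|i - j|, that the covariates are multivariate normal, and that the logistic model has no intercept. The helpers `ar1_cor` and `make_beta` are hypothetical and only cover `beta.type` 2 and 4; this is not the package's own code.

```r
# Assumption: an AR(1) correlation matrix has entries rho^|i - j|.
ar1_cor <- function(p, rho) rho^abs(outer(seq_len(p), seq_len(p), "-"))

# Hypothetical helper following the descriptions of beta.type = 2 and 4.
make_beta <- function(p, kappa, beta.type) {
  beta <- numeric(p)
  if (beta.type == 2) {
    beta[seq_len(kappa)] <- 1
  } else if (beta.type == 4) {
    beta[seq_len(kappa)] <- c(-10, -6, -2, 2, 6, 10)[seq_len(kappa)]
  } else {
    stop("only beta.type 2 and 4 are sketched here")
  }
  beta
}

ar1_cor(p = 4, rho = 0.5)[1, ]               # 1.000 0.500 0.250 0.125
make_beta(p = 8, kappa = 3, beta.type = 2)   # 1 1 1 0 0 0 0 0
make_beta(p = 8, kappa = 3, beta.type = 4)   # -10 -6 -2 0 0 0 0 0

set.seed(2021)
n <- 200; p <- 8
Sigma <- ar1_cor(p, rho = 0.5)
# Draw rows of X from N(0, Sigma) using the Cholesky factor of Sigma.
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
beta <- make_beta(p, kappa = 3, beta.type = 2)
# Binary response from the logistic model P(y = 1 | x) = plogis(x' beta).
y <- rbinom(n, size = 1, prob = plogis(drop(X %*% beta)))
```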

Value

A list containing the simulated data.

`X`	the covariate matrix, of dimension `n` x `p`.
`y`	the response, of length `n`.
`beta`	the true coefficients, of length `p`.

Examples

bin_data <- gen.binomial.data(n = 200, p = 20, seed = 2021)
head(bin_data$X)
head(bin_data$y)
head(bin_data$beta)

Simulate a gaussian data set

Description

Simulate a data set with gaussian response following the linear regression model.

Usage

gen.gaussian.data(
  n,
  p,
  rho = 0,
  kappa = 5,
  beta.type = 1,
  snr = 1,
  seed = 2021
)

Arguments

`n`	Sample size.
`p`	Number of covariates.
`rho`	The parameter defining the AR(1) correlation matrix.
`kappa`	The number of nonzero coefficients.
`beta.type`	Numeric indicator for the type of the true coefficient vector. For `beta.type = 1`, the true coefficient vector has `kappa` components equal to 1, at positions spread roughly evenly between 1 and `p`. For `beta.type = 2`, the first `kappa` values are 1, and the rest are 0. For `beta.type = 3`, the first `kappa` values are equally spaced values from 10 to 0.5, and the rest are 0. For `beta.type = 4`, the first `kappa` values are the first `kappa` values in c(-10, -6, -2, 2, 6, 10), and the rest are 0. For `beta.type = 5`, the first `kappa` values are 1, and the rest decay exponentially to 0 with base 0.5.
`snr`	Signal-to-noise ratio. Default is 1.
`seed`	The seed for reproducibility. Default is 2021.
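
The link between `snr` and the returned noise level can be sketched as follows. This assumes the common definition snr = (beta' Sigma beta) / sigma^2, with Sigma the AR(1) correlation matrix with entries rho^|i - j|; the manual does not state which definition the package uses, so treat this as an illustration only.

```r
# Assumption: snr = (beta' Sigma beta) / sigma^2, so that
# sigma = sqrt(beta' Sigma beta / snr).
p <- 5; rho <- 0.5; snr <- 2
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
beta <- c(1, 1, 0, 0, 0)

# Variance of the signal x' beta: 1 + 1 + 2 * 0.5 = 3.
signal_var <- drop(t(beta) %*% Sigma %*% beta)
sigma <- sqrt(signal_var / snr)
sigma  # sqrt(1.5), about 1.2247

# Under these assumptions a Gaussian response would be X %*% beta + noise.
set.seed(2021)
n <- 200
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
y <- drop(X %*% beta) + rnorm(n, sd = sigma)
```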

Value

A list containing the simulated data.

`X`	the covariate matrix, of dimension `n` x `p`.
`y`	the response, of length `n`.
`beta`	the true coefficients, of length `p`.
`sigma`	the standard deviation of the noise (the noise level).

Examples

gau_data <- gen.gaussian.data(n = 200, p = 20, seed = 2021)
head(gau_data$X)
head(gau_data$y)
head(gau_data$beta)
gau_data$sigma

Plot Method for a "cv.glmtlp" Object

Description

Plots the cross-validation curve, and the upper and lower standard deviation curves, as a function of the lambda or kappa values.

Usage

## S3 method for class 'cv.glmtlp'
plot(x, vertical.line = TRUE, ...)

Arguments

`x`	Fitted `cv.glmtlp` object
`vertical.line`	Logical. Whether to include a vertical line marking the index that gives the smallest CV error.
`...`	Additional arguments.

Details

The generated plot is a ggplot object, and therefore, the users are able to customize the plots following the ggplot2 syntax.

Author(s)

Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang [email protected]

Examples

X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "tlp")
plot(cv.fit)
plot(cv.fit, vertical.line = FALSE)
cv.fit2 <- cv.glmtlp(X, y, family = "gaussian", penalty = "l0")
plot(cv.fit2)
plot(cv.fit2, vertical.line = FALSE)

data("gau_data")
cv.fit <- cv.glmtlp(gau_data$X, gau_data$y, family = "gaussian", penalty = "tlp")
plot(cv.fit)

data("bin_data")
cv.fit <- cv.glmtlp(bin_data$X, bin_data$y, family = "binomial", penalty = "l1")
plot(cv.fit)

Plot Method for a "glmtlp" Object

Description

Generates a solution path plot for a fitted "glmtlp" object.

Usage

## S3 method for class 'glmtlp'
plot(
  x,
  xvar = c("lambda", "kappa", "deviance", "l1_norm", "log_lambda"),
  xlab = iname,
  ylab = "Coefficients",
  title = "Solution Path",
  label = FALSE,
  label.size = 3,
  ...
)

Arguments

`x`	Fitted `glmtlp` object.
`xvar`	The x-axis variable to plot against, including `"lambda"`, `"kappa"`, `"deviance"`, `"l1_norm"`, and `"log_lambda"`.
`xlab`	The x-axis label of the plot. The default depends on `xvar` and is `"Lambda"`, `"Kappa"`, `"Fraction of Explained Deviance"`, `"L1 Norm"`, or `"Log Lambda"`, respectively.
`ylab`	The y-axis label of the plot, default is "Coefficients".
`title`	The main title of the plot, default is "Solution Path".
`label`	Logical, whether to attach labels to the non-zero coefficients, default is `FALSE`.
`label.size`	The text size of the labels, default is 3.
`...`	Additional arguments.

Details

The generated plot is a ggplot object, and therefore, the users are able to customize the plots following the ggplot2 syntax.

Value

A ggplot object.
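
Because the returned value is a ggplot object, it can be stored and extended with the usual `+` syntax. The following is a small sketch that assumes both glmtlp and ggplot2 are installed; the theme and caption are arbitrary choices for illustration.

```r
library(glmtlp)
library(ggplot2)

set.seed(2021)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")

# Store the solution path plot, then add ggplot2 layers to it.
p <- plot(fit, xvar = "log_lambda")
p + theme_bw() + labs(caption = "l1 penalty, simulated data")
```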

Author(s)

Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang [email protected]

Examples

X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")
plot(fit, xvar = "lambda")
plot(fit, xvar = "log_lambda")
plot(fit, xvar = "l1_norm")
plot(fit, xvar = "log_lambda", label = TRUE)
fit2 <- glmtlp(X, y, family = "gaussian", penalty = "l0")
plot(fit2, xvar = "kappa", label = TRUE)

Predict Method for a "cv.glmtlp" Object.

Description

Makes predictions for a cross-validated glmtlp model, using the stored "glmtlp" object and the optimal value chosen for lambda or kappa.

Usage

## S3 method for class 'cv.glmtlp'
predict(
  object,
  X,
  type = c("link", "response", "class", "coefficients", "numnzs", "varnzs"),
  lambda = NULL,
  kappa = NULL,
  which = object$idx.min,
  ...
)

## S3 method for class 'cv.glmtlp'
coef(object, lambda = NULL, kappa = NULL, which = object$idx.min, ...)

Arguments

`object`	Fitted `"cv.glmtlp"` object.
`X`	Matrix of new values for `X` at which predictions are to be made. Must be a matrix.
`type`	Type of prediction to be made. For `"gaussian"` models, type `"link"` and `"response"` are equivalent and both give the fitted values. For `"binomial"` models, type `"link"` gives the linear predictors and type `"response"` gives the fitted probabilities. Type `"coefficients"` computes the coefficients at the provided values of `lambda` or `kappa`. Note that for `"binomial"` models, results are returned only for the class corresponding to the second level of the factor response. Type `"class"` applies only to `"binomial"` models, and gives the class label corresponding to the maximum probability. Type `"numnz"` gives the total number of non-zero coefficients for each value of `lambda` or `kappa`. Type `"varnz"` gives a list of indices of the nonzero coefficients for each value of `lambda` or `kappa`.
`lambda`	Value of the penalty parameter `lambda` at which predictions are to be made. Default is NULL.
`kappa`	Value of the penalty parameter `kappa` at which predictions are to be made. Default is NULL.
`which`	Index of the penalty parameter `lambda` or `kappa` sequence at which predictions are to be made. Default is the `idx.min` stored in the `cv.glmtlp` object.
`...`	Additional arguments.

Value

The object returned depends on type.

Author(s)

Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang [email protected]

Examples

X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1")
predict(cv.fit, X = X[1:5, ])
coef(cv.fit)
predict(cv.fit, X = X[1:5, ], lambda = 0.1)

Predict Method for a "glmtlp" Object

Description

Predicts fitted values, logits, coefficients and more from a fitted glmtlp object.

Usage

## S3 method for class 'glmtlp'
predict(
  object,
  X,
  type = c("link", "response", "class", "coefficients", "numnz", "varnz"),
  lambda = NULL,
  kappa = NULL,
  which = 1:(ifelse(object$penalty == "l0", length(object$kappa), length(object$lambda))),
  ...
)

## S3 method for class 'glmtlp'
coef(
  object,
  lambda = NULL,
  kappa = NULL,
  which = 1:(ifelse(object$penalty == "l0", length(object$kappa), length(object$lambda))),
  drop = TRUE,
  ...
)

Arguments

`object`	Fitted `glmtlp` model object.
`X`	Matrix of new values for `X` at which predictions are to be made. Must be a matrix. This argument is not used for `type=c("coefficients","numnz", "varnz")`.
`type`	Type of prediction to be made. For `"gaussian"` models, type `"link"` and `"response"` are equivalent and both give the fitted values. For `"binomial"` models, type `"link"` gives the linear predictors and type `"response"` gives the fitted probabilities. Type `"coefficients"` computes the coefficients at the provided values of `lambda` or `kappa`. Note that for `"binomial"` models, results are returned only for the class corresponding to the second level of the factor response. Type `"class"` applies only to `"binomial"` models, and gives the class label corresponding to the maximum probability. Type `"numnz"` gives the total number of non-zero coefficients for each value of `lambda` or `kappa`. Type `"varnz"` gives a list of indices of the nonzero coefficients for each value of `lambda` or `kappa`.
`lambda`	Value of the penalty parameter `lambda` at which predictions are to be made. Default is NULL.
`kappa`	Value of the penalty parameter `kappa` at which predictions are to be made. Default is NULL.
`which`	Index of the penalty parameter `lambda` or `kappa` sequence at which predictions are to be made. Default is the full set of indices for the penalty parameter sequence.
`...`	Additional arguments.
`drop`	Logical. Whether to drop dimensions of length 1 from the result.

Details

coef(...) is equivalent to predict(type="coefficients",...).

Value

The object returned depends on type.

Author(s)

Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang [email protected]

Examples


# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")
predict(fit, X = X[1:5, ])
coef(fit)
predict(fit, X = X[1:5, ], lambda = 0.1)

# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0,1), 100, replace = TRUE)
fit <- glmtlp(X, y, family = "binomial", penalty = "l1")
coef(fit)
predict(fit, X = X[1:5, ], type = "response")
predict(fit, X = X[1:5, ], type = "response", lambda = 0.01)
predict(fit, X = X[1:5, ], type = "class", lambda = 0.01)
predict(fit, X = X[1:5, ], type = "numnz", lambda = 0.01)

Generate lambda sequence.

Description

Generate lambda sequence.

Usage

setup_lambda(X, y, weights, lambda.min.ratio, nlambda)

Arguments

`X`	Input matrix, of dimension `nobs` x `nvars`; each row is an observation vector.
`y`	Response variable, of length `nobs`. For `family="gaussian"`, it should be quantitative; for `family="binomial"`, it should be either a factor with two levels or a binary vector.
`weights`	Observation weights.
`lambda.min.ratio`	The smallest value for `lambda`, as a fraction of `lambda.max`, the smallest value for which all coefficients are zero. The default depends on the sample size `nobs` relative to the number of variables `nvars`.
`nlambda`	The number of `lambda` values.
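
How `lambda.min.ratio` and `nlambda` might shape the sequence can be sketched in base R. Two assumptions are made here that the manual does not confirm: the grid is log-spaced and decreasing (as in glmnet), and `lambda.max` is computed for an unweighted Gaussian lasso with a centered response. The helper `lambda_grid` is hypothetical and is not the package's `setup_lambda`.

```r
# Hypothetical helper: log-spaced grid from lambda.max down to
# lambda.min.ratio * lambda.max.
lambda_grid <- function(lambda.max, lambda.min.ratio, nlambda) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

lambda_grid(lambda.max = 1, lambda.min.ratio = 0.01, nlambda = 5)
# 1, 0.3162..., 0.1, 0.0316..., 0.01

# Assumption: for the objective
#   (1 / (2 * n)) * sum((y - X %*% beta)^2) + lambda * sum(abs(beta))
# with an unpenalized intercept, all coefficients are zero once
# lambda >= max(abs(t(X) %*% (y - mean(y)))) / n.
set.seed(2021)
n <- 100; p <- 20
X <- scale(matrix(rnorm(n * p), n, p))
y <- rnorm(n)
lambda.max <- max(abs(crossprod(X, y - mean(y)))) / n
lambda <- lambda_grid(lambda.max, lambda.min.ratio = 0.01, nlambda = 100)
range(lambda)  # from 0.01 * lambda.max up to lambda.max
```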
