library(lessR)
#>
#> lessR 4.3.9 feedback: gerbing@pdx.edu
#> --------------------------------------------------------------
#> > d <- Read("") Read text, Excel, SPSS, SAS, or R data file
#> d is default data frame, data= in analysis routines optional
#>
#> Many examples of reading, writing, and manipulating data,
#> graphics, testing means and proportions, regression, factor analysis,
#> customization, and descriptive statistics from pivot tables
#> Enter: browseVignettes("lessR")
#>
#> View lessR updates, now including time series forecasting
#> Enter: news(package="lessR")
#>
#> Interactive data analysis
#> Enter: interact()
#>
#> Attaching package: 'lessR'
#> The following object is masked from 'package:base':
#>
#> sort_by
The vignette examples of using lessR became so extensive that the maximum R package installation size was exceeded. A limited number of examples appears below; find many more vignette examples at:
Many of the following examples analyze data in the Employee data set,
included with lessR. To read an internal lessR data set, pass the name
of the data set to the lessR function Read(). Read the Employee data
into the data frame d. For data sets other than those provided by
lessR, enter the path name or URL between the quotes, or leave the
quotes empty to browse for the data file on your computer system. See
the Read and Write vignette for more details.
d <- Read("Employee")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> integer: Numeric data values, integers only
#> double: Numeric data values with decimal digits
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 Years integer 36 1 16 7 NA 7 ... 1 2 10
#> 2 Gender character 37 0 2 M M W ... W W M
#> 3 Dept character 36 1 5 ADMN SALE FINC ... MKTG SALE FINC
#> 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
#> 5 JobSat character 35 2 3 med low high ... high low high
#> 6 Plan integer 37 0 3 1 1 2 ... 2 2 1
#> 7 Pre integer 37 0 27 82 62 90 ... 83 59 80
#> 8 Post integer 37 0 22 92 74 86 ... 90 71 87
#> ------------------------------------------------------------------------------------------
d is the default name of the data frame for the lessR data analysis
functions. Explicitly access the data frame with the data parameter in
the analysis functions.
As an option, also read the table of variable labels. Create the table
formatted as two columns: the first column is the variable name and
the second column is the corresponding variable label. Not all
variables need be entered into the table. The table can be a csv file
or an Excel file.
Read the file of variable labels into the l data frame, currently the
only permitted name. The labels will be displayed on both the text and
visualization output. Each displayed label is the variable name
juxtaposed with the corresponding label, as shown in the following
output.
l <- rd("Employee_lbl")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 label character 8 0 8 Time of Company Employment ... Test score on legal issues after instruction
#> ------------------------------------------------------------------------------------------
l
#> label
#> Years Time of Company Employment
#> Gender Man or Woman
#> Dept Department Employed
#> Salary Annual Salary (USD)
#> JobSat Satisfaction with Work Environment
#> Plan 1=GoodHealth, 2=GetWell, 3=BestCare
#> Pre Test score on legal issues before instruction
#> Post Test score on legal issues after instruction
Consider the categorical variable Dept in the Employee data table. Use
BarChart() to tabulate and visualize the number of employees in each
department, here relying upon the default data frame (table) named d.
Otherwise add the data= option for a data frame with another name.
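The output below follows from the default call, with the variable as the only argument:

```r
BarChart(Dept)
```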
#> >>> Suggestions
#> BarChart(Dept, horiz=TRUE) # horizontal bar chart
#> BarChart(Dept, fill="reds") # red bars of varying lightness
#> PieChart(Dept) # doughnut (ring) chart
#> Plot(Dept) # bubble plot
#> Plot(Dept, stat="count") # lollipop plot
#>
#> --- Dept ---
#>
#> Missing Values: 1
#>
#> ACCT ADMN FINC MKTG SALE Total
#> Frequencies: 5 6 4 6 15 36
#> Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
#>
#> Chi-squared test of null hypothesis of equal probabilities
#> Chisq = 10.944, df = 4, p-value = 0.027
Specify a single fill color with the fill parameter, the edge color of
the bars with color, and the transparency level with transparency.
Against a lighter background, display the value for each bar with a
darker color using the labels_color parameter. To specify a color, use
color names, specify a color with either its rgb() or hcl() color
space coordinates, or use the lessR custom color palette function
getColors().
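A call consistent with this description follows; the specific color and transparency values here are illustrative, not taken from the original:

```r
# single fill color, black bar edges, partial transparency,
# darker bar labels against a light background
BarChart(Dept, fill="steelblue", color="black",
         transparency=0.4, labels_color="darkblue")
```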
#> >>> Suggestions
#> BarChart(Dept, horiz=TRUE) # horizontal bar chart
#> BarChart(Dept, fill="reds") # red bars of varying lightness
#> PieChart(Dept) # doughnut (ring) chart
#> Plot(Dept) # bubble plot
#> Plot(Dept, stat="count") # lollipop plot
#>
#> --- Dept ---
#>
#> Missing Values: 1
#>
#> ACCT ADMN FINC MKTG SALE Total
#> Frequencies: 5 6 4 6 15 36
#> Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
#>
#> Chi-squared test of null hypothesis of equal probabilities
#> Chisq = 10.944, df = 4, p-value = 0.027
Use the theme parameter to change the entire color theme: "colors",
"lightbronze", "dodgerblue", "slatered", "darkred", "gray", "gold",
"darkgreen", "blue", "red", "rose", "green", "purple", "sienna",
"brown", "orange", "white", and "light". In this example, changing the
full theme accomplishes the same as changing the fill color. Turn off
the displayed value on each bar with the parameter labels set to
"off". Specify a horizontal bar chart with the base R parameter horiz.
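A call consistent with this description; the chosen theme here is illustrative:

```r
# gray theme, no bar labels, horizontal bars
BarChart(Dept, theme="gray", labels="off", horiz=TRUE)
```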
#> >>> Suggestions
#> BarChart(Dept, horiz=TRUE) # horizontal bar chart
#> BarChart(Dept, fill="reds") # red bars of varying lightness
#> PieChart(Dept) # doughnut (ring) chart
#> Plot(Dept) # bubble plot
#> Plot(Dept, stat="count") # lollipop plot
#>
#> --- Dept ---
#>
#> Missing Values: 1
#>
#> ACCT ADMN FINC MKTG SALE Total
#> Frequencies: 5 6 4 6 15 36
#> Proportions: 0.139 0.167 0.111 0.167 0.417 1.000
#>
#> Chi-squared test of null hypothesis of equal probabilities
#> Chisq = 10.944, df = 4, p-value = 0.027
Consider the continuous variable Salary in the Employee data table.
Use Histogram() to tabulate and display the distribution of Salary,
here relying upon the default data frame (table) named d, so the data=
parameter is not needed.
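The output below follows from the default call:

```r
Histogram(Salary)
```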
#> >>> Suggestions
#> bin_width: set the width of each bin
#> bin_start: set the start of the first bin
#> bin_end: set the end of the last bin
#> Histogram(Salary, density=TRUE) # smoothed curve + histogram
#> Plot(Salary) # Violin/Box/Scatterplot (VBS) plot
#>
#> --- Salary ---
#>
#> n miss mean sd min mdn max
#> 37 0 73795.557 21799.533 46124.970 69547.600 134419.230
#>
#>
#> --- Outliers --- from the box plot: 1
#>
#> Small Large
#> ----- -----
#> 134419.2
#>
#>
#> Bin Width: 10000
#> Number of Bins: 10
#>
#> Bin Midpnt Count Prop Cumul.c Cumul.p
#> ---------------------------------------------------------
#> 40000 > 50000 45000 4 0.11 4 0.11
#> 50000 > 60000 55000 8 0.22 12 0.32
#> 60000 > 70000 65000 8 0.22 20 0.54
#> 70000 > 80000 75000 5 0.14 25 0.68
#> 80000 > 90000 85000 3 0.08 28 0.76
#> 90000 > 100000 95000 5 0.14 33 0.89
#> 100000 > 110000 105000 1 0.03 34 0.92
#> 110000 > 120000 115000 1 0.03 35 0.95
#> 120000 > 130000 125000 1 0.03 36 0.97
#> 130000 > 140000 135000 1 0.03 37 1.00
By default, the Histogram() function provides a color theme according
to the current, active theme. The function also provides the
corresponding frequency distribution, summary statistics, the table
that lists the count of each bin from which the histogram is
constructed, as well as an outlier analysis based on Tukey's outlier
detection rules for box plots.
Use the parameters bin_start, bin_width, and bin_end to customize the
histogram.
The color is easy to change, either by changing the color theme with
style(), or by changing the fill color with fill. Refer to standard R
colors, as shown with the lessR function showColors(), or implicitly
invoke the lessR color palette generating function getColors(). Each
30 degrees of the color wheel is named, such as "greens", "rusts",
etc., and implements a sequential color palette.
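The bin settings reported in the output below (bins starting at 35000 with width 14000) imply a call such as the following, reconstructed from that output:

```r
Histogram(Salary, bin_start=35000, bin_width=14000)
```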
#> >>> Suggestions
#> bin_end: set the end of the last bin
#> Histogram(Salary, density=TRUE) # smoothed curve + histogram
#> Plot(Salary) # Violin/Box/Scatterplot (VBS) plot
#>
#> --- Salary ---
#>
#> n miss mean sd min mdn max
#> 37 0 73795.557 21799.533 46124.970 69547.600 134419.230
#>
#>
#> --- Outliers --- from the box plot: 1
#>
#> Small Large
#> ----- -----
#> 134419.2
#>
#>
#> Bin Width: 14000
#> Number of Bins: 8
#>
#> Bin Midpnt Count Prop Cumul.c Cumul.p
#> ---------------------------------------------------------
#> 35000 > 49000 42000 1 0.03 1 0.03
#> 49000 > 63000 56000 14 0.38 15 0.41
#> 63000 > 77000 70000 9 0.24 24 0.65
#> 77000 > 91000 84000 4 0.11 28 0.76
#> 91000 > 105000 98000 5 0.14 33 0.89
#> 105000 > 119000 112000 2 0.05 35 0.95
#> 119000 > 133000 126000 1 0.03 36 0.97
#> 133000 > 147000 140000 1 0.03 37 1.00
Specify an X and a Y variable with the Plot() function to obtain a scatterplot. For two variables, both variables can be any combination of continuous or categorical. One variable can also be specified. A scatterplot of two categorical variables yields a bubble plot. Below is a scatterplot of two continuous variables.
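The default two-variable scatterplot, which produces the output below:

```r
Plot(Years, Salary)
```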
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Years, Salary, enhance=TRUE) # many options
#> Plot(Years, Salary, fill="skyblue") # interior fill color of points
#> Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Years, Salary, out_cut=.10) # label top 10% from center as outliers
#>
#>
#> >>> Pearson's product-moment correlation
#>
#> Years: Time of Company Employment
#> Salary: Annual Salary (USD)
#>
#> Number of paired values with neither missing, n = 36
#> Sample Correlation of Years and Salary: r = 0.852
#>
#> Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
#> 95% Confidence Interval for Correlation: 0.727 to 0.923
#>
Enhance the default scatterplot with the parameter enhance. The
visualization includes the mean of each variable indicated by the
respective line through the scatterplot, the 95% confidence ellipse,
labeled outliers, the least-squares regression line with its 95%
confidence interval, and the corresponding regression line with the
outliers removed.
Plot(Years, Salary, enhance=TRUE)
#> [Ellipse with Murdoch and Chow's function ellipse from their ellipse package]
#>
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Years, Salary, color="red") # exterior edge color of points
#> Plot(Years, Salary, fit="lm", fit_se=c(.90,.99)) # fit line, stnd errors
#> Plot(Years, Salary, MD_cut=6) # Mahalanobis distance from center > 6 is an outlier
#>
#> >>> Outlier analysis with Mahalanobis Distance
#>
#> MD ID
#> ----- -----
#> 8.14 18
#> 7.84 34
#>
#> 5.63 31
#> 5.58 19
#> 3.75 4
#> ... ...
#>
#>
#> >>> Pearson's product-moment correlation
#>
#> Years: Time of Company Employment
#> Salary: Annual Salary (USD)
#>
#> Number of paired values with neither missing, n = 36
#> Sample Correlation of Years and Salary: r = 0.852
#>
#> Hypothesis Test of 0 Correlation: t = 9.501, df = 34, p-value = 0.000
#> 95% Confidence Interval for Correlation: 0.727 to 0.923
#>
The default plot for a single continuous variable includes not only the scatterplot, but also the superimposed violin plot and box plot, with outliers identified. Call this plot the VBS plot.
Plot(Salary)
#> [Violin/Box/Scatterplot graphics from Deepayan Sarkar's lattice package]
#>
#> >>> Suggestions
#> Plot(Salary, out_cut=2, fences=TRUE, vbs_mean=TRUE) # Label two outliers ...
#> Plot(Salary, box_adj=TRUE) # Adjust boxplot whiskers for asymmetry
#> --- Salary ---
#> Present: 37
#> Missing: 0
#> Total : 37
#>
#> Mean : 73795.557
#> Stnd Dev : 21799.533
#> IQR : 31012.560
#> Skew : 0.190 [medcouple, -1 to 1]
#>
#> Minimum : 46124.970
#> Lower Whisker: 46124.970
#> 1st Quartile : 56772.950
#> Median : 69547.600
#> 3rd Quartile : 87785.510
#> Upper Whisker: 122563.380
#> Maximum : 134419.230
#>
#>
#> --- Outliers --- from the box plot: 1
#>
#> Small Large
#> ----- -----
#> 134419.23
#>
#> Number of duplicated values: 0
#>
#> Parameter values (can be manually set)
#> -------------------------------------------------------
#> size: 0.61 size of plotted points
#> out_size: 0.82 size of plotted outlier points
#> jitter_y: 0.45 random vertical movement of points
#> jitter_x: 0.00 random horizontal movement of points
#> bw: 9529.04 set bandwidth higher for smoother edges
Following is a scatterplot in the form of a bubble plot for two categorical variables.
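The bubble plot below follows from a call with the two categorical variables, consistent with the suggestion shown in the output:

```r
Plot(JobSat, Gender)
```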
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(x=JobSat, y=Gender, radius=0.15) # smaller bubbles
#>
#> Some Parameter values (can be manually set)
#> -------------------------------------------------------
#> radius: 0.22 size of largest bubble
#> power: 0.50 relative bubble sizes
The full output is extensive: a summary of the analysis, the estimated model, fit indices, ANOVA, the correlation matrix, collinearity analysis, best subset regression, residuals and influence statistics, and prediction intervals. The motivation is to provide virtually all of the information needed for a proper regression analysis.
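The output below follows from the call indicated in the first suggestion, here without the Rmd option:

```r
reg(Salary ~ Years + Pre)
```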
#> >>> Suggestion
#> # Create an R markdown file for interpretative output with Rmd = "file_name"
#> reg(Salary ~ Years + Pre, Rmd="eg")
#>
#>
#> BACKGROUND
#>
#> Data Frame: d
#>
#> Response Variable: Salary
#> Predictor Variable 1: Years
#> Predictor Variable 2: Pre
#>
#> Number of cases (rows) of data: 37
#> Number of cases retained for analysis: 36
#>
#>
#> BASIC ANALYSIS
#>
#> Estimate Std Err t-value p-value Lower 95% Upper 95%
#> (Intercept) 44140.971 13666.115 3.230 0.003 16337.052 71944.891
#> Years 3251.408 347.529 9.356 0.000 2544.355 3958.462
#> Pre -18.265 167.652 -0.109 0.914 -359.355 322.825
#>
#> Standard deviation of Salary: 21,822.372
#>
#> Standard deviation of residuals: 11,753.478 for df=33
#> 95% range of residuals: 47,825.260 = 2 * (2.035 * 11,753.478)
#>
#> R-squared: 0.726 Adjusted R-squared: 0.710 PRESS R-squared: 0.659
#>
#> Null hypothesis of all 0 population slope coefficients:
#> F-statistic: 43.827 df: 2 and 33 p-value: 0.000
#>
#> -- Analysis of Variance
#>
#> df Sum Sq Mean Sq F-value p-value
#> Years 1 12107157290.292 12107157290.292 87.641 0.000
#> Pre 1 1639658.444 1639658.444 0.012 0.914
#>
#> Model 2 12108796948.736 6054398474.368 43.827 0.000
#> Residuals 33 4558759843.773 138144237.690
#> Salary 35 16667556792.508 476215908.357
#>
#>
#> K-FOLD CROSS-VALIDATION
#>
#>
#> RELATIONS AMONG THE VARIABLES
#>
#> Salary Years Pre
#> Salary 1.00 0.85 0.03
#> Years 0.85 1.00 0.05
#> Pre 0.03 0.05 1.00
#>
#> Tolerance VIF
#> Years 0.998 1.002
#> Pre 0.998 1.002
#>
#> Years Pre R2adj X's
#> 1 0 0.718 1
#> 1 1 0.710 2
#> 0 1 -0.028 1
#>
#> [based on Thomas Lumley's leaps function from the leaps package]
#>
#>
#> RESIDUALS AND INFLUENCE
#>
#> -- Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
#> [sorted by Cook's Distance]
#> [n_res_rows = 20, out of 36 rows of data, or do n_res_rows="all"]
#> -----------------------------------------------------------------------------------------
#> Years Pre Salary fitted resid rstdnt dffits cooks
#> Correll, Trevon 21 97 134419.230 110648.843 23770.387 2.424 1.217 0.430
#> James, Leslie 18 70 122563.380 101387.773 21175.607 1.998 0.714 0.156
#> Capelle, Adam 24 83 108138.430 120658.778 -12520.348 -1.211 -0.634 0.132
#> Hoang, Binh 15 96 111074.860 91158.659 19916.201 1.860 0.649 0.131
#> Korhalkar, Jessica 2 74 72502.500 49292.181 23210.319 2.171 0.638 0.122
#> Billing, Susan 4 91 72675.260 55484.493 17190.767 1.561 0.472 0.071
#> Singh, Niral 2 59 61055.440 49566.155 11489.285 1.064 0.452 0.068
#> Skrotzki, Sara 18 63 91352.330 101515.627 -10163.297 -0.937 -0.397 0.053
#> Saechao, Suzanne 8 98 55545.250 68362.271 -12817.021 -1.157 -0.390 0.050
#> Kralik, Laura 10 74 92681.190 75303.447 17377.743 1.535 0.287 0.026
#> Anastasiou, Crystal 2 59 56508.320 49566.155 6942.165 0.636 0.270 0.025
#> Langston, Matthew 5 94 49188.960 58681.106 -9492.146 -0.844 -0.268 0.024
#> Afshari, Anbar 6 100 69441.930 61822.925 7619.005 0.689 0.264 0.024
#> Cassinelli, Anastis 10 80 57562.360 75193.857 -17631.497 -1.554 -0.265 0.022
#> Osterman, Pascal 5 69 49704.790 59137.730 -9432.940 -0.826 -0.216 0.016
#> Bellingar, Samantha 10 67 66337.830 75431.301 -9093.471 -0.793 -0.198 0.013
#> LaRoe, Maria 10 80 61961.290 75193.857 -13232.567 -1.148 -0.195 0.013
#> Ritchie, Darnell 7 82 53788.260 65403.102 -11614.842 -1.006 -0.190 0.012
#> Sheppard, Cory 14 66 95027.550 88455.199 6572.351 0.579 0.176 0.011
#> Downs, Deborah 7 90 57139.900 65256.982 -8117.082 -0.706 -0.174 0.010
#>
#>
#> PREDICTION ERROR
#>
#> -- Data, Predicted, Standard Error of Prediction, 95% Prediction Intervals
#> [sorted by lower bound of prediction interval]
#> [to see all intervals add n_pred_rows="all"]
#> ----------------------------------------------
#>
#> Years Pre Salary pred s_pred pi.lwr pi.upr width
#> Hamide, Bita 1 83 51036.850 45876.388 12290.483 20871.211 70881.564 50010.352
#> Singh, Niral 2 59 61055.440 49566.155 12619.291 23892.014 75240.296 51348.281
#> Anastasiou, Crystal 2 59 56508.320 49566.155 12619.291 23892.014 75240.296 51348.281
#> ...
#> Link, Thomas 10 83 66312.890 75139.062 11933.518 50860.137 99417.987 48557.849
#> LaRoe, Maria 10 80 61961.290 75193.857 11918.048 50946.405 99441.308 48494.903
#> Cassinelli, Anastis 10 80 57562.360 75193.857 11918.048 50946.405 99441.308 48494.903
#> ...
#> Correll, Trevon 21 97 134419.230 110648.843 12881.876 84440.470 136857.217 52416.747
#> Capelle, Adam 24 83 108138.430 120658.778 12955.608 94300.394 147017.161 52716.767
#>
#> ----------------------------------
#> Plot 1: Distribution of Residuals
#> Plot 2: Residuals vs Fitted Values
#> ----------------------------------
The time series plot, plotting the values of a variable across time,
is a special case of a scatterplot, potentially with points of size 0
and adjacent points connected by a line segment. Indicate a time
series by specifying the x-variable, the first variable listed, as a
variable of type Date. This conversion to Date data values occurs
automatically for dates specified in a digital format, such as
18/8/2024 or related formats, plus examples such as 2024 Q3 or
2024 Aug. Otherwise, explicitly use the R function as.Date() or pass
the date format directly with the time_format parameter. In this
StockPrice data file, the date conversion has already been done.
d <- Read("StockPrice")
#>
#> >>> Suggestions
#> Recommended binary format for data files: feather
#> Create with Write(d, "your_file", format="feather")
#> More details about your data, Enter: details() for d, or details(name)
#>
#> Data Types
#> ------------------------------------------------------------
#> character: Non-numeric data values
#> Date: Date with year, month and day
#> double: Numeric data values with decimal digits
#> ------------------------------------------------------------
#>
#> Variable Missing Unique
#> Name Type Values Values Values First and last values
#> ------------------------------------------------------------------------------------------
#> 1 Month Date 1419 0 473 1985-01-01 ... 2024-05-01
#> 2 Company character 1419 0 3 Apple Apple ... Intel Intel
#> 3 Price double 1419 0 1400 0.100055 0.085392 ... 30.346739 30.555891
#> 4 Volume double 1419 0 1419 6366416000 ... 229147100
#> ------------------------------------------------------------------------------------------
head(d)
#> Month Company Price Volume
#> 1 1985-01-01 Apple 0.100055 6366416000
#> 2 1985-02-01 Apple 0.085392 4733388800
#> 3 1985-03-01 Apple 0.076335 4615587200
#> 4 1985-04-01 Apple 0.073316 2868028800
#> 5 1985-05-01 Apple 0.059947 4639129600
#> 6 1985-06-01 Apple 0.062103 5811388800
We have the date as Month, as well as the variables Company and stock Price.
Plot(Month, Price, filter=(Company=="Apple"), area_fill="on")
#>
#> filter: (Company == "Apple")
#> -----
#> Rows of data before filtering: 1419
#> Rows of data after filtering: 473
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
#> Plot(Month, Price, time_unit="years") # aggregate time by yearly sum
#> Plot(Month, Price, time_unit="years", time_agg="mean") # aggregate by yearly mean
With the by parameter, plot all three companies on the same panel.
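A call consistent with this description:

```r
Plot(Month, Price, by=Company)
```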
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
#> Plot(Month, Price, time_unit="years") # aggregate time by yearly sum
#> Plot(Month, Price, time_unit="years", time_agg="mean") # aggregate by yearly mean
Here, aggregate the mean by time, from months to quarters.
Plot(Month, Price, time_unit="quarters", time_agg="mean")
#> >>> Warning
#> The Date variable is not sorted in Increasing Order.
#>
#> For a data frame named d, enter:
#> d <- sort_by(d, Month)
#> Maybe you have a by variable with repeating Date values?
#> Enter ?sort_by for more information and examples.
#> [with functions from Ryan, Ulrich, Bennett, and Joy's xts package]
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, time_ahead=4) # exponential smoothing forecast 4 time units
Plot() implements exponential smoothing forecasting with an
accompanying visualization. Parameters include time_ahead for the
number of time_units to forecast into the future, and time_format to
provide a specific format for the date variable if not detected
correctly by default. Control aspects of the exponential smoothing
estimation and prediction algorithms with the parameters es_level
(alpha), es_trend (beta), es_seasons (gamma), es_type for additive or
multiplicative seasonality, and es_PIlevel for the level of the
prediction intervals.
To forecast Apple's stock price, focus here on the last several years of the data, beginning with Row 400 through Row 473, the last row of data for Apple. In this example, forecast ahead 24 months.
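One way to express this subset and forecast; the base R row subsetting shown here is an assumption, as the original may subset the rows differently:

```r
# retain Rows 400 through 473, the Apple rows of interest
d <- d[400:473, ]
# forecast 24 months ahead with exponential smoothing
Plot(Month, Price, time_ahead=24)
```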
#> >>> Suggestions or enter: style(suggest=FALSE)
#> Plot(Month, Price, time_ahead=4, es_seasons=FALSE) # turn off exponential smoothing seasonal effect
#>
#>
#> Sum of squared fit errors: 7,753.55081
#> Mean squared fit error: 131.41612
#>
#> Coefficients for Linear Trend and Seasonality
#> b0: 180.34701 b1: 1.07749
#> s1: 3.32827 s2: 8.71712 s3: 3.53090 s4: -7.06310 s5: 2.66831 s6: 8.68311
#> s7: 2.96602 s8: -0.12209 s9: -5.21794 s10: -1.88430 s11: 0.44367 s12: 2.05299
#>
#> Smoothing Parameters
#> alpha: 0.79971 beta: 0.00077 gamma: 1.00000
#>
#> predicted upper95 lower95
#> Jun 2024 184.7528 206.6612 162.8443
#> Jul 2024 191.2191 219.2801 163.1582
#> Aug 2024 187.1104 220.2061 154.0147
#> Sep 2024 177.5939 215.0600 140.1277
#> Oct 2024 188.4028 229.7860 147.0195
#> Nov 2024 195.4951 240.4607 150.5294
#> Dec 2024 190.8555 239.1433 142.5676
#> Jan 2025 188.8448 240.2453 137.4444
#> Feb 2025 184.8265 239.1659 130.4871
#> Mar 2025 189.2376 246.3691 132.1061
#> Apr 2025 192.6431 252.4405 132.8457
#> May 2025 195.3299 257.6831 132.9767
#> Jun 2025 197.6827 263.8267 131.5386
#> Jul 2025 204.1490 272.6194 135.6786
#> Aug 2025 200.0403 270.7638 129.3167
#> Sep 2025 190.5237 263.4342 117.6133
#> Oct 2025 201.3327 276.3694 126.2959
#> Nov 2025 208.4249 285.5326 131.3173
#> Dec 2025 203.7853 282.9127 124.6580
#> Jan 2026 201.7747 282.8745 120.6750
#> Feb 2026 197.7564 280.7845 114.7282
#> Mar 2026 202.1675 287.0831 117.2518
#> Apr 2026 205.5730 292.3378 118.8081
#> May 2026 208.2598 296.8380 119.6816
Aggregate with pivot(). Any function that processes a single vector of
data, such as a column of data values for a variable in a data frame,
and outputs a single computed value, the statistic, can be passed to
pivot(). Functions can be user-defined or built-in.
Here, compute the mean and standard deviation of Price for each
company in the StockPrice data set, downloaded with lessR.
d <- Read("StockPrice", quiet=TRUE)
pivot(d, c(mean, sd), Price, by=Company)
#> Company Price_n Price_na Price_mean Price_sd
#> 1 Apple 473 0 23.157 46.248
#> 2 IBM 473 0 60.010 43.547
#> 3 Intel 473 0 16.725 14.689
Interpret this call to pivot() as: for each Company, compute the mean and the standard deviation of Price.
Select any two of the three possibilities for multiple parameter values: multiple compute functions, multiple variables over which to compute, and multiple categorical variables by which to define groups for aggregation.
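For example, one compute function applied to multiple variables, grouped by one categorical variable; this particular combination is illustrative, with the variable names taken from the StockPrice data:

```r
# mean of both Price and Volume, aggregated by Company
pivot(d, mean, c(Price, Volume), by=Company)
```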