| Topic | Title |
|---|---|
| ml-params | Spark ML - ML Params |
| ml-persistence | Spark ML - Model Persistence |
| ml-transform-methods | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml-tuning | Spark ML - Tuning |
| ml_aft_survival_regression | Spark ML - Survival Regression |
| ml_als | Spark ML - ALS |
| ml_als_tidiers | Tidying methods for Spark ML ALS |
| ml_approx_nearest_neighbors | Utility functions for LSH models |
| ml_approx_similarity_join | Utility functions for LSH models |
| ml_association_rules | Frequent Pattern Mining - FPGrowth |
| ml_binary_classification_eval | Spark ML - Evaluators |
| ml_binary_classification_evaluator | Spark ML - Evaluators |
| ml_bisecting_kmeans | Spark ML - Bisecting K-Means Clustering |
| ml_chisquare_test | Chi-square hypothesis testing for categorical data |
| ml_classification_eval | Spark ML - Evaluators |
| ml_clustering_evaluator | Spark ML - Clustering Evaluator |
| ml_compute_cost | Spark ML - K-Means Clustering |
| ml_compute_silhouette_measure | Spark ML - K-Means Clustering |
| ml_corr | Compute correlation matrix |
| ml_cross_validator | Spark ML - Tuning |
| ml_decision_tree | Spark ML - Decision Trees |
| ml_decision_tree_classifier | Spark ML - Decision Trees |
| ml_decision_tree_regressor | Spark ML - Decision Trees |
| ml_default_stop_words | Default stop words |
| ml_describe_topics | Spark ML - Latent Dirichlet Allocation |
| ml_evaluate | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_evaluator | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_generalized_linear_regression_model | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_linear_regression_model | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_logistic_regression_model | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_model_classification | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_model_clustering | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_model_generalized_linear_regression | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_model_linear_regression | Evaluate the Model on a Validation Set |
| ml_evaluate.ml_model_logistic_regression | Evaluate the Model on a Validation Set |
| ml_evaluator | Spark ML - Evaluators |
| ml_feature_importances | Spark ML - Feature Importance for Tree Models |
| ml_find_synonyms | Feature Transformation - Word2Vec (Estimator) |
| ml_fit | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml_fit.default | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml_fit_and_transform | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml_fpgrowth | Frequent Pattern Mining - FPGrowth |
| ml_freq_itemsets | Frequent Pattern Mining - FPGrowth |
| ml_freq_seq_patterns | Frequent Pattern Mining - PrefixSpan |
| ml_gaussian_mixture | Spark ML - Gaussian Mixture Clustering |
| ml_gbt_classifier | Spark ML - Gradient Boosted Trees |
| ml_gbt_regressor | Spark ML - Gradient Boosted Trees |
| ml_generalized_linear_regression | Spark ML - Generalized Linear Regression |
| ml_glm_tidiers | Tidying methods for Spark ML linear models |
| ml_gradient_boosted_trees | Spark ML - Gradient Boosted Trees |
| ml_isotonic_regression | Spark ML - Isotonic Regression |
| ml_isotonic_regression_tidiers | Tidying methods for Spark ML Isotonic Regression |
| ml_is_set | Spark ML - ML Params |
| ml_kmeans | Spark ML - K-Means Clustering |
| ml_kmeans_cluster_eval | Evaluate a K-means clustering |
| ml_labels | Feature Transformation - StringIndexer (Estimator) |
| ml_lda | Spark ML - Latent Dirichlet Allocation |
| ml_lda_tidiers | Tidying methods for Spark ML LDA models |
| ml_linear_regression | Spark ML - Linear Regression |
| ml_linear_svc | Spark ML - LinearSVC |
| ml_linear_svc_tidiers | Tidying methods for Spark ML LinearSVC |
| ml_load | Spark ML - Model Persistence |
| ml_logistic_regression | Spark ML - Logistic Regression |
| ml_logistic_regression_tidiers | Tidying methods for Spark ML Logistic Regression |
| ml_log_likelihood | Spark ML - Latent Dirichlet Allocation |
| ml_log_perplexity | Spark ML - Latent Dirichlet Allocation |
| ml_metrics_binary | Extracts metrics from a fitted table |
| ml_metrics_multiclass | Extracts metrics from a fitted table |
| ml_metrics_regression | Extracts metrics from a fitted table |
| ml_model_data | Extracts data associated with a Spark ML model |
| ml_multiclass_classification_evaluator | Spark ML - Evaluators |
| ml_multilayer_perceptron | Spark ML - Multilayer Perceptron |
| ml_multilayer_perceptron_classifier | Spark ML - Multilayer Perceptron |
| ml_multilayer_perceptron_tidiers | Tidying methods for Spark ML MLP |
| ml_naive_bayes | Spark ML - Naive Bayes |
| ml_naive_bayes_tidiers | Tidying methods for Spark ML Naive Bayes |
| ml_one_vs_rest | Spark ML - OneVsRest |
| ml_param | Spark ML - ML Params |
| ml_params | Spark ML - ML Params |
| ml_param_map | Spark ML - ML Params |
| ml_pca | Feature Transformation - PCA (Estimator) |
| ml_pca_tidiers | Tidying methods for Spark ML Principal Component Analysis |
| ml_pipeline | Spark ML - Pipelines |
| ml_power_iteration | Spark ML - Power Iteration Clustering |
| ml_predict | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml_predict.ml_model_classification | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml_prefixspan | Frequent Pattern Mining - PrefixSpan |
| ml_random_forest | Spark ML - Random Forest |
| ml_random_forest_classifier | Spark ML - Random Forest |
| ml_random_forest_regressor | Spark ML - Random Forest |
| ml_recommend | Spark ML - ALS |
| ml_regression_evaluator | Spark ML - Evaluators |
| ml_save | Spark ML - Model Persistence |
| ml_save.ml_model | Spark ML - Model Persistence |
| ml_stage | Spark ML - Pipeline stage extraction |
| ml_stages | Spark ML - Pipeline stage extraction |
| ml_sub_models | Spark ML - Tuning |
| ml_summary | Spark ML - Extraction of summary metrics |
| ml_survival_regression | Spark ML - Survival Regression |
| ml_survival_regression_tidiers | Tidying methods for Spark ML Survival Regression |
| ml_topics_matrix | Spark ML - Latent Dirichlet Allocation |
| ml_train_validation_split | Spark ML - Tuning |
| ml_transform | Spark ML - Transform, fit, and predict methods (ml_ interface) |
| ml_tree_feature_importance | Spark ML - Feature Importance for Tree Models |
| ml_tree_tidiers | Tidying methods for Spark ML tree models |
| ml_uid | Spark ML - UID |
| ml_unsupervised_tidiers | Tidying methods for Spark ML unsupervised models |
| ml_validation_metrics | Spark ML - Tuning |
| ml_vocabulary | Feature Transformation - CountVectorizer (Estimator) |
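
The `ml_*` topics above share a common workflow: copy data to Spark, fit a model, then predict or evaluate. A minimal sketch, assuming a local Spark installation (see `spark_install`) and using the built-in `mtcars` data set; column choices and `k` are illustrative:

```r
library(sparklyr)

# Assumes a local Spark installation is available
sc <- spark_connect(master = "local")

# Copy an R data frame into Spark
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# Fit a K-means model with 3 centers on two columns
kmeans_model <- ml_kmeans(mtcars_tbl, ~ mpg + wt, k = 3)

# Append predicted cluster assignments to the data
predicted <- ml_predict(kmeans_model, mtcars_tbl)

spark_disconnect(sc)
```

The same fit/predict pattern applies to the supervised estimators (e.g. `ml_logistic_regression`, `ml_random_forest_classifier`), with `ml_evaluate` available for validation metrics.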
| mutate | Mutate |
| sdf-saveload | Save / Load a Spark DataFrame |
| sdf-transform-methods | Spark ML - Transform, fit, and predict methods (sdf_ interface) |
| sdf_along | Create DataFrame along an Object |
| sdf_bind | Bind multiple Spark DataFrames by row and column |
| sdf_bind_cols | Bind multiple Spark DataFrames by row and column |
| sdf_bind_rows | Bind multiple Spark DataFrames by row and column |
| sdf_broadcast | Broadcast hint |
| sdf_checkpoint | Checkpoint a Spark DataFrame |
| sdf_coalesce | Coalesces a Spark DataFrame |
| sdf_collect | Collect a Spark DataFrame into R |
| sdf_copy_to | Copy an Object into Spark |
| sdf_crosstab | Cross Tabulation |
| sdf_debug_string | Debug Info for Spark DataFrame |
| sdf_describe | Compute summary statistics for columns of a data frame |
| sdf_dim | Support for Dimension Operations |
| sdf_distinct | Invoke distinct on a Spark DataFrame |
| sdf_drop_duplicates | Remove duplicates from a Spark DataFrame |
| sdf_expand_grid | Create a Spark DataFrame containing all combinations of inputs |
| sdf_fit | Spark ML - Transform, fit, and predict methods (sdf_ interface) |
| sdf_fit_and_transform | Spark ML - Transform, fit, and predict methods (sdf_ interface) |
| sdf_from_avro | Convert column(s) from Avro format |
| sdf_import | Copy an Object into Spark |
| sdf_is_streaming | Spark DataFrame is Streaming |
| sdf_last_index | Returns the last index of a Spark DataFrame |
| sdf_len | Create DataFrame for Length |
| sdf_load_parquet | Save / Load a Spark DataFrame |
| sdf_load_table | Save / Load a Spark DataFrame |
| sdf_ncol | Support for Dimension Operations |
| sdf_nrow | Support for Dimension Operations |
| sdf_num_partitions | Gets number of partitions of a Spark DataFrame |
| sdf_partition | Partition a Spark DataFrame |
| sdf_partition_sizes | Compute the number of records within each partition of a Spark DataFrame |
| sdf_persist | Persist a Spark DataFrame |
| sdf_pivot | Pivot a Spark DataFrame |
| sdf_predict | Spark ML - Transform, fit, and predict methods (sdf_ interface) |
| sdf_project | Project features onto principal components |
| sdf_quantile | Compute (Approximate) Quantiles with a Spark DataFrame |
| sdf_random_split | Partition a Spark DataFrame |
| sdf_rbeta | Generate random samples from a Beta distribution |
| sdf_rbinom | Generate random samples from a binomial distribution |
| sdf_rcauchy | Generate random samples from a Cauchy distribution |
| sdf_rchisq | Generate random samples from a chi-squared distribution |
| sdf_read_column | Read a Column from a Spark DataFrame |
| sdf_register | Register a Spark DataFrame |
| sdf_repartition | Repartition a Spark DataFrame |
| sdf_residuals | Model Residuals |
| sdf_residuals.ml_model_generalized_linear_regression | Model Residuals |
| sdf_residuals.ml_model_linear_regression | Model Residuals |
| sdf_rexp | Generate random samples from an exponential distribution |
| sdf_rgamma | Generate random samples from a Gamma distribution |
| sdf_rgeom | Generate random samples from a geometric distribution |
| sdf_rhyper | Generate random samples from a hypergeometric distribution |
| sdf_rlnorm | Generate random samples from a log normal distribution |
| sdf_rnorm | Generate random samples from the standard normal distribution |
| sdf_rpois | Generate random samples from a Poisson distribution |
| sdf_rt | Generate random samples from a t-distribution |
| sdf_runif | Generate random samples from the uniform distribution U(0, 1) |
| sdf_rweibull | Generate random samples from a Weibull distribution |
| sdf_sample | Randomly Sample Rows from a Spark DataFrame |
| sdf_save_parquet | Save / Load a Spark DataFrame |
| sdf_save_table | Save / Load a Spark DataFrame |
| sdf_schema | Read the Schema of a Spark DataFrame |
| sdf_separate_column | Separate a Vector Column into Scalar Columns |
| sdf_seq | Create DataFrame for Range |
| sdf_sort | Sort a Spark DataFrame |
| sdf_sql | Spark DataFrame from SQL |
| sdf_to_avro | Convert column(s) to Avro format |
| sdf_transform | Spark ML - Transform, fit, and predict methods (sdf_ interface) |
| sdf_unnest_longer | Unnest longer |
| sdf_unnest_wider | Unnest wider |
| sdf_weighted_sample | Perform Weighted Random Sampling on a Spark DataFrame |
| sdf_with_sequential_id | Add a Sequential ID Column to a Spark DataFrame |
| sdf_with_unique_id | Add a Unique ID Column to a Spark DataFrame |
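
The `sdf_*` topics above operate directly on Spark DataFrames. A minimal sketch of a few of them, assuming a local Spark installation; the split weights and seed are illustrative:

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# A 100-row Spark DataFrame with a single id column
ids <- sdf_len(sc, 100)

# Split into named training/test partitions by weight
splits <- sdf_random_split(ids, training = 0.8, test = 0.2, seed = 42)

sdf_nrow(splits$training)  # row count of the training partition
sdf_schema(ids)            # column names and types

spark_disconnect(sc)
```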
| select | Select |
| separate | Separate |
| spark-api | Access the Spark API |
| spark-connections | Manage Spark Connections |
| sparklyr_get_backend_port | Return the port number of a 'sparklyr' backend |
| spark_adaptive_query_execution | Retrieves or sets status of Spark AQE |
| spark_advisory_shuffle_partition_size | Retrieves or sets advisory size of the shuffle partition |
| spark_apply | Apply an R Function in Spark |
| spark_apply_bundle | Create Bundle for Spark Apply |
| spark_apply_log | Log Writer for Spark Apply |
| spark_auto_broadcast_join_threshold | Retrieves or sets the auto broadcast join threshold |
| spark_available_versions | Download and install various versions of Spark |
| spark_coalesce_initial_num_partitions | Retrieves or sets initial number of shuffle partitions before coalescing |
| spark_coalesce_min_num_partitions | Retrieves or sets the minimum number of shuffle partitions after coalescing |
| spark_coalesce_shuffle_partitions | Retrieves or sets whether coalescing contiguous shuffle partitions is enabled |
| spark_compilation_spec | Define a Spark Compilation Specification |
| spark_config | Read Spark Configuration |
| spark_config_kubernetes | Kubernetes Configuration |
| spark_config_settings | Retrieve Available Settings |
| spark_connect | Manage Spark Connections |
| spark_connection | Retrieve the Spark Connection Associated with an R Object |
| spark_connection-class | spark_connection class |
| spark_connection_find | Find Spark Connection |
| spark_connection_is_open | Manage Spark Connections |
| spark_connect_method | Function that negotiates the connection with the Spark back-end |
| spark_context | Access the Spark API |
| spark_context_config | Runtime configuration interface for the Spark Context |
| spark_dataframe | Retrieve a Spark DataFrame |
| spark_default_compilation_spec | Default Compilation Specification for Spark Extensions |
| spark_dependency | Define a Spark dependency |
| spark_dependency_fallback | Fallback to Spark Dependency |
| spark_disconnect | Manage Spark Connections |
| spark_disconnect_all | Manage Spark Connections |
| spark_extension | Create Spark Extension |
| spark_get_checkpoint_dir | Set/Get Spark checkpoint directory |
| spark_home_set | Set the SPARK_HOME environment variable |
| spark_ide_columns | Set of functions to provide integration with the RStudio IDE |
| spark_ide_connection_actions | Set of functions to provide integration with the RStudio IDE |
| spark_ide_connection_closed | Set of functions to provide integration with the RStudio IDE |
| spark_ide_connection_open | Set of functions to provide integration with the RStudio IDE |
| spark_ide_connection_updated | Set of functions to provide integration with the RStudio IDE |
| spark_ide_objects | Set of functions to provide integration with the RStudio IDE |
| spark_ide_preview | Set of functions to provide integration with the RStudio IDE |
| spark_insert_table | Inserts a Spark DataFrame into a Spark table |
| spark_install | Download and install various versions of Spark |
| spark_installed_versions | Download and install various versions of Spark |
| spark_install_dir | Download and install various versions of Spark |
| spark_install_tar | Download and install various versions of Spark |
| spark_integ_test_skip | Lets the package know whether it should test a particular functionality |
| spark_jobj | Retrieve a Spark JVM Object Reference |
| spark_jobj-class | spark_jobj class |
| spark_last_error | Surfaces the last error from Spark captured by internal 'spark_error' function |
| spark_load_table | Reads from a Spark table into a Spark DataFrame |
| spark_log | View Entries in the Spark Log |
| spark_read | Read file(s) into a Spark DataFrame using a custom reader |
| spark_read_avro | Read Apache Avro data into a Spark DataFrame |
| spark_read_binary | Read binary data into a Spark DataFrame |
| spark_read_csv | Read a CSV file into a Spark DataFrame |
| spark_read_delta | Read from Delta Lake into a Spark DataFrame |
| spark_read_image | Read image data into a Spark DataFrame |
| spark_read_jdbc | Read from a JDBC connection into a Spark DataFrame |
| spark_read_json | Read a JSON file into a Spark DataFrame |
| spark_read_libsvm | Read a libsvm file into a Spark DataFrame |
| spark_read_orc | Read an ORC file into a Spark DataFrame |
| spark_read_parquet | Read a Parquet file into a Spark DataFrame |
| spark_read_source | Read from a generic source into a Spark DataFrame |
| spark_read_table | Reads from a Spark table into a Spark DataFrame |
| spark_read_text | Read a Text file into a Spark DataFrame |
| spark_save_table | Saves a Spark DataFrame as a Spark table |
| spark_session | Access the Spark API |
| spark_session_config | Runtime configuration interface for the Spark Session |
| spark_set_checkpoint_dir | Set/Get Spark checkpoint directory |
| spark_statistical_routines | Generate random samples from some distribution |
| spark_submit | Manage Spark Connections |
| spark_table_name | Generate a Table Name from Expression |
| spark_uninstall | Download and install various versions of Spark |
| spark_version | Get the Spark Version Associated with a Spark Connection |
| spark_version_from_home | Get the Spark Version Associated with a Spark Installation |
| spark_web | Open the Spark web interface |
| spark_write | Write Spark DataFrame to file using a custom writer |
| spark_write_avro | Serialize a Spark DataFrame into Apache Avro format |
| spark_write_csv | Write a Spark DataFrame to a CSV |
| spark_write_delta | Writes a Spark DataFrame into Delta Lake |
| spark_write_jdbc | Writes a Spark DataFrame into a JDBC table |
| spark_write_json | Write a Spark DataFrame to a JSON file |
| spark_write_orc | Write a Spark DataFrame to an ORC file |
| spark_write_parquet | Write a Spark DataFrame to a Parquet file |
| spark_write_rds | Write Spark DataFrame to RDS files |
| spark_write_source | Writes a Spark DataFrame into a generic source |
| spark_write_table | Writes a Spark DataFrame into a Spark table |
| spark_write_text | Write a Spark DataFrame to a Text file |
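
The `spark_read_*` and `spark_write_*` topics are format-specific pairs. A minimal round-trip sketch for Parquet, assuming a local Spark installation; the temp-directory path and table name are illustrative:

```r
library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# Illustrative output path
path <- file.path(tempdir(), "mtcars_parquet")

# Write to Parquet, then read it back under a new table name
spark_write_parquet(mtcars_tbl, path = path)
mtcars2_tbl <- spark_read_parquet(sc, name = "mtcars2", path = path)

spark_disconnect(sc)
```

The same pattern applies to CSV, JSON, ORC, Avro, and Delta via the corresponding read/write pairs.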
| src_databases | Show database list |
| stream_find | Find Stream |
| stream_generate_test | Generate Test Stream |
| stream_id | Spark Stream's Identifier |
| stream_lag | Apply lag function to columns of a Spark Streaming DataFrame |
| stream_name | Spark Stream's Name |
| stream_read_cloudfiles | Read files created by the stream |
| stream_read_csv | Read files created by the stream |
| stream_read_delta | Read files created by the stream |
| stream_read_json | Read files created by the stream |
| stream_read_kafka | Read files created by the stream |
| stream_read_orc | Read files created by the stream |
| stream_read_parquet | Read files created by the stream |
| stream_read_socket | Read files created by the stream |
| stream_read_table | Read files created by the stream |
| stream_read_text | Read files created by the stream |
| stream_render | Render Stream |
| stream_stats | Stream Statistics |
| stream_stop | Stops a Spark Stream |
| stream_trigger_continuous | Spark Stream Continuous Trigger |
| stream_trigger_interval | Spark Stream Interval Trigger |
| stream_view | View Stream |
| stream_watermark | Watermark Stream |
| stream_write_console | Write files to the stream |
| stream_write_csv | Write files to the stream |
| stream_write_delta | Write files to the stream |
| stream_write_json | Write files to the stream |
| stream_write_kafka | Write files to the stream |
| stream_write_memory | Write Memory Stream |
| stream_write_orc | Write files to the stream |
| stream_write_parquet | Write files to the stream |
| stream_write_table | Write Stream to Table |
| stream_write_text | Write files to the stream |
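
The `stream_*` topics compose into a read-transform-write pipeline over Structured Streaming. A minimal sketch, assuming a local Spark installation; the input/output directories are illustrative, and `stream_generate_test` is used only to populate the source folder:

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Illustrative input/output directories
in_dir  <- file.path(tempdir(), "stream_in")
out_dir <- file.path(tempdir(), "stream_out")

# Populate in_dir with test data for the stream to pick up
stream_generate_test(path = in_dir)

# Read CSV files as a stream and write them out as Parquet
stream <- stream_read_csv(sc, in_dir) %>%
  stream_write_parquet(out_dir)

stream_stop(stream)
spark_disconnect(sc)
```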