API Reference
Main Functions
Breakers.get_bins — Function
get_bins(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}
Calculate and apply data breaks using multiple classification methods, returning binned data. This function is designed to handle the case where get_breaks returns actual breaks instead of already-binned data.
Arguments
- x: Vector of numeric values (will skip missing values)
- n: Number of classes (resulting in n+1 break points)
Returns
Dict{String, Vector{String}}: A dictionary containing categorized data using fisher, kmeans, quantile, and equal methods
Example
values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
binned_data = get_bins(values, 5)
# Access specific binned data:
fisher_bins = binned_data["fisher"]
kmeans_bins = binned_data["kmeans"]
get_bins(x::SubArray{T, 1}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}
Handle SubArray inputs by collecting them first, then forwarding to the Vector version.
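The two behaviours described for get_bins (skipping missing values and forwarding SubArray inputs to the Vector method) can be illustrated with a small stand-alone sketch. The function summarize_sketch is a hypothetical stand-in, not part of Breakers; it only shows the dispatch pattern, under the assumption that "skip" means missing entries are dropped before any statistics are computed.

```julia
# Hypothetical stand-in for a binning function; not part of Breakers.
function summarize_sketch(x::Vector{T}) where T<:Union{Real, Missing}
    present = collect(skipmissing(x))      # drop missing entries before computing anything
    return (count = length(present), lo = minimum(present), hi = maximum(present))
end

# SubArray method: materialize the view, then forward to the Vector method.
summarize_sketch(x::SubArray{T, 1}) where T<:Union{Real, Missing} =
    summarize_sketch(collect(x))

full = [4, missing, 9, 1, 7]
window = view(full, 1:4)                   # a SubArray over the first four elements
result = summarize_sketch(window)
```

Collecting the view costs one copy but lets a single Vector method carry all of the logic.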
Breakers.get_bin_indices — Function
get_bin_indices(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Int}}
Calculate and apply data breaks using multiple classification methods, returning integer bin indices (1 to n) for each method.
Arguments
- x: Vector of numeric values (will skip missing values)
- n: Number of classes (resulting in n+1 break points)
Returns
Dict{String, Vector{Int}}: A dictionary containing bin indices using fisher, kmeans, quantile, and equal methods
Example
values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
binned_indices = get_bin_indices(values, 5)
# Access specific bin indices:
fisher_indices = binned_indices["fisher"]
equal_indices = binned_indices["equal"]
get_bin_indices(x::SubArray{T, 1}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Int}}
Handle SubArray inputs by collecting them first, then forwarding to the Vector version.
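To make the relationship between n+1 break points and bin indices 1 to n concrete, here is a minimal sketch. The helper bin_index_sketch is hypothetical and not part of Breakers, and the interval convention (each bin closed on the left, with the maximum value placed in the last bin) is an assumption; the package may assign boundary values differently.

```julia
# Hypothetical helper, not part of Breakers: map each value to a bin index in 1:n.
function bin_index_sketch(values::AbstractVector{<:Real}, breaks::AbstractVector{<:Real})
    n = length(breaks) - 1                 # n classes need n+1 break points
    # searchsortedlast returns the last position whose break is <= v;
    # clamp puts the maximum value into the final bin instead of bin n+1.
    return [clamp(searchsortedlast(breaks, v), 1, n) for v in values]
end

data = [1, 5, 10, 15, 20, 25, 30]
breaks = [1.0, 10.0, 20.0, 30.0]
indices = bin_index_sketch(data, breaks)
```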
Breakers.get_bins_fixed — Function
get_bins_fixed(x::Vector{T}, break_points::Vector{<:Real}) where T<:Union{Real, Missing} -> Vector{String}
Get bin labels using user-specified break points.
Arguments
- x: Vector of numeric values (will skip missing values)
- break_points: Vector of break point values to use
Returns
Vector{String}: Vector of bin labels for each value in x
Example
data = [1, 5, 10, 15, 20, 25, 30]
labels = get_bins_fixed(data, [10, 20])
# Returns bin labels based on breaks [1.0, 10.0, 20.0, 30.0]
Breakers.get_bin_indices_fixed — Function
get_bin_indices_fixed(x::Vector{T}, break_points::Vector{<:Real}) where T<:Union{Real, Missing} -> Vector{Int}
Get bin indices using user-specified break points.
Arguments
- x: Vector of numeric values (will skip missing values)
- break_points: Vector of break point values to use
Returns
Vector{Int}: Vector of bin indices for each value in x
Example
data = [1, 5, 10, 15, 20, 25, 30]
indices = get_bin_indices_fixed(data, [10, 20])
# Returns bin indices based on breaks [1.0, 10.0, 20.0, 30.0]
Breakers.get_breaks — Function
get_breaks(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}
Calculate breaks for binning data using multiple classification methods and apply them to the data. This is a wrapper around get_bins for backward compatibility.
Arguments
- x: Vector of numeric values (will skip missing values)
- n: Number of classes (resulting in n+1 break points)
Returns
Dict{String, Vector{String}}: A dictionary containing categorized data using fisher, kmeans, quantile, and equal methods
Example
values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
categorized_data = get_breaks(values, 5)
# Access specific categorizations:
fisher_categories = categorized_data["fisher"]
kmeans_categories = categorized_data["kmeans"]
Breakers.get_breaks_raw — Function
get_breaks_raw(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Float64}}
Calculate breaks for binning data using multiple classification methods, returning the raw break points.
Arguments
- x: Vector of numeric values (will skip missing values)
- n: Number of classes (resulting in n+1 break points)
Returns
Dict{String, Vector{Float64}}: A dictionary containing break points for fisher, kmeans, quantile, and equal methods
Example
values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = get_breaks_raw(values, 5)
# Access specific break points:
fisher_breaks = breaks["fisher"]
kmeans_breaks = breaks["kmeans"]
get_breaks_raw(x::Vector{T}, break_points::Vector{<:Real}; method="fixed") where T<:Union{Real, Missing} -> Dict{String, Vector{Float64}}
Calculate breaks using user-specified break points.
Arguments
- x: Vector of numeric values (will skip missing values)
- break_points: Vector of break point values to use
- method: Method name used as the key in the result dictionary (default: "fixed")
Returns
Dict{String, Vector{Float64}}: A dictionary containing the specified break points
Example
values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = get_breaks_raw(values, [10, 30, 70])
# Access break points:
fixed_breaks = breaks["fixed"]
Breakers.cut_data — Function
cut_data(x::Vector{<:Union{Missing, Real}}, breaks::AbstractVector{<:Real})
Bin data values into categories defined by breaks.
Arguments
- x: Vector of values (can include missing values)
- breaks: Vector of break points (sorted)
Returns
Vector{String}: Categories for each value
cut_data(x::SubArray{T, 1}, breaks::AbstractVector{<:Real}) where T<:Union{Missing, Real}
Handle SubArray inputs by collecting them first, then forwarding to the Vector version.
Arguments
- x: SubArray of values (can include missing values)
- breaks: Vector of break points (sorted)
Returns
Vector{String}: Categories for each value
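As a rough illustration of turning sorted breaks into per-value categories, the sketch below builds one label per interval and looks each value up. Everything here is an assumption rather than the Breakers implementation: label_sketch is a hypothetical name, the "lower - upper" label format is invented for the example, bins are treated as closed on the left with the maximum in the last bin, and missing inputs are simply passed through as missing.

```julia
# Hypothetical sketch, not the Breakers implementation: label format, interval
# convention, and the treatment of missing values are all assumptions.
function label_sketch(x::AbstractVector{<:Union{Missing, Real}},
                      breaks::AbstractVector{<:Real})
    n = length(breaks) - 1
    labels = ["$(breaks[i]) - $(breaks[i + 1])" for i in 1:n]   # one label per interval
    return map(x) do v
        ismissing(v) && return missing                           # keep missing as missing
        labels[clamp(searchsortedlast(breaks, v), 1, n)]
    end
end

categories = label_sketch([5, missing, 10, 30], [1.0, 10.0, 20.0, 30.0])
```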
Binning Methods
Breakers.fisher_breaks — Function
fisher_breaks(x::Vector{<:Real}, k::Integer) -> Vector{Float64}
Calculate Fisher's natural breaks for a vector of values using exact optimization.
Arguments
- x::Vector{<:Real}: Vector of observations to be clustered.
- k::Integer: Number of classes (will result in k+1 break points).
Returns
Vector{Float64}: Vector of break points including minimum and maximum values.
Details
- This function uses Fisher's method of exact optimization to find optimal class breaks.
- Fisher's method maximizes the between-class sum of squares, which is equivalent to minimizing the within-class sum of squares.
- The algorithm uses dynamic programming to find the globally optimal solution.
- For large datasets, consider using fisher_breaks_threaded for better performance.
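The dynamic-programming idea in the details above can be sketched in a few lines. This is a simplified O(k·n²) illustration, not the Breakers code: fisher_dp_sketch is a hypothetical name, and the convention of reporting the lower bound of each class followed by the maximum is an assumption, so the package may place its break values differently.

```julia
# Hypothetical sketch of optimal 1-D partitioning by dynamic programming.
function fisher_dp_sketch(x::AbstractVector{<:Real}, k::Integer)
    v = sort(Float64.(x))
    n = length(v)
    1 <= k <= n || throw(ArgumentError("k must be between 1 and length(x)"))
    # Prefix sums give the within-class sum of squared deviations of v[i:j] in O(1).
    s1 = cumsum(v)
    s2 = cumsum(v .^ 2)
    function ssd(i, j)
        total = s1[j] - (i > 1 ? s1[i - 1] : 0.0)
        squares = s2[j] - (i > 1 ? s2[i - 1] : 0.0)
        return squares - total^2 / (j - i + 1)
    end
    cost = fill(Inf, k, n)      # cost[c, j]: best total SSD for v[1:j] split into c classes
    start = ones(Int, k, n)     # start[c, j]: first index of the last class in that split
    for j in 1:n
        cost[1, j] = ssd(1, j)
    end
    for c in 2:k, j in c:n, i in c:j
        candidate = cost[c - 1, i - 1] + ssd(i, j)
        if candidate < cost[c, j]
            cost[c, j] = candidate
            start[c, j] = i
        end
    end
    # Walk back through the stored split points to recover the class boundaries.
    lower = Vector{Float64}(undef, k)
    j = n
    for c in k:-1:1
        i = start[c, j]
        lower[c] = v[i]
        j = i - 1
    end
    return vcat(lower, v[end])  # assumed convention: class lower bounds, then the maximum
end

sketch_breaks = fisher_dp_sketch([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
```

Because every split of every prefix is scored, the result is globally optimal for the within-class sum of squares, at the price of quadratic work in the number of observations.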
Examples
# Basic usage
x = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 28.0, 30.0, 35.0, 40.0, 45.0]
k = 3
breaks = fisher_breaks(x, k)
# Output: [10.0, 20.0, 30.0, 45.0] (example)
# For dataset-specific optimization, you can override the result:
# data = load_us_counties_population() # hypothetical
# if is_us_counties_dataset(data, k)
#     breaks = fixed_breaks(data, [73660.0, 208154.0, 467948.0, 776067.0, 1138728.5, 5230000.0])
# else
#     breaks = fisher_breaks(data, k)
# end
Breakers.fisher_breaks_threaded — Function
fisher_breaks_threaded(x::Vector{<:Real}, k::Integer) -> Vector{Float64}
Calculate Fisher's natural breaks for a vector of values using multi-threading.
Arguments
- x::Vector{<:Real}: Vector of observations to be clustered.
- k::Integer: Number of classes (will result in k+1 break points).
Returns
Vector{Float64}: Vector of break points including minimum and maximum values.
Details
- This function is a threaded version of Fisher's method of exact optimization.
- For large datasets, this implementation can provide performance improvements on multi-core systems by parallelizing parts of the algorithm.
- Uses Threads.@threads to parallelize suitable parts of the computation.
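The reference does not say which parts of the algorithm are parallelized. As one plausible illustration of the Threads.@threads pattern, the sketch below fills a vector of independent within-class costs in parallel. The names ssd and prefix_costs_threaded are hypothetical and not part of Breakers.

```julia
using Base.Threads

# Within-class sum of squared deviations of a set of values.
ssd(v) = sum(abs2, v .- sum(v) / length(v))

# Hypothetical illustration: each cost depends only on the input, so iterations
# can run on different threads without synchronization.
function prefix_costs_threaded(v::Vector{Float64})
    costs = Vector{Float64}(undef, length(v))
    @threads for j in eachindex(v)
        costs[j] = ssd(view(v, 1:j))     # each iteration writes only its own slot
    end
    return costs
end

data = sort(rand(200))
costs = prefix_costs_threaded(data)
```

The loop only runs in parallel when Julia is started with more than one thread (for example julia --threads=auto); Threads.nthreads() reports the current count.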
Examples
using Base.Threads # Threads lives in Base; start Julia with e.g. `julia --threads=auto` to enable threading
x = rand(10000)
k = 5
breaks = fisher_breaks_threaded(x, k)
Breakers.fisher_clustering — Function
fisher_clustering(x, k)
Clusters a sequence of values into subsequences using Fisher's method of exact optimization, which maximizes the between-cluster sum of squares.
Arguments
- x::Vector{<:Real}: Vector of observations to be clustered.
- k::Integer: Number of clusters requested.
Returns
A tuple containing:
- cluster_info: Array of cluster information (min, max, mean, std) with dimensions (k, 4)
- work: Matrix of within-cluster sums of squares
- iwork: Matrix of optimal splitting points
Breakers.kmeans_breaks — Function
kmeans_breaks(x::Vector{<:Real}, k::Int; rtimes::Int=1) -> Vector{Float64}
Calculate breaks using k-means clustering, following R's classInt implementation.
Arguments
- x: Vector of numeric values
- k: Number of classes (resulting in k+1 break points)
- rtimes: Number of random starts (default: 1 for performance; the default was 3 in previous versions)
Returns
Vector{Float64}: Vector of break points (including min and max values)
Details
- Uses k-means clustering to find natural break points in data
- Multiple random starts improve stability but increase computation time
- For performance-critical applications, use rtimes=1 (default)
- For stability-critical applications, use rtimes=3 or higher
Performance Notes
- Default changed: rtimes=1 provides ~3x better performance vs the previous rtimes=3
- This brings Julia k-means performance much closer to R's classInt
- The Clustering.jl backend is well-optimized and reliable
Examples
# Basic usage (fast, single random start)
data = [1, 5, 10, 15, 20, 25, 30, 35, 40]
breaks = kmeans_breaks(data, 3)
# More stable results (slower, multiple random starts)
breaks = kmeans_breaks(data, 3; rtimes=3)
# Even more stable results (slowest of the three)
breaks = kmeans_breaks(data, 3; rtimes=10)
Breakers.quantile_breaks — Function
quantile_breaks(x::Vector{<:Real}, k::Int) -> Vector{Float64}
Calculate breaks using quantiles.
Arguments
- x: Vector of numeric values
- k: Number of classes (resulting in k+1 break points)
Returns
Vector{Float64}: Vector of break points (including min and max values)
Note
- For exact compatibility with R's classInt, some edge cases may require manual handling. See test/comparetoclassInt_R.jl for examples.
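A minimal sketch of quantile-based breaks, assuming the k+1 break points are the quantiles at evenly spaced probabilities from 0 to 1. The helper quantile_sketch is hypothetical; it uses the default interpolation of quantile from Julia's Statistics standard library, and the package's own handling of ties and edge cases may differ.

```julia
using Statistics

# Hypothetical helper, not part of Breakers: k classes -> k+1 evenly spaced probabilities.
quantile_sketch(x::AbstractVector{<:Real}, k::Integer) =
    quantile(x, collect(range(0, 1; length = k + 1)))

q_breaks = quantile_sketch(collect(1:9), 4)   # probabilities 0, 0.25, 0.5, 0.75, 1
```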
Breakers.equal_breaks — Function
equal_breaks(x::AbstractVector{<:Real}, n::Integer) -> Vector{Float64}
Calculate equal interval breaks for data binning.
Arguments
- x: Vector of numeric values
- n: Number of classes (resulting in n+1 break points)
Returns
Vector{Float64}: Vector of break points at equal intervals, including min and max values
Details
- The function divides the range of values into n equal intervals
- This is equivalent to R's classIntervals() with style="equal"
- Returns n+1 break points including minimum and maximum values
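The interval arithmetic described above amounts to n+1 evenly spaced points between the minimum and the maximum. The sketch below shows this with a hypothetical helper, equal_sketch, which is not the package function.

```julia
# Hypothetical helper, not part of Breakers: n equal-width intervals over the data range.
function equal_sketch(x::AbstractVector{<:Real}, n::Integer)
    lo, hi = Float64.(extrema(x))
    return collect(range(lo, hi; length = n + 1))   # interval width is (hi - lo) / n
end

v = [1, 5, 10, 20, 50, 100]
eq_breaks = equal_sketch(v, 4)                       # width (100 - 1) / 4 = 24.75
```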
Examples
v = [1, 5, 10, 20, 50, 100]
equal_breaks(v, 4)
# result == [1.0, 25.75, 50.5, 75.25, 100.0]
Breakers.fixed_breaks — Function
fixed_breaks(x::Vector{<:Real}, break_points::Vector{<:Real}) -> Vector{Float64}
Create breaks using user-specified break points.
Arguments
- x: Vector of numeric values (used for validation and to add min/max if needed)
- break_points: Vector of break point values to use
Returns
Vector{Float64}: Vector of break points including min and max values
Details
- This method allows users to specify exact break points rather than letting an algorithm choose them
- Break points are automatically sorted
- Minimum and maximum values from the data are added if not already present
- This integrates with the standard workflow (get_bins, get_bin_indices, cut_data)
Examples
# Specify custom break points
data = [1, 5, 10, 15, 20, 25, 30]
breaks = fixed_breaks(data, [10, 20]) # Returns [1.0, 10.0, 20.0, 30.0]
# Use with standard workflow
bin_indices = get_bin_indices_fixed(data, [10, 20])
bin_labels = cut_data(data, fixed_breaks(data, [10, 20]))
See also
- get_breaks_raw: For accessing all break methods including fixed
- cut_data: For applying breaks to create labeled bins
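The sorting and min/max behaviour listed in the details for fixed_breaks can be sketched as follows. The helper fixed_sketch is hypothetical and not the package function; in particular, how Breakers validates break points that fall outside the data range is not described here, so the sketch does no validation.

```julia
# Hypothetical helper, not part of Breakers: sort the user breaks and add the data
# minimum and maximum when they are not already present.
function fixed_sketch(x::AbstractVector{<:Real}, break_points::AbstractVector{<:Real})
    lo, hi = Float64.(extrema(x))
    return sort(unique(vcat(lo, Float64.(break_points), hi)))
end

data = [1, 5, 10, 15, 20, 25, 30]
fx_breaks = fixed_sketch(data, [20, 10])     # unsorted input is accepted
```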
Breakers.split_at_indices — Function
split_at_indices(v::Vector, indices::Vector{Int}) -> Vector{Vector}
Split a vector into multiple sub-vectors at specified indices (legacy function).
Arguments
- v::Vector: The input vector to be split
- indices::Vector{Int}: Indices where the vector should be split
Returns
Vector{Vector}: A vector of sub-vectors created by splitting at the specified indices
Note
This is a legacy function. For modern workflow integration, use fixed_breaks with actual values.