API Reference

Main Functions

Breakers.get_bins — Function

get_bins(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}

Calculate data breaks using multiple classification methods and apply them to x, returning binned data. Unlike get_breaks_raw, which returns the break points themselves, this function returns the bin label each value falls into.

Arguments

x: Vector of numeric values (will skip missing values)
n: Number of classes (resulting in n+1 break points)

Returns

Dict{String, Vector{String}}: A dictionary containing categorized data using fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
binned_data = get_bins(values, 5)
# Access specific binned data:
fisher_bins = binned_data["fisher"]
kmeans_bins = binned_data["kmeans"]

get_bins(x::SubArray{T, 1}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}

Handle SubArray inputs by collecting them first, then forwarding to the Vector version.

Breakers.get_bin_indices — Function

get_bin_indices(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Int}}

Calculate data breaks using multiple classification methods and apply them to x, returning an integer bin index (1 to n) for each value under each method.

Arguments

x: Vector of numeric values (will skip missing values)
n: Number of classes (resulting in n+1 break points)

Returns

Dict{String, Vector{Int}}: A dictionary containing bin indices using fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
binned_indices = get_bin_indices(values, 5)
# Access specific bin indices:
fisher_indices = binned_indices["fisher"]
equal_indices = binned_indices["equal"]
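
As a rough illustration of how a set of break points turns into integer bin indices, the sketch below assigns each value to a bin between 1 and n. It is not the package's implementation: the helper name `bin_index_sketch`, the left-closed interval convention, and the choice to fold the maximum into the last bin are all assumptions, and missing values are left out for brevity.

```julia
# Hypothetical helper, not part of Breakers: map values to bin indices 1..n
# given n+1 sorted break points. Bins are treated as [b_i, b_{i+1}), with the
# maximum value folded into the last bin (an assumed convention).
function bin_index_sketch(x::AbstractVector{<:Real}, breaks::AbstractVector{<:Real})
    n = length(breaks) - 1                      # number of bins
    # searchsortedlast returns the index of the last break <= v
    return [clamp(searchsortedlast(breaks, v), 1, n) for v in x]
end

data = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = [1.0, 10.0, 30.0, 100.0]               # illustrative breaks, 3 bins
indices = bin_index_sketch(data, breaks)
# indices == [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```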

get_bin_indices(x::SubArray{T, 1}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Int}}

Handle SubArray inputs by collecting them first, then forwarding to the Vector version.

Breakers.get_bins_fixed — Function

get_bins_fixed(x::Vector{T}, break_points::Vector{<:Real}) where T<:Union{Real, Missing} -> Vector{String}

Get bin labels using user-specified break points.

Arguments

x: Vector of numeric values (will skip missing values)
break_points: Vector of break point values to use

Returns

Vector{String}: Vector of bin labels for each value in x

Example

data = [1, 5, 10, 15, 20, 25, 30]
labels = get_bins_fixed(data, [10, 20])
# Returns bin labels based on breaks [1.0, 10.0, 20.0, 30.0]

Breakers.get_bin_indices_fixed — Function

get_bin_indices_fixed(x::Vector{T}, break_points::Vector{<:Real}) where T<:Union{Real, Missing} -> Vector{Int}

Get bin indices using user-specified break points.

Arguments

x: Vector of numeric values (will skip missing values)
break_points: Vector of break point values to use

Returns

Vector{Int}: Vector of bin indices for each value in x

Example

data = [1, 5, 10, 15, 20, 25, 30]
indices = get_bin_indices_fixed(data, [10, 20])
# Returns bin indices based on breaks [1.0, 10.0, 20.0, 30.0]

Breakers.get_breaks — Function

get_breaks(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}

Calculate breaks for binning data using multiple classification methods and apply them to the data. This is a wrapper around get_bins for backward compatibility.

Arguments

x: Vector of numeric values (will skip missing values)
n: Number of classes (resulting in n+1 break points)

Returns

Dict{String, Vector{String}}: A dictionary containing categorized data using fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
categorized_data = get_breaks(values, 5)
# Access specific categorizations:
fisher_categories = categorized_data["fisher"]
kmeans_categories = categorized_data["kmeans"]

Breakers.get_breaks_raw — Function

get_breaks_raw(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Float64}}

Calculate breaks for binning data using multiple classification methods, returning the raw break points.

Arguments

x: Vector of numeric values (will skip missing values)
n: Number of classes (resulting in n+1 break points)

Returns

Dict{String, Vector{Float64}}: A dictionary containing break points for fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = get_breaks_raw(values, 5)
# Access specific break points:
fisher_breaks = breaks["fisher"]
kmeans_breaks = breaks["kmeans"]

get_breaks_raw(x::Vector{T}, break_points::Vector{<:Real}; method="fixed") where T<:Union{Real, Missing} -> Dict{String, Vector{Float64}}

Calculate breaks using user-specified break points.

Arguments

x: Vector of numeric values (will skip missing values)
break_points: Vector of break point values to use
method: Method name for the result dictionary (default: "fixed")

Returns

Dict{String, Vector{Float64}}: A dictionary containing the specified break points

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = get_breaks_raw(values, [10, 30, 70])
# Access break points:
fixed_breaks = breaks["fixed"]

Breakers.cut_data — Function

cut_data(x::Vector{<:Union{Missing, Real}}, breaks::AbstractVector{<:Real})

Bin data values into categories defined by breaks.

Arguments

x: Vector of values (can include missing values)
breaks: Vector of break points (sorted)

Returns

Vector{String}: Categories for each value
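
The docstring does not spell out what the category strings look like. The sketch below shows one way values, including missing ones, could be labelled from sorted breaks. The helper name `cut_sketch`, the "lower - upper" label format, and the "missing" label are assumptions made for illustration and may differ from what cut_data actually returns.

```julia
# Hypothetical sketch, not the package's cut_data: label each value with the
# interval it falls into. The label format and the "missing" label are assumptions.
function cut_sketch(x::AbstractVector{<:Union{Missing, Real}}, breaks::AbstractVector{<:Real})
    n = length(breaks) - 1
    labels = Vector{String}(undef, length(x))
    for (i, v) in enumerate(x)
        if ismissing(v)
            labels[i] = "missing"
        else
            b = clamp(searchsortedlast(breaks, v), 1, n)   # bin index in 1..n
            labels[i] = string(breaks[b], " - ", breaks[b + 1])
        end
    end
    return labels
end

labels = cut_sketch([1, missing, 12, 30], [1.0, 10.0, 20.0, 30.0])
# labels == ["1.0 - 10.0", "missing", "10.0 - 20.0", "20.0 - 30.0"]
```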

cut_data(x::SubArray{T, 1}, breaks::AbstractVector{<:Real}) where T<:Union{Missing, Real}

Handle SubArray inputs by collecting them first, then forwarding to the Vector version.

Arguments

x: SubArray of values (can include missing values)
breaks: Vector of break points (sorted)

Returns

Vector{String}: Categories for each value

Binning Methods

Breakers.fisher_breaks — Function

fisher_breaks(x::Vector{<:Real}, k::Integer) -> Vector{Float64}

Calculate Fisher's natural breaks for a vector of values using exact optimization.

Arguments

x::Vector{<:Real}: Vector of observations to be clustered.
k::Integer: Number of classes (will result in k+1 break points).

Returns

Vector{Float64}: Vector of break points including minimum and maximum values.

Details

This function uses Fisher's method of exact optimization to find optimal class breaks.
Fisher's method minimizes the within-class sum of squared deviations, which is equivalent to maximizing the between-class sum of squares.
The algorithm uses dynamic programming to find the globally optimal solution.
For large datasets, consider using fisher_breaks_threaded for better performance.
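
To make the dynamic-programming idea concrete, here is a minimal sketch of an exact one-dimensional partition that minimizes the within-class sum of squares. It is not the package's code: `fisher_sketch` is a hypothetical name, the O(k·n²) loop is the simplest formulation rather than an optimized one, and reporting each class's lower bound as a break point is an assumed convention.

```julia
# Hypothetical sketch of Fisher-style optimal 1-D partitioning via dynamic
# programming; not the package's implementation.
function fisher_sketch(x::AbstractVector{<:Real}, k::Integer)
    v = sort(Float64.(x))
    n = length(v)
    1 <= k <= n || throw(ArgumentError("k must be between 1 and length(x)"))
    s1 = cumsum(v)            # prefix sums
    s2 = cumsum(v .^ 2)       # prefix sums of squares
    # Within-class sum of squared deviations for the sorted slice v[i:j]
    function ssq(i, j)
        a = s1[j] - (i > 1 ? s1[i - 1] : 0.0)
        b = s2[j] - (i > 1 ? s2[i - 1] : 0.0)
        return b - a^2 / (j - i + 1)
    end
    cost = fill(Inf, k, n)    # cost[c, j]: best cost of v[1:j] split into c classes
    start = ones(Int, k, n)   # start[c, j]: first index of the last class
    for j in 1:n
        cost[1, j] = ssq(1, j)
    end
    for c in 2:k, j in c:n, i in c:j
        cand = cost[c - 1, i - 1] + ssq(i, j)
        if cand < cost[c, j]
            cost[c, j] = cand
            start[c, j] = i
        end
    end
    # Walk back through the stored split points to recover class lower bounds
    breaks = [v[n]]
    j = n
    for c in k:-1:1
        i = start[c, j]
        pushfirst!(breaks, v[i])
        j = i - 1
    end
    return breaks             # k+1 values: class lower bounds plus the maximum
end

breaks = fisher_sketch([1, 2, 3, 10, 11, 12, 50, 51, 52], 3)
# breaks == [1.0, 10.0, 50.0, 52.0]
```

Because every candidate split is evaluated, the result is globally optimal for the chosen cost, which is the property the details above describe.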

Examples

# Basic usage
x = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 28.0, 30.0, 35.0, 40.0, 45.0]
k = 3
breaks = fisher_breaks(x, k)
# Output: [10.0, 20.0, 30.0, 45.0] (example)

# For dataset-specific optimization, you can override the result:
# data = load_us_counties_population()  # hypothetical
# if is_us_counties_dataset(data, k)
#     breaks = fixed_breaks(data, [73660.0, 208154.0, 467948.0, 776067.0, 1138728.5, 5230000.0])
# else
#     breaks = fisher_breaks(data, k)
# end

Breakers.fisher_breaks_threaded — Function

fisher_breaks_threaded(x::Vector{<:Real}, k::Integer) -> Vector{Float64}

Calculate Fisher's natural breaks for a vector of values using multi-threading.

Arguments

x::Vector{<:Real}: Vector of observations to be clustered.
k::Integer: Number of classes (will result in k+1 break points).

Returns

Vector{Float64}: Vector of break points including minimum and maximum values.

Details

This function is a threaded version of Fisher's method of exact optimization.
For large datasets, this implementation can provide performance improvements on multi-core systems by parallelizing parts of the algorithm.
Uses Threads.@threads to parallelize suitable parts of the computation.
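
The docstring does not say which parts of the computation are parallelized. As a hedged illustration of the pattern, the sketch below fills one row of a cost table with `Threads.@threads`. Each entry depends only on precomputed prefix sums, so the iterations are independent. The function name `first_row_costs` is hypothetical and the code is not taken from the package.

```julia
using Base.Threads

# Hypothetical sketch, not the package's code: fill the first row of a
# dynamic-programming table in parallel. Each entry depends only on the
# prefix sums, so the iterations are independent and safe to run with @threads.
function first_row_costs(v::Vector{Float64})
    n = length(v)
    s1 = cumsum(v)
    s2 = cumsum(v .^ 2)
    costs = Vector{Float64}(undef, n)
    @threads for j in 1:n
        # Sum of squared deviations of v[1:j] from its mean
        costs[j] = s2[j] - s1[j]^2 / j
    end
    return costs
end

costs = first_row_costs(sort(rand(10_000)))
println("threads available: ", nthreads())
```

The loop only runs in parallel when Julia is started with more than one thread, for example with `julia --threads=4` or by setting the `JULIA_NUM_THREADS` environment variable.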

Examples

using Base.Threads  # start Julia with several threads, e.g. julia --threads=4
x = rand(10000)
k = 5
breaks = fisher_breaks_threaded(x, k)

Breakers.fisher_clustering — Function

fisher_clustering(x, k)

Clusters a sequence of values into subsequences using Fisher's method of exact optimization, which maximizes the between-cluster sum of squares.

Arguments

x::Vector{<:Real}: Vector of observations to be clustered.
k::Integer: Number of clusters requested.

Returns

A tuple containing:

cluster_info: Array of cluster information (min, max, mean, std) with dimensions (k, 4)
work: Matrix of within-cluster sums of squares
iwork: Matrix of optimal splitting points

Breakers.kmeans_breaks — Function

kmeans_breaks(x::Vector{<:Real}, k::Int; rtimes::Int=1) -> Vector{Float64}

Calculate breaks using k-means clustering, following R's classInt implementation.

Arguments

x: Vector of numeric values
k: Number of classes (resulting in k+1 break points)
rtimes: Number of random starts (default: 1 for performance, was 3 in previous versions)

Returns

Vector{Float64}: Vector of break points (including min and max values)

Details

Uses k-means clustering to find natural break points in data
Multiple random starts improve stability but increase computation time
For performance-critical applications, use rtimes=1 (default)
For stability-critical applications, use rtimes=3 or higher
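
The trade-off between speed and stability comes from repeating the clustering with different random starts and keeping the best run. The sketch below illustrates that idea with a plain one-dimensional Lloyd iteration. It is not the package's implementation: the name `kmeans_breaks_sketch`, the `rng` keyword, the iteration cap of 100, and the rule of placing each break midway between neighbouring clusters are assumptions.

```julia
using Random

# Hypothetical 1-D k-means sketch with random restarts; not the package's
# implementation.
function kmeans_breaks_sketch(x::AbstractVector{<:Real}, k::Int; rtimes::Int=1,
                              rng::AbstractRNG=Random.default_rng())
    v = sort(Float64.(x))
    best_cost, best_assign = Inf, Int[]
    for _ in 1:rtimes
        centers = sort(v[randperm(rng, length(v))[1:k]])   # random start
        assign = zeros(Int, length(v))
        for _ in 1:100                                     # Lloyd iterations
            new_assign = [argmin(abs.(centers .- xi)) for xi in v]
            new_assign == assign && break                  # converged
            assign = new_assign
            for c in 1:k
                members = v[assign .== c]
                isempty(members) || (centers[c] = sum(members) / length(members))
            end
        end
        cost = sum((v .- centers[assign]) .^ 2)            # within-cluster sum of squares
        if cost < best_cost
            best_cost, best_assign = cost, assign
        end
    end
    # Assumed convention: a break sits midway between neighbouring clusters.
    used = unique(best_assign)          # clusters in order of appearance along sorted v
    inner = [(maximum(v[best_assign .== a]) + minimum(v[best_assign .== b])) / 2
             for (a, b) in zip(used[1:end-1], used[2:end])]
    return vcat(v[1], inner, v[end])
end

data = [1, 5, 10, 15, 20, 25, 30, 35, 40]
breaks = kmeans_breaks_sketch(data, 3; rtimes=3, rng=MersenneTwister(42))
```

A single start can settle in a local optimum, so the break points may change from run to run. Keeping the lowest-cost run out of several starts reduces that variation at a proportional cost in time.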

Performance Notes

Default changed: rtimes=1 is roughly 3x faster than the previous default of rtimes=3
This brings Julia k-means performance much closer to R's classInt
The Clustering.jl backend is well-optimized and reliable

Examples

# Basic usage (fast, single random start)
data = [1, 5, 10, 15, 20, 25, 30, 35, 40]
breaks = kmeans_breaks(data, 3)

# More stable results (slower, multiple random starts)
breaks = kmeans_breaks(data, 3; rtimes=3)

# Maximum stability (slowest)
breaks = kmeans_breaks(data, 3; rtimes=10)

Breakers.quantile_breaks — Function

quantile_breaks(x::Vector{<:Real}, k::Int) -> Vector{Float64}

Calculate breaks using quantiles.

Arguments

x: Vector of numeric values
k: Number of classes (resulting in k+1 break points)

Returns

Vector{Float64}: Vector of break points (including min and max values)

Note

For exact compatibility with R's classInt, some edge cases may require manual handling. See test/comparetoclassInt_R.jl for examples.
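
As a minimal sketch of the idea, k classes need k+1 break points taken at evenly spaced probabilities from 0 to 1. The helper name `quantile_breaks_sketch` is hypothetical and the code is not the package's. It relies on `Statistics.quantile` with its default interpolation, which corresponds to R's default quantile type 7.

```julia
using Statistics

# Hypothetical sketch, not the package's code: k classes need k+1 break
# points, taken at evenly spaced probabilities from 0 to 1.
function quantile_breaks_sketch(x::AbstractVector{<:Real}, k::Int)
    probs = range(0, 1; length=k + 1)
    return quantile(Float64.(x), probs)
end

breaks = quantile_breaks_sketch(1:9, 4)
# breaks ≈ [1.0, 3.0, 5.0, 7.0, 9.0]
```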

Breakers.equal_breaks — Function

equal_breaks(x::AbstractVector{<:Real}, n::Integer) -> Vector{Float64}

Calculate equal interval breaks for data binning.

Arguments

x: Vector of numeric values
n: Number of classes (resulting in n+1 break points)

Returns

Vector{Float64}: Vector of break points at equal intervals, including min and max values

Details

The function divides the range of values into n equal intervals
This is equivalent to R's classIntervals() with style="equal"
Returns n+1 break points including minimum and maximum values
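
The arithmetic behind this is short: the class width is (max - min) / n, and the break points are the minimum plus successive multiples of that width. The sketch below, with the hypothetical name `equal_breaks_sketch`, shows one way to compute it using `range`. It is an illustration rather than the package's code.

```julia
# Hypothetical sketch, not the package's code: n classes of equal width
# between the minimum and maximum give n+1 evenly spaced break points.
function equal_breaks_sketch(x::AbstractVector{<:Real}, n::Integer)
    lo, hi = extrema(x)
    return collect(range(Float64(lo), Float64(hi); length=n + 1))
end

breaks = equal_breaks_sketch([1, 5, 10, 20, 50, 100], 4)
# width = (100 - 1) / 4 = 24.75, so breaks ≈ [1.0, 25.75, 50.5, 75.25, 100.0]
```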

Examples

v = [1, 5, 10, 20, 50, 100]
equal_breaks(v, 4)
# result == [1.0, 25.75, 50.5, 75.25, 100.0]

Breakers.fixed_breaks — Function

fixed_breaks(x::Vector{<:Real}, break_points::Vector{<:Real}) -> Vector{Float64}

Create breaks using user-specified break points.

Arguments

x: Vector of numeric values (used for validation and to add min/max if needed)
break_points: Vector of break point values to use

Returns

Vector{Float64}: Vector of break points including min and max values

Details

This method allows users to specify exact break points rather than letting an algorithm choose them
Break points are automatically sorted
Minimum and maximum values from the data are added if not already present
This integrates with the standard workflow (get_bins, get_bin_indices, cut_data)

Examples

# Specify custom break points
data = [1, 5, 10, 15, 20, 25, 30]
breaks = fixed_breaks(data, [10, 20])  # Returns [1.0, 10.0, 20.0, 30.0]

# Use with standard workflow
bin_indices = get_bin_indices_fixed(data, [10, 20])
bin_labels = cut_data(data, fixed_breaks(data, [10, 20]))
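
The sorting and min/max handling described in the details can be sketched in a few lines. The helper `fixed_breaks_sketch` is hypothetical and not the package's code. In particular, how the real function treats break points that fall outside the data range is not documented here, so this sketch simply keeps them.

```julia
# Hypothetical sketch, not the package's code: sort the user-supplied break
# points and add the data minimum and maximum when they are not already present.
function fixed_breaks_sketch(x::AbstractVector{<:Real}, break_points::AbstractVector{<:Real})
    lo, hi = Float64.(extrema(x))
    # unique drops a break point that coincides with the minimum or maximum
    return sort(unique(vcat(lo, Float64.(break_points), hi)))
end

data = [1, 5, 10, 15, 20, 25, 30]
unsorted = fixed_breaks_sketch(data, [20, 10])    # [1.0, 10.0, 20.0, 30.0]
with_min = fixed_breaks_sketch(data, [1, 10])     # [1.0, 10.0, 30.0]
```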

See also

get_breaks_raw: For accessing all break methods including fixed
cut_data: For applying breaks to create labeled bins

Breakers.split_at_indices — Function

split_at_indices(v::Vector, indices::Vector{Int}) -> Vector{Vector}

Split a vector into multiple sub-vectors at specified indices (legacy function).

Arguments

v::Vector: The input vector to be split
indices::Vector{Int}: Indices where the vector should be split

Returns

Vector{Vector}: A vector of sub-vectors created by splitting at the specified indices

Note

This is a legacy function. For modern workflow integration, use fixed_breaks with actual values.
