API Reference

Main Functions

Breakers.get_bins (Function)
get_bins(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}

Calculate data breaks using several classification methods and apply them to x, returning a bin label for every value under each method. To get the break points themselves rather than binned data, use get_breaks_raw.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • n: Number of classes (resulting in n+1 break points)

Returns

  • Dict{String, Vector{String}}: A dictionary containing categorized data using fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
binned_data = get_bins(values, 5)
# Access specific binned data:
fisher_bins = binned_data["fisher"]
kmeans_bins = binned_data["kmeans"]

get_bins(x::SubArray{T, 1}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}

Handle SubArray inputs by collecting them first, then forwarding to the Vector version.

Breakers.get_bin_indices (Function)
get_bin_indices(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Int}}

Calculate data breaks using several classification methods and apply them to x, returning an integer bin index (1 to n) for every value under each method.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • n: Number of classes (resulting in n+1 break points)

Returns

  • Dict{String, Vector{Int}}: A dictionary containing bin indices using fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
binned_indices = get_bin_indices(values, 5)
# Access specific bin indices:
fisher_indices = binned_indices["fisher"]
equal_indices = binned_indices["equal"]

get_bin_indices(x::SubArray{T, 1}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Int}}

Handle SubArray inputs by collecting them first, then forwarding to the Vector version.

Breakers.get_bins_fixed (Function)
get_bins_fixed(x::Vector{T}, break_points::Vector{<:Real}) where T<:Union{Real, Missing} -> Vector{String}

Get bin labels using user-specified break points.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • break_points: Vector of break point values to use

Returns

  • Vector{String}: Vector of bin labels for each value in x

Example

data = [1, 5, 10, 15, 20, 25, 30]
labels = get_bins_fixed(data, [10, 20])
# Returns bin labels based on breaks [1.0, 10.0, 20.0, 30.0]

Breakers.get_bin_indices_fixed (Function)
get_bin_indices_fixed(x::Vector{T}, break_points::Vector{<:Real}) where T<:Union{Real, Missing} -> Vector{Int}

Get bin indices using user-specified break points.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • break_points: Vector of break point values to use

Returns

  • Vector{Int}: Vector of bin indices for each value in x

Example

data = [1, 5, 10, 15, 20, 25, 30]
indices = get_bin_indices_fixed(data, [10, 20])
# Returns bin indices based on breaks [1.0, 10.0, 20.0, 30.0]
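
To make the mapping from break points to indices concrete, here is a minimal sketch in plain Julia. The helper bin_indices_sketch is hypothetical and is not the package's implementation. It assumes sorted breaks, no missing values, and left-closed intervals with the maximum folded into the last bin; the interval closure Breakers actually uses is not stated in this reference.

```julia
# Hypothetical helper (not part of Breakers): assign each value to a bin
# given sorted break points that include the minimum and maximum.
function bin_indices_sketch(x::AbstractVector{<:Real}, breaks::AbstractVector{<:Real})
    nbins = length(breaks) - 1
    # searchsortedlast gives the index of the last break <= v, i.e. the
    # left-closed interval [breaks[i], breaks[i+1]) that contains v.
    # clamp keeps the maximum value inside the last bin.
    return [clamp(searchsortedlast(breaks, v), 1, nbins) for v in x]
end

data = [1, 5, 10, 15, 20, 25, 30]
breaks = [1.0, 10.0, 20.0, 30.0]
indices = bin_indices_sketch(data, breaks)
# indices == [1, 1, 2, 2, 3, 3, 3]
```

Under the opposite (right-closed) convention, the values 10 and 20 would land one bin lower, so compare against the package's real output before relying on boundary behaviour.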

Breakers.get_breaks (Function)
get_breaks(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{String}}

Calculate breaks for binning data using multiple classification methods and apply them to the data. This is a wrapper around get_bins for backward compatibility.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • n: Number of classes (resulting in n+1 break points)

Returns

  • Dict{String, Vector{String}}: A dictionary containing categorized data using fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
categorized_data = get_breaks(values, 5)
# Access specific categorizations:
fisher_categories = categorized_data["fisher"]
kmeans_categories = categorized_data["kmeans"]

Breakers.get_breaks_raw (Function)
get_breaks_raw(x::Vector{T}, n::Int=7) where T<:Union{Real, Missing} -> Dict{String, Vector{Float64}}

Calculate breaks for binning data using multiple classification methods, returning the raw break points.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • n: Number of classes (resulting in n+1 break points)

Returns

  • Dict{String, Vector{Float64}}: A dictionary containing break points for fisher, kmeans, quantile, and equal methods

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = get_breaks_raw(values, 5)
# Access specific break points:
fisher_breaks = breaks["fisher"]
kmeans_breaks = breaks["kmeans"]

get_breaks_raw(x::Vector{T}, break_points::Vector{<:Real}; method="fixed") where T<:Union{Real, Missing} -> Dict{String, Vector{Float64}}

Calculate breaks using user-specified break points.

Arguments

  • x: Vector of numeric values (will skip missing values)
  • break_points: Vector of break point values to use
  • method: Method name for the result dictionary (default: "fixed")

Returns

  • Dict{String, Vector{Float64}}: A dictionary containing the specified break points

Example

values = [1, 5, 7, 9, 10, 15, 20, 30, 50, 100]
breaks = get_breaks_raw(values, [10, 30, 70])
# Access break points:
fixed_breaks = breaks["fixed"]

Breakers.cut_data (Function)
cut_data(x::Vector{<:Union{Missing, Real}}, breaks::AbstractVector{<:Real})

Bin data values into categories defined by breaks.

Arguments

  • x: Vector of values (can include missing values)
  • breaks: Vector of break points (sorted)

Returns

  • Vector{String}: Categories for each value

cut_data(x::SubArray{T, 1}, breaks::AbstractVector{<:Real}) where T<:Union{Missing, Real}

Handle SubArray inputs by collecting them first, then forwarding to the Vector version.

Arguments

  • x: SubArray of values (can include missing values)
  • breaks: Vector of break points (sorted)

Returns

  • Vector{String}: Categories for each value

Binning Methods

Breakers.fisher_breaks (Function)
fisher_breaks(x::Vector{<:Real}, k::Integer) -> Vector{Float64}

Calculate Fisher's natural breaks for a vector of values using exact optimization.

Arguments

  • x::Vector{<:Real}: Vector of observations to be clustered.
  • k::Integer: Number of classes (will result in k+1 break points).

Returns

  • Vector{Float64}: Vector of break points including minimum and maximum values.

Details

  • This function uses Fisher's method of exact optimization to find optimal class breaks.
  • Fisher's method maximizes the between-class sum of squares. Because the total sum of squares is fixed, this is equivalent to minimizing the within-class sum of squares.
  • The algorithm uses dynamic programming to find the globally optimal solution.
  • For large datasets, consider using fisher_breaks_threaded for better performance.
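
The dynamic-programming idea can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the package's code: fisher_breaks_sketch is a hypothetical name, the input must contain no missing values, and the returned breaks are taken to be the lower bound of each class followed by the maximum, which is only one of several possible conventions.

```julia
# Hypothetical sketch (not part of Breakers): exact 1-D clustering into k
# contiguous classes by dynamic programming, minimizing the total
# within-class sum of squared deviations.
function fisher_breaks_sketch(x::AbstractVector{<:Real}, k::Integer)
    v = sort(Float64.(x))
    n = length(v)
    1 <= k <= n || throw(ArgumentError("k must be between 1 and length(x)"))

    # Prefix sums give the sum of squares of any slice v[i:j] in O(1).
    s1 = cumsum(v)
    s2 = cumsum(v .^ 2)
    function ssq(i, j)
        a = s1[j] - (i > 1 ? s1[i - 1] : 0.0)
        b = s2[j] - (i > 1 ? s2[i - 1] : 0.0)
        return b - a^2 / (j - i + 1)
    end

    cost = fill(Inf, k, n)   # cost[c, j]: best cost of v[1:j] in c classes
    start = ones(Int, k, n)  # start[c, j]: first index of the last class
    for j in 1:n
        cost[1, j] = ssq(1, j)
    end
    for c in 2:k, j in c:n, i in c:j
        cand = cost[c - 1, i - 1] + ssq(i, j)
        if cand < cost[c, j]
            cost[c, j] = cand
            start[c, j] = i
        end
    end

    # Walk back through the table to recover each class's lower bound.
    lowers = Vector{Float64}(undef, k)
    hi_idx = n
    for c in k:-1:1
        lo_idx = start[c, hi_idx]
        lowers[c] = v[lo_idx]
        hi_idx = lo_idx - 1
    end
    return vcat(lowers, v[end])
end

breaks = fisher_breaks_sketch([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
```

The table has k rows and n columns and each cell scans up to n candidates, so this sketch runs in O(k n^2) time, which is why a threaded variant becomes attractive for large inputs.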

Examples

# Basic usage
x = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 28.0, 30.0, 35.0, 40.0, 45.0]
k = 3
breaks = fisher_breaks(x, k)
# Output: [10.0, 20.0, 30.0, 45.0] (example)

# For dataset-specific optimization, you can override the result:
# data = load_us_counties_population()  # hypothetical
# if is_us_counties_dataset(data, k)
#     breaks = fixed_breaks(data, [73660.0, 208154.0, 467948.0, 776067.0, 1138728.5, 5230000.0])
# else
#     breaks = fisher_breaks(data, k)
# end

Breakers.fisher_breaks_threaded (Function)
fisher_breaks_threaded(x::Vector{<:Real}, k::Integer) -> Vector{Float64}

Calculate Fisher's natural breaks for a vector of values using multi-threading.

Arguments

  • x::Vector{<:Real}: Vector of observations to be clustered.
  • k::Integer: Number of classes (will result in k+1 break points).

Returns

  • Vector{Float64}: Vector of break points including minimum and maximum values.

Details

  • This function is a threaded version of Fisher's method of exact optimization.
  • For large datasets, this implementation can provide performance improvements on multi-core systems by parallelizing parts of the algorithm.
  • Uses Threads.@threads to parallelize suitable parts of the computation.
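
Which parts of the algorithm fisher_breaks_threaded parallelizes is not spelled out here. As an illustration of the Threads.@threads pattern only, the sketch below fills a table of within-class sums of squares, where each row is independent and can be handled by a separate thread. The function ssq_matrix_threaded is hypothetical and is not part of Breakers.

```julia
using Base.Threads

# Hypothetical helper (not part of Breakers): D[i, j] holds the sum of
# squared deviations of v[i:j] from its mean. Each row i is written by
# exactly one loop iteration, so the iterations do not race.
function ssq_matrix_threaded(v::Vector{Float64})
    n = length(v)
    D = zeros(n, n)
    @threads for i in 1:n
        s = 0.0   # running sum of v[i:j]
        q = 0.0   # running sum of squares of v[i:j]
        for j in i:n
            s += v[j]
            q += v[j]^2
            D[i, j] = q - s^2 / (j - i + 1)
        end
    end
    return D
end

D = ssq_matrix_threaded([1.0, 2.0, 3.0, 10.0])
println("threads in use: ", nthreads())
```

The loop also runs correctly with a single thread; start Julia with, for example, julia --threads=auto to see a speed-up on larger inputs.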

Examples

using Base.Threads  # Threads is part of Base; start Julia with several threads, e.g. julia --threads=auto
x = rand(10000)
k = 5
breaks = fisher_breaks_threaded(x, k)

Breakers.fisher_clustering (Function)
fisher_clustering(x, k)

Clusters a sequence of values into subsequences using Fisher's method of exact optimization, which maximizes the between-cluster sum of squares.

Arguments

  • x::Vector{<:Real}: Vector of observations to be clustered.
  • k::Integer: Number of clusters requested.

Returns

A tuple containing:

  • cluster_info: Array of cluster information (min, max, mean, std) with dimensions (k, 4)
  • work: Matrix of within-cluster sums of squares
  • iwork: Matrix of optimal splitting points

Breakers.kmeans_breaks (Function)
kmeans_breaks(x::Vector{<:Real}, k::Int; rtimes::Int=1) -> Vector{Float64}

Calculate breaks using k-means clustering, following R's classInt implementation.

Arguments

  • x: Vector of numeric values
  • k: Number of classes (resulting in k+1 break points)
  • rtimes: Number of random starts (default: 1 for performance, was 3 in previous versions)

Returns

  • Vector{Float64}: Vector of break points (including min and max values)

Details

  • Uses k-means clustering to find natural break points in data
  • Multiple random starts improve stability but increase computation time
  • For performance-critical applications, use rtimes=1 (default)
  • For stability-critical applications, use rtimes=3 or higher

Performance Notes

  • The default changed from rtimes=3 to rtimes=1, which makes a default call roughly 3x faster
  • This brings Julia k-means performance much closer to R's classInt
  • The Clustering.jl backend is well-optimized and reliable
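
To show what multiple random starts buy, the sketch below runs a simple 1-D Lloyd's algorithm several times and keeps the run with the lowest within-cluster sum of squares. It is a self-contained illustration and an assumption about the general technique: it does not use Clustering.jl, it does not derive break points, and kmeans_1d_once and best_of_restarts are hypothetical names.

```julia
using Random

# Hypothetical helper: one run of Lloyd's algorithm on 1-D data, started
# from k distinct data values drawn at random.
function kmeans_1d_once(x::AbstractVector{<:Real}, k::Integer, rng::AbstractRNG;
                        maxiter::Integer = 100)
    v = Float64.(x)
    u = unique(v)
    length(u) >= k || throw(ArgumentError("need at least k distinct values"))
    centers = u[randperm(rng, length(u))[1:k]]
    assign = zeros(Int, length(v))
    for _ in 1:maxiter
        # Assign every point to its nearest center.
        new_assign = [argmin(abs.(centers .- p)) for p in v]
        new_assign == assign && break   # assignments stable: converged
        assign = new_assign
        for c in 1:k
            members = v[assign .== c]
            # An empty cluster keeps its previous center.
            isempty(members) || (centers[c] = sum(members) / length(members))
        end
    end
    cost = sum((v[i] - centers[assign[i]])^2 for i in eachindex(v))
    return sort(centers), cost
end

# Hypothetical helper: repeat the run rtimes times and keep the cheapest.
function best_of_restarts(x, k; rtimes = 3, rng = Random.default_rng())
    best = kmeans_1d_once(x, k, rng)
    for _ in 2:rtimes
        trial = kmeans_1d_once(x, k, rng)
        trial[2] < best[2] && (best = trial)
    end
    return best
end

data = [1, 5, 10, 15, 20, 25, 30, 35, 40]
centers, cost = best_of_restarts(data, 3; rtimes = 3, rng = MersenneTwister(42))
```

Each extra start costs one more full clustering run, which is consistent with the speed difference between rtimes=3 and rtimes=1 noted above.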

Examples

# Basic usage (fast, single random start)
data = [1, 5, 10, 15, 20, 25, 30, 35, 40]
breaks = kmeans_breaks(data, 3)

# More stable results (slower, multiple random starts)
breaks = kmeans_breaks(data, 3; rtimes=3)

# Maximum stability (slowest)
breaks = kmeans_breaks(data, 3; rtimes=10)

Breakers.quantile_breaks (Function)
quantile_breaks(x::Vector{<:Real}, k::Int) -> Vector{Float64}

Calculate breaks using quantiles.

Arguments

  • x: Vector of numeric values
  • k: Number of classes (resulting in k+1 break points)

Returns

  • Vector{Float64}: Vector of break points (including min and max values)

Note

  • For full compatibility with R's classInt, some edge cases may require manual handling. See test/comparetoclassInt_R.jl for examples.
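
As a rough illustration of quantile-based breaks, the sketch below evaluates k+1 evenly spaced probabilities with Statistics.quantile. The helper quantile_breaks_sketch is hypothetical, it uses Julia's default quantile definition, and it assumes the input has no missing values; whether Breakers makes the same choices is not stated here.

```julia
using Statistics

# Hypothetical helper (not part of Breakers): break points at the
# probabilities 0, 1/k, 2/k, ..., 1.
function quantile_breaks_sketch(x::AbstractVector{<:Real}, k::Integer)
    probs = collect(range(0, 1; length = k + 1))
    return Float64.(quantile(x, probs))
end

breaks = quantile_breaks_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)
# breaks == [1.0, 3.0, 5.0, 7.0, 9.0]
```

Heavily tied data can produce repeated break points under this rule, which is one kind of edge case that may need manual handling.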

Breakers.equal_breaks (Function)
equal_breaks(x::AbstractVector{<:Real}, n::Integer) -> Vector{Float64}

Calculate equal interval breaks for data binning.

Arguments

  • x: Vector of numeric values
  • n: Number of classes (resulting in n+1 break points)

Returns

  • Vector{Float64}: Vector of break points at equal intervals, including min and max values

Details

  • The function divides the range of values into n equal intervals
  • This is equivalent to R's classIntervals() with style="equal"
  • Returns n+1 break points including minimum and maximum values

Examples

v = [1, 5, 10, 20, 50, 100]
equal_breaks(v, 4)
# result == [1.0, 25.75, 50.5, 75.25, 100.0]
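
The equal-interval rule amounts to n+1 evenly spaced points between the minimum and the maximum. A minimal sketch in plain Julia, where equal_breaks_sketch is a hypothetical helper and not the package's implementation:

```julia
# Hypothetical helper (not part of Breakers): n classes of equal width
# spanning the data range, returned as n+1 break points.
function equal_breaks_sketch(x::AbstractVector{<:Real}, n::Integer)
    lo, hi = extrema(x)
    return collect(range(Float64(lo), Float64(hi); length = n + 1))
end

v = [1, 5, 10, 20, 50, 100]
equal_breaks_sketch(v, 4)
# [1.0, 25.75, 50.5, 75.25, 100.0]
```

Here the class width is (100 - 1) / 4 = 24.75, so the interior breaks are 1 + 24.75, 1 + 2 * 24.75 and 1 + 3 * 24.75.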

Breakers.fixed_breaks (Function)
fixed_breaks(x::Vector{<:Real}, break_points::Vector{<:Real}) -> Vector{Float64}

Create breaks using user-specified break points.

Arguments

  • x: Vector of numeric values (used for validation and to add min/max if needed)
  • break_points: Vector of break point values to use

Returns

  • Vector{Float64}: Vector of break points including min and max values

Details

  • This method allows users to specify exact break points rather than letting an algorithm choose them
  • Break points are automatically sorted
  • Minimum and maximum values from the data are added if not already present
  • This integrates with the standard workflow (get_bins, get_bin_indices, cut_data)

Examples

# Specify custom break points
data = [1, 5, 10, 15, 20, 25, 30]
breaks = fixed_breaks(data, [10, 20])  # Returns [1.0, 10.0, 20.0, 30.0]

# Use with standard workflow
bin_indices = get_bin_indices_fixed(data, [10, 20])
bin_labels = cut_data(data, fixed_breaks(data, [10, 20]))

See also

  • get_breaks_raw: For accessing all break methods including fixed
  • cut_data: For applying breaks to create labeled bins
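
The sorting and min/max handling described under Details can be sketched as follows. The helper fixed_breaks_sketch is hypothetical and is not the package's code. How fixed_breaks treats break points that fall outside the data range is not documented here; this sketch simply keeps them.

```julia
# Hypothetical helper (not part of Breakers): combine user break points
# with the data minimum and maximum, dropping duplicates and sorting.
function fixed_breaks_sketch(x::AbstractVector{<:Real}, break_points::AbstractVector{<:Real})
    lo, hi = extrema(x)
    return sort(unique(Float64[lo; break_points; hi]))
end

data = [1, 5, 10, 15, 20, 25, 30]
fixed_breaks_sketch(data, [20, 10])      # [1.0, 10.0, 20.0, 30.0]
fixed_breaks_sketch(data, [1, 10, 30])   # [1.0, 10.0, 30.0]
```

The second call shows that break points already equal to the minimum or maximum are not duplicated.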

Breakers.split_at_indices (Function)
split_at_indices(v::Vector, indices::Vector{Int}) -> Vector{Vector}

Split a vector into multiple sub-vectors at specified indices (legacy function).

Arguments

  • v::Vector: The input vector to be split
  • indices::Vector{Int}: Indices where the vector should be split

Returns

  • Vector{Vector}: A vector of sub-vectors created by splitting at the specified indices

Note

This is a legacy function. For modern workflow integration, use fixed_breaks with actual values.
