Green mining drives modern mineral processing, where production safety and energy efficiency are critical. Ball mills, which account for roughly 50% of concentrator operating costs, are pivotal for grinding quality and energy conservation. However, traditional optimization methods struggle with massive mining datasets. This work proposes BLPNFP-growth, a parallel frequent itemset mining (FIM) algorithm on Spark, to optimize ball mill parameters using historical operational data. Our approach resolves load imbalance during distributed mining and extracts actionable rules for real-time control, raising discharge fineness by 4.2–5.1% and throughput by 1.5–1.8 t/h in our experiments.
1. Introduction
Grinding determines final mineral liberation and separation efficiency. As core grinding equipment, ball mills critically influence production quality and energy consumption. Current optimization strategies face limitations: 1) Poor scalability for big data; 2) Neglect of ore characteristics (hardness, size distribution); 3) Insufficient adaptation to dynamic control loop interactions. With mining data accumulating exponentially, distributed data mining becomes essential. We parallelize the NFP-growth algorithm on Spark to address these gaps, enabling efficient extraction of ball mill optimization rules from terabytes of operational records.

2. Related Work
Existing ball mill optimization approaches include:
Method | Contribution | Limitation |
---|---|---|
Two-layer control [6] | Handles MIMO grinding processes | Ignores ore property variations |
CBR-Reinforcement Learning [7] | Adapts to diverse operating conditions | Computationally expensive for big data |
PSO-CBR optimization [8] | Improves product fineness control | Fails to model parameter interactions |
Fuzzy expert control [9] | Supervises semi-autogenous grinding | Requires extensive domain knowledge |
Distributed FIM algorithms like PFP-growth on Spark show promise but suffer from:
- Memory overhead from redundant header tables
- Load imbalance during parallel conditional tree construction
Our BLPNFP-growth algorithm overcomes these by introducing a compact tree structure and dynamic workload partitioning.
3. The Improved NFP-Growth Algorithm
3.1 NFP-Growth Fundamentals
NFP-growth improves FP-growth by eliminating header tables and scanning databases once. For transaction database TD:
Tid | Itemset |
---|---|
101 | I2,I1,I5 |
102 | I2,I4 |
Procedure:
- Scan TD to build T-tree, remove infrequent items, generate frequent-1 itemset L
- Construct NFP-tree with root null and node table Node_T
- Merge identical nodes in Node_T
- Generate frequent itemsets per item
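The first step of the procedure (counting supports, dropping infrequent items, and ordering what remains) can be sketched in plain Python. This is only an illustration of the shared FP-growth-style preprocessing, not the NFP-tree or Node_T construction itself; the function names and the `min_count=2` threshold are assumptions made for the example.

```python
from collections import Counter

def build_f_list(transactions, min_count):
    """Count item supports and return frequent items, most frequent first."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    # Sort by descending support; break ties by item name for determinism.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent

def reorder(transaction, f_list):
    """Drop infrequent items and order the rest by F-list position."""
    rank = {item: i for i, (item, _) in enumerate(f_list)}
    return sorted((i for i in set(transaction) if i in rank), key=rank.__getitem__)

# The two transactions shown in the table above.
td = {101: ["I2", "I1", "I5"], 102: ["I2", "I4"]}
f_list = build_f_list(td.values(), min_count=2)   # min_count is an assumed threshold
ordered = {tid: reorder(t, f_list) for tid, t in td.items()}
```

With only these two transactions and the assumed threshold, I2 is the sole frequent item, so both transactions reduce to the single item I2.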
3.2 Parallelization on Spark (PNFP-Growth)
Four-stage parallel workflow:
- Data preprocessing:
$$ \text{RDD} \xrightarrow{\text{flatMap}} \text{List} \xrightarrow{\text{filter}} F\text{-list} $$
Partition transactions into P groups via Hash Partition
- Pattern decomposition: Split transactions into path sequences
- Parallel mining: Build local FP-trees using mapPartitions
- Result aggregation: Merge outputs to HDFS
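The partitioning and decomposition stages can be illustrated without a cluster. The sketch below follows the group-dependent sharding idea used by PFP-growth, which may differ in detail from the pattern decomposition used here. The F-list, the transactions, and the use of `rank % num_groups` in place of a real hash partitioner are all assumptions for the example; on Spark the inner loop would correspond roughly to a `flatMap` followed by a shuffle on the group id.

```python
from collections import defaultdict

def shard_transaction(ordered_items, rank, num_groups):
    """Emit (group_id, prefix) pairs for one F-list-ordered transaction.

    Scanning from the last item backwards, the first time a group is seen
    we emit the prefix ending at that item, so each group receives what it
    needs to mine its own items locally.
    """
    seen = set()
    out = []
    for pos in range(len(ordered_items) - 1, -1, -1):
        group = rank[ordered_items[pos]] % num_groups  # stand-in for hash partitioning
        if group not in seen:
            seen.add(group)
            out.append((group, ordered_items[: pos + 1]))
    return out

f_list = ["I2", "I1", "I3", "I4"]   # hypothetical F-list, most frequent first
rank = {item: i for i, item in enumerate(f_list)}
transactions = [["I2", "I1", "I4"], ["I2", "I3"], ["I1", "I3", "I4"]]

groups = defaultdict(list)          # plays the role of a shuffle by group id
for t in transactions:
    for group, prefix in shard_transaction(t, rank, num_groups=2):
        groups[group].append(prefix)
```

Each entry of `groups` is then an independent input from which a local tree can be built, which is what makes the parallel mining stage possible.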
3.3 Load Balancing Optimization (BLPNFP-Growth)
Original PNFP-growth partitions items equally, causing computational imbalance. We propose a conditional FP-tree size model:
Workload estimation:
Item position in F-list:
$$ \text{item\_loc} = L(\text{item}, F\text{-list}) $$
Computational weight:
$$ \text{Calculation} = \log(\text{item\_loc}) $$
Tree scale:
$$ \text{Tree\_Size} = \text{item\_sup} \times (\text{item\_loc} + 1) $$
Calculation grows with an item's position in the F-list, while Tree_Size grows with both its support (item_sup) and its position. Items are assigned to partitions so that these estimated workloads are balanced.
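As a rough sketch of how these estimates might drive partitioning, the code below evaluates both formulas and then assigns items greedily, largest Tree_Size first, to the currently lightest partition. The greedy rule is an assumption for illustration (the exact assignment strategy is not specified above), positions are taken as 1-based so that the logarithm is defined for the first item, and the supports are hypothetical.

```python
import heapq
import math

def estimate(f_list):
    """Return {item: (calculation, tree_size)} using the formulas above.

    f_list is [(item, support), ...]; positions are 1-based (an assumption).
    """
    est = {}
    for loc, (item, sup) in enumerate(f_list, start=1):
        est[item] = (math.log(loc), sup * (loc + 1))
    return est

def balance(f_list, num_partitions):
    """Greedy assignment: largest Tree_Size first, onto the lightest partition."""
    est = estimate(f_list)
    heap = [(0, p, []) for p in range(num_partitions)]  # (load, partition id, items)
    heapq.heapify(heap)
    for item in sorted(est, key=lambda i: -est[i][1]):
        load, p, items = heapq.heappop(heap)
        items.append(item)
        heapq.heappush(heap, (load + est[item][1], p, items))
    return {p: (load, items) for load, p, items in heap}

# Hypothetical supports, sorted in descending order as in an F-list.
f_list = [("I2", 7), ("I1", 6), ("I3", 6), ("I4", 2), ("I5", 2)]
parts = balance(f_list, num_partitions=2)
```

For these made-up supports the two partitions end up with estimated loads of 36 and 42. This sketch balances on Tree_Size alone; a weighted combination with Calculation would slot into the same loop.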
4. Algorithm Performance Analysis
Cluster configuration:
Component | Specification |
---|---|
CPU | Intel i5-6200U 2.30GHz |
RAM | 8GB |
Spark | v2.1.3 |
Dataset | Webdocs (1.48GB, 1.69M records) |
4.1 Scalability Tests
Execution time (min_sup=0.6):
Data scale (×10⁴) | PFP-growth (s) | PNFP-growth (s) | BLPNFP-growth (s) |
---|---|---|---|
40 | 2,850 | 2,150 | 1,780 |
80 | 3,120 | 2,410 | 1,920 |
120 | 3,450 | 2,680 | 2,050 |
160 | 3,820 | 2,950 | 2,210 |
4.2 Support Threshold Impact
Runtime on full Webdocs dataset:
min_sup | PFP-growth (s) | PNFP-growth (s) | BLPNFP-growth (s) |
---|---|---|---|
0.2 | 3,480 | 2,720 | 2,180 |
0.4 | 2,860 | 2,250 | 1,760 |
0.6 | 2,210 | 1,780 | 1,410 |
0.8 | 1,320 | 1,050 | 830 |
4.3 Node Scaling Efficiency
Runtime with varying cluster size (min_sup=0.6):
Nodes | PFP-growth (s) | PNFP-growth (s) | BLPNFP-growth (s) |
---|---|---|---|
1 | 11,200 | 8,950 | 7,210 |
2 | 6,810 | 5,420 | 4,180 |
3 | 4,930 | 3,910 | 2,950 |
4 | 3,740 | 2,980 | 2,230 |
Based on the runtimes tabulated in Sections 4.1–4.3, BLPNFP-growth reduces execution time by roughly 36–42% relative to PFP-growth and 17–25% relative to PNFP-growth.
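The node-scaling table in Section 4.3 can also be read in terms of parallel speedup, S(n) = T(1)/T(n), and efficiency, E(n) = S(n)/n. The short script below uses only the runtimes from that table; the function names are ours.

```python
# Runtimes in seconds from the node-scaling table (min_sup = 0.6).
runtimes = {
    "PFP-growth":    {1: 11200, 2: 6810, 3: 4930, 4: 3740},
    "PNFP-growth":   {1: 8950,  2: 5420, 3: 3910, 4: 2980},
    "BLPNFP-growth": {1: 7210,  2: 4180, 3: 2950, 4: 2230},
}

def speedup(times, n):
    """Parallel speedup S(n) = T(1) / T(n)."""
    return times[1] / times[n]

def efficiency(times, n):
    """Parallel efficiency E(n) = S(n) / n."""
    return speedup(times, n) / n

for name, times in runtimes.items():
    print(f"{name}: S(4) = {speedup(times, 4):.2f}, E(4) = {efficiency(times, 4):.2f}")
```

On four nodes this gives a speedup of about 3.0 for PFP-growth and PNFP-growth and about 3.2 for BLPNFP-growth, that is, a parallel efficiency of roughly 0.75 versus 0.81.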
5. Ball Mill Optimization Experiments
5.1 Data Preparation
75,000 records sampled every 5 minutes from an operational ball mill:
Parameter | Controllable | Unit |
---|---|---|
Ore feed rate | Yes | t/h |
Water feed rate | Yes | m³/h |
Motor current | Yes | A |
Discharge fineness | Yes | % |
5.2 Key Parameter Selection
Discharge fineness is the optimization target. Pearson correlation identifies critical controllable parameters:
$$ \rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
Parameter | \|ρ\| vs. fineness | Rank |
---|---|---|
Ore feed rate | 0.8261 | 1 |
Water feed rate | 0.8134 | 2 |
Motor current | 0.8115 | 3 |
Grinding aid rate | 0.8062 | 4 |
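The correlation formula above translates directly into code. The following is a minimal standard-library sketch; the readings are made-up illustrative values, not measurements from the mill, and the ranking in the table uses the absolute value of the result.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative readings, not data from the mill.
feed_rate = [88.0, 89.5, 91.0, 92.5, 94.0]
fineness = [31.0, 30.2, 29.1, 28.4, 27.5]
rho = pearson(feed_rate, fineness)
```

Here `rho` is strongly negative, so `abs(rho)` would be the quantity compared across parameters.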
5.3 Operational Regime Partitioning
Stable ball mill operation is defined by motor current ∈ [72A, 85A]. Ore hardness and size distribution are considered constant within regimes. Data is discretized into clusters:
Parameter | Discretization Intervals |
---|---|
Motor current (A) | [72.45,73.75], [73.76,76.29], …, [84.57,86.81] |
Ore feed rate (t/h) | [84.08,85.34], [85.35,85.87], …, [93.93,95.65] |
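To feed an itemset miner, each continuous reading has to be mapped to the label of the interval that contains it. The sketch below shows one way this could be done. It uses only the intervals printed in the table (the elided ones are deliberately not guessed, so values falling there map to `None`), and the parameter keys, label format, and sample record are hypothetical.

```python
# Only the intervals printed in the table; the elided middle ones are not guessed.
MOTOR_CURRENT_BINS = [(72.45, 73.75), (73.76, 76.29), (84.57, 86.81)]
ORE_FEED_BINS = [(84.08, 85.34), (85.35, 85.87), (93.93, 95.65)]

def to_item(name, value, bins):
    """Return a label such as 'current=[72.45,73.75]', or None if no listed bin matches."""
    for low, high in bins:
        if low <= value <= high:
            return f"{name}=[{low},{high}]"
    return None

def to_transaction(record, bins_by_param):
    """Turn one sensor record into the set of items used for mining."""
    items = (
        to_item(k, v, bins_by_param[k])
        for k, v in record.items()
        if k in bins_by_param
    )
    return {i for i in items if i is not None}

bins = {"current": MOTOR_CURRENT_BINS, "feed": ORE_FEED_BINS}
record = {"current": 73.1, "feed": 85.5}   # hypothetical record
transaction = to_transaction(record, bins)
```

Each resulting set plays the role of one transaction in the mining step.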
5.4 Optimization Results
BLPNFP-growth extracts strong association rules from discretized data. Optimization targets vs. actual values:
Parameter | Regime 1 (78.76 A): actual | Regime 1 (78.76 A): target | Regime 5 (84.16 A): actual | Regime 5 (84.16 A): target |
---|---|---|---|---|
Ore feed rate (t/h) | 90.35 | 91.71 | 95.82 | 97.17 |
Water feed rate (m³/h) | 82.93 | 81.46 | 87.91 | 85.95 |
Discharge fineness (%) | 26.89 | 28.23 | 30.87 | 32.04 |
Key improvements: 1) Fineness increased by 4.2–5.1%; 2) Throughput elevated by 1.5–1.8 t/h; 3) Water/grinding aid consumption reduced by 1.8–2.1%.
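In the usual sense, a "strong" association rule is one whose support and confidence both exceed chosen thresholds. The thresholds used in this section are not stated, so the sketch below only shows how the two measures are computed on discretized records. The item labels and the four records are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent plus consequent) divided by support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs")
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical discretized records: each set is one sampled operating point.
records = [
    {"feed=high", "water=low", "fineness=high"},
    {"feed=high", "water=low", "fineness=high"},
    {"feed=high", "water=high", "fineness=low"},
    {"feed=low", "water=low", "fineness=low"},
]
rule_support = support(records, {"feed=high", "water=low", "fineness=high"})
rule_conf = confidence(records, {"feed=high", "water=low"}, {"fineness=high"})
```

In this toy data the rule "high feed and low water implies high fineness" has support 0.5 and confidence 1.0; a rule of this shape, restricted to one motor-current regime, is the kind from which target setpoints could be read off.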
6. Conclusion
BLPNFP-growth enables efficient ball mill optimization using Spark-based distributed mining. Our contributions: 1) Parallel NFP-growth implementation; 2) Conditional FP-tree workload balancing; 3) Operational parameter optimization framework. Experimental results confirm 18–25% faster execution versus benchmarks and significant production improvements. Future work will integrate real-time streaming for adaptive ball mill control under dynamic ore conditions.