Abstract
Green mine construction is a primary task in mining development, requiring solutions for safe production, energy conservation, and emission reduction. In the era of big data, mining enterprises face challenges in fully utilizing stored data. This paper proposes an improved parallel predictive control strategy for high-order process control. First, the NFP-growth (New FP-growth) algorithm is optimized using the Spark distributed computing framework. Second, a computational model based on the conditional FP-tree is introduced to address load imbalance across groups. Finally, the algorithm is applied to optimize the operational state of the ball mill. Experimental results validate the feasibility of the algorithm and its superior performance over other control methods, effectively optimizing ball mill parameters and enhancing intelligent control systems.
Keywords: Data mining, Spark, Predictive control, Ball mill
1. Introduction
Grinding is an indispensable step in mineral processing, accounting for approximately 50% of operational costs in concentrators. The ball mill is critical for improving grinding quality and achieving energy efficiency. Optimizing its operational state holds significant importance for mining enterprises.
In previous studies, Zhao Dayong et al. [3] proposed a two-layer optimization control method for multi-input multi-output grinding processes, and Dai Chuan et al. [4] integrated case-based reasoning with theoretical learning to enhance ball mill adaptability under varying conditions. However, existing strategies do not handle massive datasets, nor do they account for external constraints such as ore composition, particle size distribution, and hardness.
This paper leverages data mining techniques and the Spark framework to overcome these limitations. We propose the BLPNFP-growth (Balanced Load Parallel NFP-growth) algorithm, which improves computational efficiency and load balancing. The algorithm is tested on ball mill operational data, demonstrating enhanced performance.
2. Research and Improvement of the Algorithm
2.1 Description of NFP-growth Algorithm
The NFP-growth algorithm improves upon FP-growth by reducing memory overhead and traversal time. Key steps include:
- Input: Transaction database (Table 1).
- Step 1: Scan data to build a temporary T-tree and filter infrequent items.
- Step 2: Construct an NFP-tree with a node table.
- Step 3: Merge nodes and generate frequent itemsets.
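The first scan in the steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the T-tree and node table are not described in enough detail here to reproduce, so the sketch only covers counting support, filtering infrequent items, and ordering each transaction by descending support. The function name `filter_and_order` and the `min_support` values are hypothetical, and the transactions are the three visible rows of Table 1.

```python
from collections import Counter

def filter_and_order(transactions, min_support):
    """Count item support, drop infrequent items, and order each
    transaction by descending support (ties broken by item id)."""
    # Support = number of transactions that contain the item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: s for i, s in support.items() if s >= min_support}
    # F-list: frequent items sorted by descending support.
    flist = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: pos for pos, item in enumerate(flist)}
    ordered = [
        sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for t in transactions
    ]
    return flist, ordered

# The three visible rows of Table 1 (Tid 101-103).
transactions = [[12, 11, 15], [12, 14], [12, 13]]
flist, ordered = filter_and_order(transactions, min_support=2)
print(flist)    # only item 12 appears in at least two transactions
print(ordered)
```

Ordering every transaction by the same F-list is what lets shared prefixes collapse into common tree paths in the later steps.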
Table 1: Sample Transaction Database
Tid | Itemset |
---|---|
101 | 12, 11, 15 |
102 | 12, 14 |
103 | 12, 13 |
… | … |
2.2 Parallelization of NFP-growth Algorithm Based on Spark
The PNFP-growth algorithm parallelizes NFP-growth into four stages:
- Stage 1: Convert raw data into RDDs, filter infrequent items, and partition transactions.
- Stage 2: Decompose transactions into suffix-pattern paths.
- Stage 3: Build local FP-trees using mapPartitions.
- Stage 4: Aggregate results and save to HDFS.
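Stage 2 can be illustrated without a cluster. The sketch below assumes that "suffix-pattern paths" follow the usual parallel FP-growth idea: for each item in an ordered transaction, emit that item together with the prefix that precedes it, so that all prefixes for one item can be sent to the same partition. The function names are hypothetical, and plain Python stands in for Spark; in a Spark job the same per-transaction logic would typically run inside an RDD `flatMap`, with the per-partition tree building done in `mapPartitions` as Stage 3 describes.

```python
from collections import defaultdict

def suffix_paths(ordered_transaction):
    """Yield (suffix_item, prefix_path) for every position after the first."""
    for pos in range(len(ordered_transaction) - 1, 0, -1):
        yield ordered_transaction[pos], tuple(ordered_transaction[:pos])

def group_by_suffix(ordered_transactions):
    """Collect, per suffix item, the prefix paths that would feed its
    conditional tree (a stand-in for the shuffle between stages)."""
    groups = defaultdict(list)
    for t in ordered_transactions:
        for item, prefix in suffix_paths(t):
            groups[item].append(prefix)
    return dict(groups)

# Transactions assumed to be already filtered and ordered by the F-list.
paths = group_by_suffix([[12, 11, 15], [12, 14], [12, 13]])
for item, prefixes in paths.items():
    print(item, prefixes)
```

Each key of `paths` is an independent unit of work, which is why the way keys are grouped across workers determines how evenly the load is spread.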
2.3 Load Balancing Strategy Optimization
Traditional grouping strategies cause uneven workload distribution. The proposed BLPNFP-growth algorithm estimates computational load using conditional FP-tree dimensions:
- Tree Depth: Path length from root to node.
- Tree Width: Number of suffix-pattern paths.
Formulas for load estimation:
$$\mathit{item\_loc} = L(\mathit{item}, \mathit{Flist})$$
$$\mathit{Calculation} = \log(\mathit{item\_loc})$$
$$\mathit{Tree\_Size} = \mathit{item\_sup} \times (\mathit{item\_loc} + 1)$$
The estimated load grows with an item's support (item_sup), and tasks are allocated according to these estimates so that the workload is balanced across groups.
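A small Python sketch shows how such estimates could drive the grouping. Several points are assumptions rather than details from the paper: L(item, Flist) is taken to be the 1-based position of the item in the F-list, only Tree_Size is used as the balancing weight (how Calculation and Tree_Size are combined is not specified here), and the greedy "heaviest item to the lightest group" rule is a common heuristic substituted for the unspecified allocation step. The item names and supports are made up for illustration.

```python
import heapq
import math

def estimate_load(flist, support):
    """Return {item: (calculation, tree_size)} per the load formulas."""
    loads = {}
    # Assumption: item_loc is the 1-based position in the F-list.
    for item_loc, item in enumerate(flist, start=1):
        calculation = math.log(item_loc)
        tree_size = support[item] * (item_loc + 1)
        loads[item] = (calculation, tree_size)
    return loads

def balance_groups(loads, n_groups):
    """Greedy heuristic (an assumption): place the heaviest remaining item
    into the group with the smallest accumulated Tree_Size."""
    heap = [(0, g) for g in range(n_groups)]
    heapq.heapify(heap)
    groups = {g: [] for g in range(n_groups)}
    for item, (_, size) in sorted(loads.items(), key=lambda kv: -kv[1][1]):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + size, g))
    return groups

# Hypothetical F-list and supports, not taken from the paper's data.
flist = ["a", "b", "c", "d"]
support = {"a": 10, "b": 8, "c": 6, "d": 4}
loads = estimate_load(flist, support)
groups = balance_groups(loads, n_groups=2)
print(groups)
```

With these made-up numbers the Tree_Size values are 20, 24, 24, and 20, so two groups can each carry an estimated load of 44.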
3. Performance Analysis of the Algorithm
3.1 Experimental Environment
Tests were conducted on a Spark cluster with the configuration in Table 2.
Table 2: Cluster Node Configuration
Component | Specification |
---|---|
CPU | Intel i5-6200U 2.30GHz |
RAM | 8GB |
Spark Version | 2.1.3 |
Hadoop Version | 3.2.0 |
3.2 Case Study Analysis
The Webdocs dataset (Table 3) was used to evaluate performance under varying data scales, support thresholds, and node counts.
Table 3: Webdocs Dataset Characteristics
Dataset | Size (GB) | Records | Attributes | Max Transaction Length |
---|---|---|---|---|
Webdocs.dat | 1.486 | 1,692,082 | 5,267,656 | 71,472 |
4. Ball Mill Performance Optimization Experiment
4.1 Data Sources
Historical data from a concentrator ball mill (75,000 records) was used. Key parameters include feed rate, water flow, and discharge fineness (Table 4).
Table 4: Ball Mill Operational Parameters
Parameter | Controllable | Unit |
---|---|---|
Feed Rate | Yes | t/h |
Water Flow | Yes | m³/h |
Mill Current | Yes | A |
Discharge Fineness | Yes | % |
4.2 Determination of Optimization Parameters
Pearson correlation analysis (Formula 5) identified critical parameters influencing discharge fineness:
$$\rho = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
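The Pearson coefficient can be computed directly from its definition, as in the sketch below. The function name `pearson` and the sample readings are hypothetical; the values are invented to exercise the formula and are not drawn from the concentrator data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / denominator

# Made-up readings for illustration only.
feed_rate = [84.0, 85.0, 86.0, 87.0]
fineness = [26.0, 26.5, 27.0, 27.5]
print(abs(pearson(feed_rate, fineness)))
```

Ranking parameters by the absolute value of the coefficient treats strong negative and strong positive relationships as equally informative.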
Table 5: Correlation Between Parameters and Discharge Fineness
Parameter | ∣ρ∣ | Rank |
---|---|---|
Feed Rate | 0.8261 | 1 |
Water Flow | 0.8134 | 2 |
Grinding Aid Flow | 0.8062 | 3 |
4.3 Mining Results
Discretized parameters (Table 6) were analyzed using BLPNFP-growth to extract association rules (Table 7).
Table 6: Discretization of Parameters
Parameter | Intervals |
---|---|
Feed Rate (t/h) | [84.08, 85.34], [85.35, 85.87], … |
Water Flow (m³/h) | [73.68, 77.16], [77.17, 79.03], … |
Table 7: Key Association Rules
Parameter | Target Interval |
---|---|
Mill Current (A) | [71.34, 72.74], [72.74, 74.11], … |
Feed Rate (t/h) | [84.21, 85.36], [85.36, 86.61], … |
4.4 Optimization Results
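Comparative analysis (Table 8) shows improved discharge fineness and operational efficiency. In Case 1, for example, discharge fineness rises from 26.89% to 28.23% as the feed rate increases from 90.35 to 91.71 t/h.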
Table 8: Actual vs. Optimized Parameter Values
Parameter | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
---|---|---|---|---|---|
Feed Rate (t/h) | 90.35→91.71 | 91.79→93.08 | 92.18→94.51 | … | … |
Discharge Fineness (%) | 26.89→28.23 | 27.01→28.23 | 27.60→29.54 | … | … |
5. Conclusion
This study proposes the BLPNFP-growth algorithm, which enhances load balancing and computational efficiency in Spark. Applied to ball mill optimization, it successfully identifies optimal operational parameters, improving production metrics like discharge fineness. Future work will integrate real-time data streams for dynamic control.