
CatBoost for Apache Spark AUC eval metric not working as expected. #2654

Open
VincentHanxiaoDu opened this issue May 3, 2024 · 3 comments
Problem: the eval metric for catboost_spark.CatBoostClassifier does not work as expected when it is set to "AUC".
catboost version: 1.2.5
Operating System: CentOS Linux release 7.9.2009
CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
GPU: Not installed.

My code looks like this:

session = (
        SparkSession.builder
        .appName("test catboost")
        .master("yarn")
        .config("spark.jars.packages", "ai.catboost:catboost-spark_3.5_2.12:1.2.5")
        .enableHiveSupport()
        .getOrCreate()
)

import catboost_spark
......

clf = (
        catboost_spark.CatBoostClassifier()
        .setLabelCol("meta_fpd_15")
        .setFeaturesCol("features")
        .setDepth(6)
        .setRandomSeed(42)
        .setEvalMetric("AUC")
        .setLearningRate(0.3)
        .setIterations(500)
)

model = clf.fit(train_pool, evalDatasets=[eval_pool])

The training log is like this:


0:	test: 0.5061911	best: 0.5061911 (0)	total: 1.14s	remaining: 9m 30s
1:	test: 0.5052205	best: 0.5061911 (0)	total: 1.89s	remaining: 7m 50s
2:	test: 0.5022173	best: 0.5061911 (0)	total: 2.65s	remaining: 7m 19s
3:	test: 0.5015299	best: 0.5061911 (0)	total: 3.43s	remaining: 7m 5s
4:	test: 0.5024059	best: 0.5061911 (0)	total: 4.24s	remaining: 6m 59s
5:	test: 0.5021867	best: 0.5061911 (0)	total: 4.92s	remaining: 6m 45s
6:	test: 0.5020990	best: 0.5061911 (0)	total: 5.63s	remaining: 6m 36s
7:	test: 0.5017771	best: 0.5061911 (0)	total: 6.22s	remaining: 6m 22s
8:	test: 0.5021608	best: 0.5061911 (0)	total: 6.81s	remaining: 6m 11s
9:	test: 0.5020003	best: 0.5061911 (0)	total: 7.4s	remaining: 6m 2s
10:	test: 0.5021596	best: 0.5061911 (0)	total: 7.95s	remaining: 5m 53s
11:	test: 0.5025097	best: 0.5061911 (0)	total: 8.55s	remaining: 5m 47s
12:	test: 0.5024379	best: 0.5061911 (0)	total: 9.14s	remaining: 5m 42s
13:	test: 0.5024908	best: 0.5061911 (0)	total: 9.8s	remaining: 5m 40s
14:	test: 0.5026709	best: 0.5061911 (0)	total: 10.8s	remaining: 5m 50s
15:	test: 0.5026764	best: 0.5061911 (0)	total: 11.6s	remaining: 5m 49s

which shows a test AUC of about 0.5 throughout, i.e. the model is apparently predicting essentially at random.
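As a sanity check, an AUC near 0.5 can be recomputed independently of CatBoost's logging once (label, score) pairs are in hand. Below is a minimal rank-based (Mann-Whitney) AUC sketch on toy data; collecting the pairs from the Spark model's predictions first is an assumption about the workflow, not code from this issue:

```python
def auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs where the
    positive example receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    assert pos and neg, "need at least one example of each class"
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data only -- not from the issue's dataset.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

If the model were truly random, this value on the eval set should hover around 0.5; a clearly higher value would suggest the logged metric, not the model, is the problem.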

After removing .setEvalMetric("AUC"), the trace is:


0:	learn: 0.4242637	test: 0.4246950	best: 0.4246950 (0)	total: 3.64s	remaining: 30m 16s
1:	learn: 0.3237355	test: 0.3241643	best: 0.3241643 (1)	total: 4.21s	remaining: 17m 27s
2:	learn: 0.2840065	test: 0.2846486	best: 0.2846486 (2)	total: 4.76s	remaining: 13m 9s
3:	learn: 0.2659343	test: 0.2665972	best: 0.2665972 (3)	total: 5.34s	remaining: 11m 1s
4:	learn: 0.2534263	test: 0.2538457	best: 0.2538457 (4)	total: 5.88s	remaining: 9m 42s
5:	learn: 0.2473411	test: 0.2478402	best: 0.2478402 (5)	total: 6.45s	remaining: 8m 51s
6:	learn: 0.2438247	test: 0.2444119	best: 0.2444119 (6)	total: 7.03s	remaining: 8m 14s
7:	learn: 0.2399831	test: 0.2407306	best: 0.2407306 (7)	total: 7.56s	remaining: 7m 45s
8:	learn: 0.2351502	test: 0.2360657	best: 0.2360657 (8)	total: 8.13s	remaining: 7m 23s
9:	learn: 0.2332105	test: 0.2341404	best: 0.2341404 (9)	total: 8.68s	remaining: 7m 5s
10:	learn: 0.2318636	test: 0.2327980	best: 0.2327980 (10)	total: 9.2s	remaining: 6m 49s
11:	learn: 0.2299601	test: 0.2309502	best: 0.2309502 (11)	total: 9.77s	remaining: 6m 37s
12:	learn: 0.2286594	test: 0.2297084	best: 0.2297084 (12)	total: 10.3s	remaining: 6m 26s
13:	learn: 0.2279188	test: 0.2289262	best: 0.2289262 (13)	total: 10.9s	remaining: 6m 17s
14:	learn: 0.2266488	test: 0.2277037	best: 0.2277037 (14)	total: 11.4s	remaining: 6m 9s
15:	learn: 0.2258320	test: 0.2269671	best: 0.2269671 (15)	total: 12s	remaining: 6m 1s
@VincentHanxiaoDu (Author)

Some additional information from catboost_training.json:

{
"meta":{"test_sets":["test"],"test_metrics":[{"best_value":"Max","name":"AUC"},{"best_value":"Min","name":"Logloss"}],"learn_metrics":[{"best_value":"Min","name":"Logloss"}],"launch_mode":"Train","parameters":"","iteration_count":500,"learn_sets":["learn"],"name":"experiment"},
"iterations":[
{"learn":[0.4197797692],"iteration":0,"passed_time":1.142635039,"remaining_time":570.1748844,"test":[0.5061910795,0.4200351713]},
{"learn":[0.32944674],"iteration":1,"passed_time":1.890934491,"remaining_time":470.8426882,"test":[0.5052204608,0.3299703646]},
{"learn":[0.2809860157],"iteration":2,"passed_time":2.65054605,"remaining_time":439.107129,"test":[0.50221727,0.2815328292]},
{"learn":[0.2636805601],"iteration":3,"passed_time":3.429247404,"remaining_time":425.2266781,"test":[0.5015298599,0.2641027614]},
{"learn":[0.2526056353],"iteration":4,"passed_time":4.240691343,"remaining_time":419.8284429,"test":[0.5024059378,0.2531745226]},
{"learn":[0.2483815893],"iteration":5,"passed_time":4.923376875,"remaining_time":405.3580293,"test":[0.5021867269,0.2489771156]},
{"learn":[0.2448510659],"iteration":6,"passed_time":5.629771195,"remaining_time":396.4967427,"test":[0.5020990196,0.2454606482]},
{"learn":[0.2415365281],"iteration":7,"passed_time":6.216703468,"remaining_time":382.3272633,"test":[0.5017771165,0.2421939925]},
{"learn":[0.2381845391],"iteration":8,"passed_time":6.812368914,"remaining_time":371.6525707,"test":[0.5021607929,0.2388608433]},
{"learn":[0.2348912071],"iteration":9,"passed_time":7.396137807,"remaining_time":362.4107526,"test":[0.5020003296,0.2355978716]},
{"learn":[0.2334697888],"iteration":10,"passed_time":7.945340763,"remaining_time":353.2065121,"test":[0.50215957,0.2342075648]},
{"learn":[0.2315752112],"iteration":11,"passed_time":8.552111373,"remaining_time":347.7858625,"test":[0.5025096643,0.2323032592]},
{"learn":[0.2299688643],"iteration":12,"passed_time":9.142937959,"remaining_time":342.508522,"test":[0.5024379483,0.2306813825]},
...
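Per the "meta" block, each per-iteration "test" array lists the metrics in the order given by test_metrics, i.e. [AUC, Logloss] here, so the first entry is what the training log prints as "test". A small parsing sketch, using a trimmed inline stand-in for the JSON above (reading the real file path is left as an assumption):

```python
import json

# Trimmed stand-in for catboost_training.json from this issue; in practice
# this would be json.load(open("catboost_training.json")).
raw = """
{
  "meta": {"test_sets": ["test"],
           "test_metrics": [{"best_value": "Max", "name": "AUC"},
                            {"best_value": "Min", "name": "Logloss"}],
           "learn_metrics": [{"best_value": "Min", "name": "Logloss"}]},
  "iterations": [
    {"iteration": 0, "learn": [0.4197797692], "test": [0.5061910795, 0.4200351713]},
    {"iteration": 1, "learn": [0.32944674],   "test": [0.5052204608, 0.3299703646]}
  ]
}
"""
log = json.loads(raw)

# Map each test metric name to its column in the "test" arrays.
names = [m["name"] for m in log["meta"]["test_metrics"]]
auc_idx = names.index("AUC")
auc_by_iter = [it["test"][auc_idx] for it in log["iterations"]]
print(names)        # ['AUC', 'Logloss']
print(auc_by_iter)  # [0.5061910795, 0.5052204608]
```

Note that the second column (Logloss) does decrease across iterations in the full file, while the AUC column stays near 0.5, which matches the puzzling training log above.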

@andrey-khropov (Member)

Can you train on the same dataset (or create another dataset that reproduces the issue) using local training (just the local Python or R package, without Spark) and check whether the result is the same?

Maybe the nature of your dataset is that CatBoost is unable to train a good model on it.

Do other GBDT packages like LightGBM or XGBoost produce significantly better results?

@VincentHanxiaoDu
Copy link
Author

If I use the local version, the AUC on the eval set is about 0.74, so I don't think it is the dataset. I'll also evaluate the fitted Spark model on the eval set to check whether its actual AUC differs from what the training log reports.
