Subjects: Mathematics >> Mathematics (General) submitted time 2022-12-04
Abstract: This paper identifies two causes of overfitting under the cross-entropy loss: boundary samples occupy an ever larger share as the normal vector grows longer, and boundary samples do not fit their probability density function well.
Peer Review Status: Awaiting Review
Subjects: Mathematics >> Mathematics (General) submitted time 2022-10-10
Abstract: In the Weibo app there are many nearly identical images whose only differences are watermarks and resolution. To find the most similar image efficiently, this paper proposes an algorithm named multi-level fingerprint, which consists of five character strings and three vectors. On a dataset of 1 million images from the Weibo app, multi-level fingerprint achieves a precision of 97.69% and a QPS of 345.
Peer Review Status: Awaiting Review
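The abstract does not reveal what the five character strings and three vectors contain, so as a hedged illustration of the general idea of image fingerprinting for near-duplicate detection, here is a standard difference hash (dHash) with Hamming-distance matching; it is not the paper's algorithm.

```python
# Hedged sketch: a standard difference hash, NOT the paper's
# multi-level fingerprint, whose components are not described here.
def dhash(pixels):
    """Compute a difference hash from a 2D list of grayscale pixels.

    Each bit records whether a pixel is brighter than its right neighbour.
    """
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; a small distance means near-duplicates."""
    return sum(x != y for x, y in zip(a, b))

# A watermark or resolution change perturbs only a few bits, so
# near-duplicate images stay close in Hamming distance.
original = [[10, 20, 30], [40, 30, 20]]
watermarked = [[10, 40, 30], [40, 30, 20]]  # small local change
print(hamming(dhash(original), dhash(watermarked)))  # → 1
```

Matching by Hamming distance over short binary codes is what makes this kind of lookup fast enough for millions of images.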
Subjects: Mathematics >> Computational Mathematics submitted time 2020-10-19
Abstract: Before entering a recommender system, an entity name must be embedded into a vector. Popular models such as word2vec are based on the principle that words in the same syntactic position should be embedded into similar vectors. However, a sequence of entity names has no syntactic structure, which leads to low-quality name vectors. Based on the principle that neighbouring names should be embedded into similar vectors, this paper proposes a novel algorithm named name2vec. Name2vec has several new features: vector length equal to 1; a relative weight that solves the low-frequency problem; and an optimization objective of mean squared error rather than cross entropy. The quality of an embedding is measured by the similarity of entity names. On three datasets from WEIBO.COM, name2vec outperforms word2vec.
Peer Review Status: Awaiting Review
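The abstract states the principle (neighbouring names get similar unit-length vectors, trained with a mean-squared-error objective) but not name2vec's actual update rule or relative weighting, so the following is only a minimal sketch of that principle under those assumptions: one neighbour pair pulled together by gradient descent on the squared error, with renormalization to unit length.

```python
import math
import random

# Hedged sketch of the stated principle only; the real name2vec update
# rule and its relative weighting are not given in the abstract.
random.seed(0)

def normalize(v):
    """Rescale a vector to unit length (vector length equals 1)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

names = ["alice", "bob", "carol"]
dim, lr = 4, 0.5
vec = {n: normalize([random.gauss(0, 1) for _ in range(dim)]) for n in names}

# "alice" and "bob" co-occur as neighbours; pull their vectors together
# by gradient descent on the squared error, renormalizing each step.
for _ in range(50):
    a, b = vec["alice"], vec["bob"]
    grad = [ai - bi for ai, bi in zip(a, b)]   # gradient of ||a - b||^2 / 2 in a
    vec["alice"] = normalize([ai - lr * g for ai, g in zip(a, grad)])
    vec["bob"] = normalize([bi + lr * g for bi, g in zip(b, grad)])

cos = sum(x * y for x, y in zip(vec["alice"], vec["bob"]))
print(round(cos, 3))  # → 1.0: neighbouring names end up with similar vectors
```

Because both vectors are kept at unit length, cosine similarity reduces to a dot product, which is why the squared-error objective directly controls the similarity of neighbouring names.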
Subjects: Mathematics >> Computational Mathematics submitted time 2019-11-26
Abstract: This paper proposes a novel clustering algorithm named Sliding Means, aiming to take the place of the k-means algorithm widely used in internet applications. Sliding Means can handle very large datasets and automatically determine the number of clusters. With the help of shuffled samples, bad initial centroids have little chance of being selected. Sliding Means is also able to drop bad centroids on the fly. On the iris and optdigits datasets, Sliding Means achieves better performance (Adjusted Rand Index) than k-means by 9.93% and 5.17%, respectively.
Peer Review Status: Awaiting Review
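The abstract does not spell out Sliding Means' update rule, so as context here is a minimal sketch of the k-means baseline it aims to replace, including the one mechanism the abstract does describe: shuffling the samples so that bad initial centroids are unlikely to be selected.

```python
import random

# Hedged sketch: the k-means baseline, not Sliding Means itself.
random.seed(1)

def kmeans_1d(points, k, iters=20):
    pts = points[:]
    random.shuffle(pts)                       # shuffle before picking centroids
    centroids = pts[:k]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; a centroid whose
        # cluster went empty is where one could be dropped on the fly.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two well-separated 1-D blobs around 0 and 10.
data = ([random.gauss(0, 0.2) for _ in range(30)] +
        [random.gauss(10, 0.2) for _ in range(30)])
cents = kmeans_1d(data, k=2)
print([round(c, 1) for c in cents])           # one centroid near each blob
```

Note that plain k-means requires k to be fixed in advance, which is exactly the limitation the abstract says Sliding Means removes by determining the number of clusters automatically.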
Subjects: Mathematics >> Computational Mathematics submitted time 2019-04-10
Abstract: This paper proposes a novel method named Polyhedron Regression (PR) for click-through-rate prediction, aiming to take the place of Factorization Machines (FM). PR constructs a convex polyhedron from hyperplanes to separate positive samples from negative samples. PR has an intuitive geometric interpretation and a Lipschitz-continuous surface, and converges to the global optimum from arbitrary initial values. Compared with FM, PR has better classification accuracy, interpretability, and surface smoothness on three artificial datasets. With comparable parameters and computation, PR achieves a better AUC than FM on the Avazu and Criteo datasets.
Peer Review Status: Awaiting Review
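The training rule of Polyhedron Regression is not given in the abstract, but its geometric picture can be sketched: a convex polyhedron is an intersection of half-spaces, and a sample is classified positive exactly when it satisfies every half-space constraint.

```python
# Hedged sketch of the geometric idea only, not the PR training rule.
def inside_polyhedron(x, halfspaces):
    """halfspaces: list of (w, b); the polyhedron is {x : w.x <= b for all}."""
    return all(sum(wi * xi for wi, xi in zip(w, x)) <= b
               for w, b in halfspaces)

# The unit square as the intersection of four half-spaces.
square = [((1, 0), 1), ((-1, 0), 0), ((0, 1), 1), ((0, -1), 0)]
print(inside_polyhedron((0.5, 0.5), square))  # → True: inside
print(inside_polyhedron((1.5, 0.5), square))  # → False: violates x <= 1
```

This is why the decision region is convex by construction: any intersection of half-spaces is convex, which gives the method its direct geometric interpretation.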
Subjects: Mathematics >> Computational Mathematics submitted time 2018-04-03
Abstract: This paper identifies two causes of overfitting in logistic regression: boundary samples occupy an ever larger share as the normal vector grows longer, and boundary samples do not fit their probability density function well. Building on this insight into overfitting, I propose an acceleration method for logistic regression that achieves a training speedup of 38.25 on the MNIST dataset and 5.61 on the CIFAR10 dataset.
Peer Review Status: Awaiting Review
Subjects: Mathematics >> Computational Mathematics submitted time 2018-03-22
Abstract: This paper identifies two causes of overfitting in logistic regression: boundary samples occupy an ever larger share as the normal vector grows longer, and boundary samples do not fit their probability density function well. Building on this insight into overfitting, I propose an acceleration method for logistic regression that achieves a training speedup of 38.25 on the MNIST dataset and 5.61 on the CIFAR10 dataset.
Peer Review Status: Awaiting Review
Subjects: Mathematics >> Theoretical Computer Science submitted time 2017-11-17
Abstract: This paper proposes a new linear classification method named Focusing Classification, with the goal of taking the place of Logistic Regression. Focusing Classification has several advantages: the length of its normal vector is bounded, it has an intuitive geometric interpretation, and its initial parameter values are close to the optimal values. Numerical experiments on the MNIST dataset demonstrate that Focusing Classification outperforms Logistic Regression in normal-vector length, accuracy, and rate of convergence. With its initial parameter values alone, Focusing Classification attains an accuracy of 97.31%.
Peer Review Status: Awaiting Review