"Random Search for Hyper-Parameter Optimization"
Full list of WoS field tags: Web of Science Core Collection Help (images.webofknowledge.com)
FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Bergstra, J AU: Authors
Bengio, Y
AF Bergstra, James AF: Author full names
Bengio, Yoshua
TI Random Search for Hyper-Parameter Optimization TI: Article title
SO JOURNAL OF MACHINE LEARNING RESEARCH SO: Journal name
SN 1532-4435 SN: ISSN
PD FEB PD: Publication Date
PY 2012 PY: Year Published
VL 13 VL: Volume
BP 281 BP: Beginning Page
EP 305 EP: Ending Page
UT WOS:000303046000003
ER
EF
A few more tags, most of which do not appear in this article's record:
IS: Issue OI: ORCID Identifier (Open Researcher and Contributor ID)
EI: eISSN (ISSN of the electronic journal) UT: Accession Number
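As a rough illustration of how a field-tagged record like the one above can be read programmatically, here is a minimal Python sketch. It rests on several assumptions: the function name `parse_wos` is hypothetical, continuation lines (such as the second author) are assumed to be indented as in a raw export file (the indentation was lost in the copy shown above), and `FN`/`VR` are treated as file-level header lines rather than record fields. The annotations have been stripped from the sample.

```python
HEADER_TAGS = {"FN", "VR"}  # assumed to be file-level lines, not part of a record


def parse_wos(text):
    """Parse field-tagged text into a list of {tag: [values]} dicts."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Continuation line: it belongs to the most recent tag.
            if tag is not None:
                current[tag].append(line.strip())
            continue
        tag, _, value = line.partition(" ")
        if tag == "EF":            # end of file
            break
        if tag == "ER":            # end of record
            records.append(current)
            current, tag = {}, None
        elif tag in HEADER_TAGS:   # skip file-level header lines
            tag = None
        else:
            current.setdefault(tag, []).append(value.strip())
    return records


SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Bergstra, J
   Bengio, Y
AF Bergstra, James
   Bengio, Yoshua
TI Random Search for Hyper-Parameter Optimization
SO JOURNAL OF MACHINE LEARNING RESEARCH
SN 1532-4435
PD FEB
PY 2012
VL 13
BP 281
EP 305
UT WOS:000303046000003
ER
EF
"""

records = parse_wos(SAMPLE)
print(records[0]["AU"], records[0]["PY"])
```

Every value is stored as a list so that multi-line fields such as `AU` and single-line fields such as `PY` can be handled the same way.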
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
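The abstract's point that only a few hyper-parameters really matter can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper's experiments: `toy_score` is a made-up objective that depends almost entirely on one of two hyper-parameters, and the names `lr` and `momentum` are placeholders. With the same budget of nine trials, a 3x3 grid tries only three distinct values of the important hyper-parameter, while random sampling tries nine.

```python
import itertools
import random


def toy_score(lr, momentum):
    # Hypothetical validation score: dominated by `lr`, nearly flat in `momentum`.
    return -(lr - 0.37) ** 2 - 0.001 * (momentum - 0.5) ** 2


def grid_trials(points_per_axis):
    # Evenly spaced points in [0, 1] on each axis, combined into a full grid.
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))


def random_trials(n_trials, seed=0):
    # Independent uniform samples in the unit square.
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


grid = grid_trials(3)    # 9 trials
rand = random_trials(9)  # 9 trials

distinct_grid_lr = len({lr for lr, _ in grid})
distinct_rand_lr = len({lr for lr, _ in rand})
best_grid = max(toy_score(lr, m) for lr, m in grid)
best_rand = max(toy_score(lr, m) for lr, m in rand)

print("distinct lr values:", distinct_grid_lr, "(grid) vs", distinct_rand_lr, "(random)")
print("best score:", best_grid, "(grid) vs", best_rand, "(random)")
```

Which method reaches the better score in a single run depends on the seed and on where the optimum happens to fall; the sketch only shows that random sampling covers the important axis more densely for the same number of trials.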