Taking the human out of the loop: A review of Bayesian optimization B Shahriari, K Swersky, Z Wang, RP Adams, N De Freitas Proceedings of the IEEE 104 (1), 148-175, 2015 | 985 | 2015 |

Prototypical networks for few-shot learning J Snell, K Swersky, R Zemel Advances in neural information processing systems, 4077-4087, 2017 | 802 | 2017 |

Learning fair representations R Zemel, Y Wu, K Swersky, T Pitassi, C Dwork International Conference on Machine Learning, 325-333, 2013 | 470 | 2013 |

Generative moment matching networks Y Li, K Swersky, R Zemel International Conference on Machine Learning, 1718-1727, 2015 | 392 | 2015 |

Scalable Bayesian optimization using deep neural networks J Snoek, O Rippel, K Swersky, R Kiros, N Satish, N Sundaram, M Patwary, ... International Conference on Machine Learning, 2171-2180, 2015 | 369 | 2015 |

Multi-task Bayesian optimization K Swersky, J Snoek, RP Adams Advances in neural information processing systems, 2004-2012, 2013 | 306 | 2013 |

Predicting deep zero-shot convolutional neural networks using textual descriptions J Lei Ba, K Swersky, S Fidler Proceedings of the IEEE International Conference on Computer Vision, 4247-4255, 2015 | 239 | 2015 |

Neural networks for machine learning G Hinton, N Srivastava, K Swersky Coursera, video lectures 264, 1, 2012 | 218 | 2012 |

Neural networks for machine learning lecture 6a overview of mini-batch gradient descent G Hinton, N Srivastava, K Swersky Cited on 14 (8), 2012 | 208* | 2012 |

The variational fair autoencoder C Louizos, K Swersky, Y Li, M Welling, R Zemel arXiv preprint arXiv:1511.00830, 2015 | 173 | 2015 |

Inductive principles for restricted Boltzmann machine learning B Marlin, K Swersky, B Chen, N De Freitas Proceedings of the Thirteenth International Conference on Artificial …, 2010 | 148 | 2010 |

Meta-learning for semi-supervised few-shot classification M Ren, E Triantafillou, S Ravi, J Snell, K Swersky, JB Tenenbaum, ... arXiv preprint arXiv:1803.00676, 2018 | 132 | 2018 |

Input warping for Bayesian optimization of non-stationary functions J Snoek, K Swersky, R Zemel, R Adams International Conference on Machine Learning, 1674-1682, 2014 | 130 | 2014 |

Lecture 6a overview of mini-batch gradient descent G Hinton, N Srivastava, K Swersky Coursera Lecture slides https://class.coursera.org/neuralnets-2012-001 …, 2012 | 117 | 2012 |

Freeze-thaw Bayesian optimization K Swersky, J Snoek, RP Adams arXiv preprint arXiv:1406.3896, 2014 | 114 | 2014 |

On autoencoders and score matching for energy based models K Swersky, MA Ranzato, D Buchman, N De Freitas, BM Marlin Proceedings of the 28th International Conference on Machine Learning (ICML …, 2011 | 79 | 2011 |

A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets K Swersky, B Chen, B Marlin, N De Freitas 2010 Information Theory and Applications Workshop (ITA), 1-10, 2010 | 71 | 2010 |

Fast exact inference for recursive cardinality models D Tarlow, K Swersky, RS Zemel, RP Adams, BJ Frey arXiv preprint arXiv:1210.4899, 2012 | 60 | 2012 |

Raiders of the lost architecture: Kernels for Bayesian optimization in conditional parameter spaces K Swersky, D Duvenaud, J Snoek, F Hutter, MA Osborne arXiv preprint arXiv:1409.4011, 2014 | 39 | 2014 |

Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning D Tarlow, K Swersky, L Charlin, I Sutskever, RS Zemel | 36 | 2013 |