Jon Stearley
Title
Cited by
Year
What supercomputers say: A study of five system logs
A Oliner, J Stearley
37th Annual IEEE/IFIP International Conference on Dependable Systems and …, 2007
Cited by: 351; Year: 2007
Addressing failures in exascale computing
M Snir, RW Wisniewski, JA Abraham, SV Adve, S Bagchi, P Balaji, J Belak, ...
The International Journal of High Performance Computing Applications 28 (2 …, 2014
Cited by: 322; Year: 2014
Evaluating the viability of process replication reliability for exascale systems
K Ferreira, J Stearley, JH Laros III, R Oldfield, K Pedretti, R Brightwell, ...
Proceedings of 2011 International Conference for High Performance Computing …, 2011
Cited by: 261; Year: 2011
Memory errors in modern systems: The good, the bad, and the ugly
V Sridharan, N DeBardeleben, S Blanchard, KB Ferreira, J Stearley, ...
ACM SIGARCH Computer Architecture News 43 (1), 297-310, 2015
Cited by: 211; Year: 2015
Feng shui of supercomputer memory positional effects in DRAM and SRAM faults
V Sridharan, J Stearley, N DeBardeleben, S Blanchard, S Gurumurthi
SC'13: Proceedings of the International Conference on High Performance …, 2013
Cited by: 164; Year: 2013
Towards informatic analysis of syslogs
J Stearley
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No …, 2004
Cited by: 160; Year: 2004
Alert detection in system logs
AJ Oliner, A Aiken, J Stearley
2008 Eighth IEEE International Conference on Data Mining, 959-964, 2008
Cited by: 107; Year: 2008
Bad words: Finding faults in Spirit's syslogs
J Stearley, AJ Oliner
2008 Eighth IEEE International Symposium on Cluster Computing and the Grid …, 2008
Cited by: 73; Year: 2008
Defining and measuring supercomputer Reliability, Availability, and Serviceability (RAS)
J Stearley
Proceedings of the Linux clusters institute conference, 2005
Cited by: 35; Year: 2005
Inter-agency workshop on HPC resilience at extreme scale
J Daly, B Harrod, T Hoang, L Nowell, B Adolf, S Borkar, N DeBardeleben, ...
National Security Agency Advanced Computing Systems, 2012
Cited by: 33; Year: 2012
Increasing fault resiliency in a message-passing environment
K Ferreira, R Riesen, R Oldfield, J Stearley, J Laros, K Pedretti, ...
Sandia National Laboratories, Technical report SAND2009-6753, 2009
Cited by: 33; Year: 2009
Bridging the Gaps: Joining Information Sources with Splunk.
J Stearley, S Corwell, K Lord
SLAML, 2010
Cited by: 32; Year: 2010
Redundant computing for exascale systems
R Riesen, K Ferreira, J Stearley, R Oldfield, JH Laros III, K Pedretti, ...
Sandia National Laboratories, 2010
Cited by: 30; Year: 2010
Does partial replication pay off?
J Stearley, K Ferreira, D Robinson, J Laros, K Pedretti, D Arnold, ...
IEEE/IFIP International Conference on Dependable Systems and Networks …, 2012
Cited by: 26; Year: 2012
See applications run and throughput jump: The case for redundant computing in HPC
R Riesen, K Ferreira, J Stearley
2010 International Conference on Dependable Systems and Networks Workshops …, 2010
Cited by: 25; Year: 2010
JHL III, R
K Ferreira, R Riesen, P Bridges, D Arnold, J Stearley
Oldfield, K. Pedretti, and R. Brightwell,“Evaluating the viability of …, 2011
Cited by: 22; Year: 2011
rMPI: increasing fault resiliency in a message-passing environment
K Ferreira, R Riesen, R Oldfield, J Stearley, J Laros, K Pedretti, ...
Sandia National Laboratories, Albuquerque, NM, Tech. Rep. SAND2011-2488, 2011
Cited by: 19; Year: 2011
Extra bits on SRAM and DRAM errors–more data from the field
N DeBardeleben, S Blanchard, V Sridharan, S Gurumurthi, J Stearley, ...
IEEE Workshop on Silicon Errors in Logic-System Effects (SELSE), 2014
Cited by: 14; Year: 2014
Sisyphus log data mining toolkit
J Stearley
Accessed from the Web, 2009
Cited by: 11; Year: 2009
A state-machine approach to disambiguating supercomputer event logs
J Stearley, R Ballance, L Bauman
Cited by: 8; Year: 2012