Reading group

Software Performance Optimisation

  • Who: all welcome
  • When: every two weeks, on Wednesdays at 11 am
  • Mailing list: subscribe here (no commitment necessary)
  • Where: the fish bowl, William Penney lab (upstairs as you enter the building)
  • Leader: one person is elected to lead the discussion of each paper (the leader is invited to prepare a few slides, but this is not mandatory)
  • Propose: if you would like to suggest a paper to read, please let us know!
  • Questions: contact us




The focus is software performance optimisation with special emphasis on:

  • Parallel programming models and languages
  • User-directed, semi-automatic, and automatic parallelisation techniques
  • Optimisations for exploiting the memory hierarchy, instruction-level parallelism and power consumption
  • Compilation of high-level specifications and domain-specific languages
  • Static and dynamic optimisation techniques for performance and scalability
  • Architectural models and performance prediction
  • Parallel runtime systems
  • Scientific computing applications and large-scale simulations
  • Heterogeneous computing and accelerators
  • High-performance embedded systems
  • Custom computing, reconfigurable hardware, co-design, hardware-software co-optimisation


Future meetings

No. | Year | Title | Authors | Venue
19 | 2012 | PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs | R. Baghdadi, A. Cohen, S. Guelton, S. Verdoolaege, J. Inoue, T. Grosser, G. Kouveli, A. Kravets, A. Lokhmotov, C. Nugteren, F. Waters, A. F. Donaldson | -
20 | 2015 | PolyMage: Automatic Optimization for Image Processing Pipelines | Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula | -
21 | 2004 | Code generation in the polyhedral model is easier than you think | Cédric Bastoul | PACT
22 | 2011 | Polyhedron Model | Paul Feautrier, Christian Lengauer | Encyclopedia of Parallel Computing


Previous meetings

No. | Date | Leader | Year | Title | Authors | Venue
19 | 17/02/2016 | TJ | 2010 | Pregel: A System for Large-Scale Graph Processing | Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, | ACM SIGMOD International Conference on Management of Data
18 | 03/02/2016 | Gheorghe-Teodor Bercea | 2010 | Spark: cluster computing with working sets | Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica | HotCloud
17 | 18/12/2015 | - | 2015 | On Characterizing the Data Access Complexity of Programs | V. Elango, F. Rastello, L. N. Pouchet, J. Ramanujam, P. Sadayappan | POPL
16 | 04/12/2015 | - | 2008 | A Practical Automatic Polyhedral Parallelizer and Locality Optimizer | Uday Bondhugula, Albert Hartono, J. Ramanujam, P. Sadayappan | PLDI
15 | 18/11/2015 | - | 2004 | Improving Data Locality in Static Control Programs (4th chapter) | Cédric Bastoul | PhD thesis
14 | 28/10/2015 | - | 2004 | Improving Data Locality in Static Control Programs (3rd chapter) - wrap up | Cédric Bastoul | PhD thesis
13 | 14/10/2015 | Nicolai Stawinoga | 2004 | Improving Data Locality in Static Control Programs (3rd chapter) - sections 2 and 3 | Cédric Bastoul | PhD thesis
12 | 30/09/2015 | Nicolai Stawinoga | 2004 | Improving Data Locality in Static Control Programs (3rd chapter) - sections 1 and 2 | Cédric Bastoul | PhD thesis
11 | 16/09/2015 | Fabio Luporini | 2004 | Improving Data Locality in Static Control Programs (2nd chapter) | Cédric Bastoul | PhD thesis
10 | 08/07/2015 | Fabio Luporini | 2014 | Locality-Aware Mapping of Nested Parallel Patterns on GPUs | HyoukJoong Lee, Kevin J. Brown, Arvind K. Sujeeth, Tiark Rompf, Kunle Olukotun | MICRO
9 | 03/06/2015 | Luigi Nardi | 2015 | A Prediction-based Method for Deploying Applications on Heterogeneous Platforms | Jie Shen, Ana Lucia Varbanescu, Henk Sips, Moisés Viña, Basilio B. Fraguela, Diego Andrade | CPC
8 | 20/05/2015 | Nicolai Stawinoga | 2008 | Benchmarking GPUs to tune dense linear algebra | Vasily Volkov, James W. Demmel | SC
7 | 13/05/2015 | Emanuele Vespa | 2007 | Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow | Wilson W. L. Fung, Ivan Sham, George Yuan, Tor M. Aamodt | MICRO
6 | 06/05/2015 | Luigi Nardi | 2011 | SIMD, SIMT and SMT flexibility and performance | Various | Web, other
5 | 10/03/2015 | Nicolai Stawinoga | 2015 | A large-scale cross-architecture evaluation of thread-coarsening | Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle | SC13
4 | 25/02/2015 | Gheorghe-Teodor Bercea | 2015 | A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems | Jonathan Wong, Ellen Kuhl, Eric Darve | arXiv
3 | 11/02/2015 | Fabio Luporini | 2014 | A basic linear algebra compiler | Daniele G. Spampinato, Markus Püschel | CGO
2 | 28/01/2015 | Emanuele Vespa | 2012 | Decoupling algorithms from schedules for easy optimization of image processing pipelines | Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand | SIGGRAPH
1 | 14/01/2015 | Luigi Nardi | 2009 | Roofline: an insightful visual performance model for multicore architectures | Samuel Williams, Andrew Waterman, David Patterson | Comm. of the ACM - A Direct Path to Dependable Software