qr_mumps
Table of Contents
Overview
qr_mumps is a software package for the solution of sparse, linear systems on multicore computers. It implements a direct solution method based on the \(QR\) factorization of the input matrix. Therefore, it is suited to solving sparse least-squares problems and to computing the minimum-norm solution of sparse, underdetermined problems. It can obviously be used for solving square problems in which case the stability provided by the use of orthogonal transformations comes at the cost of a higher operation count with respect to solvers based on, e.g., the LU factorization. qr_mumps supports real and complex, single or double precision arithmetic.
As in all the sparse, direct solvers, the solution is achieved in three distinct phases:
- Analysis: in this phase an analysis of the structural properties of the input matrix is performed in preparation for the numerical factorization phase. This includes computing a column permutation which reduces the amount of fill-in coefficients (i.e., nonzeroes introduced by the factorization). This step does not perform any floating-point operation and is, thus, commonly much faster than the factorization and solve (depending on the number of right-hand sides) phases.
- Factorization: at this step, the actual QR factorization is computed. This step is the most computationally intense and, therefore, the most time consuming.
- Solution: once the factorization is done, the factors can be used
to compute the solution of the problem through two operations:
- Solve : this operation computed the solution of the triangular system \(Rx=b\) or \(R^Tx=b\)
- Apply : this operation applies the Q orthogonal matrix to a vector, i.e., \(y=Qx\) or \(y=Q^Tx\).
These three steps have to be done in order but each of them can be performed multiple times. If, for example, the problem has to be solved against multiple right-hand sides (not all available at once), the analysis and factorization can be done only once while the solution is repeated for each right-hand side. By the same token, if the coefficients of a matrix are updated but not its structure, the analysis can be performed only once for multiple factorization and solution steps.
qr_mumps is a parallel, multithreaded software based on the StarPU runtime engine (older versions based on OpenMP are still available for download) and currently runs on multicore or, more generally, shared memory multiprocessor computers. qr_mumps does not run on distributed memory (e.g. clusters) parallel computers or GPUs. Parallelism is achieved through a decomposition of the workload into fine-grained computational tasks which basically correspond to the execution of a BLAS or LAPACK operation on blocks. It is strongly recommended to use sequential BLAS and LAPACK libraries and let qr_mumps have full control of the parallelism. Parallelism is achieved by dividing the workload into fine grained tasks that are arranged in a Direct Acyclic Graph (DAG). The execution of these tasks is guiged by an asynchronous and dynamic data-flow programming model which provides high efficiency and scalability.
qr_mumps is built upon the large knowledge base and know-how developed by the members of the MUMPS project. However, qr_mumps does not share any code with the MUMPS package and it is a completely independent software. qr_mumps is developed and maintained effort by the APO team of the IRIT laboratory of Toulouse and the HiePACS team at Inria Bordeaux, France.
Features:
- Task-based parallelism based on the StarPU runtime system
- Orderings: qr_mumps supports COLAMD, Scotch and Metis fill-reducing orderings. Orderings can also be explicitly provided through the qr_mumps API.
- Memory constrained execution: the memory consumption of the parallel code can be bound by any (small) multiple of the sequential execution.
- C interface: qr_mumps is written in Fortran 2008 but has a portable C interface.
Download
Credits
- Emmanuel Agullo (HiePACS, Inria Sud-Ouest)
- Alfredo Buttari (APO, CNRS-IRIT)
- Abdou Guermouche (HiePACS, Universite de Bordeaux)
- Florent Lopez (RAL)
Acknowledgments
- we wish to thank the MUMPS team and, especially Patrick Amestoy and Jean-Yves L'Excellent for sharing with us their deep knowledge of sparse direct solvers and parallel computing.
- we wish to thank the StarPU team for their help in using the runtime engine and for their responsiveness in fixing bugs (very few) and adding features.
- qr_mumps is part of the SOLHAR project
Publications
Articles in journals or conference proceedings
[1] | Emmanuel Agullo, George Bosilca, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. Exploiting a Parametrized Task Graph model for the parallelization of a sparse direct multifrontal solver. In Euro-Par 2016: Parallel Processing Workshops, 2016. To appear. |
[2] | Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime systems. ACM Transactions On Mathematical Software, 2016. To appear. |
[3] | Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Suraj Kumar, Loris Marchal, and Samuel Thibault. Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms. In Heterogeneity in Computing Workshop 2015, Hyderabad, India, May 2015. [ http | .pdf ] |
[4] | Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. Task-based multifrontal QR solver for GPU-accelerated multicore architectures. In HiPC, pages 54--63. IEEE Computer Society, 2015. |
[5] | L. Stanisic, E. Agullo, A. Buttari, A. Guermouche, A. Legrand, F. Lopez, and B. Videau. Fast and accurate simulation of multithreaded sparse linear algebra solvers. In Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference on, pages 481--490, Dec 2015. [ DOI ] |
[6] | Alfredo Buttari. Fine-grained multithreading for the multifrontal QR factorization of sparse matrices. SIAM Journal on Scientific Computing, 35(4):C323--C345, 2013. [ arXiv | http ] |
[7] | Alfredo Buttari. Fine granularity sparse QR factorization for multicore based systems. In Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, PARA'10, pages 226--236, Berlin, Heidelberg, 2012. Springer-Verlag. [ http ] |
This file was generated by bibtex2html 1.98.
Talks and Technical reports
[1] | Emmanuel Agullo, Olivier Aumage, George Bosilca, Berenger Bramar, Alfredo Buttari, Olivier Coulaud, Eric Darve, Jack Dongarra, Mathieu Faverge, Nathalie Furmento, Luc Giraud, Abdou Guermouche, Julien Langou, Florent Lopez, Hatem Ltaief, Samuel Pitoiset, Florent Pruvost, Marc Sergent, Samuel Thibault, and Stanimire Tomov. Matrices over runtime systems at exascale, 2015. Poster at the SuperComputing 2015 conference. |
[2] | Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. Sparse direct solvers on top of a runtime system, 2015. Presentation at the SIAM Computational Science and Engineering international conference, Salt Lake City, March 14-18 2015. |
[3] | Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. Task-based multifrontal qr solver for gpu-accelerated multicore architectures, 2015. Presentation at the Sparse Days in St Girons international conference, St Girons, June 29 - July 2 2015. |
[4] | Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, and Florent Lopez. Sparse direct solvers on top of a runtime system, 2014. Presentation at the PMAA 2014 international conference, Lugano July 2-4 2014. |
[5] | Alfredo Buttari. Fine grained sparse QR factorization for multicore systems, June 2011. Poster at The Householder Symposium 2011. |
This file was generated by bibtex2html 1.98.