Summary
QUALITY/CATEGORY | NUMBER OF PUBLISHED ARTICLES |
JCR Q1 | 1 |
JCR Q2 | 2 |
JCR Q3 | 1 |
TOTAL JCR papers | 4 |
Articles not indexed in JCR | 1 |
Year 2023
Título: UVaFTLE: Lagrangian finite time Lyapunov exponent extraction for fluid dynamic applications
Author(s): Rocío Carratalá-Sáez, Yuri Torres, Jose Sierra-Pallares, Sergio López-Huguet, Diego R. Llanos
Journal: Journal of Supercomputing
JCR quartile: Q2
DOI: https://doi.org/10.1007/s11227-022-05017-x
Abstract: The determination of Lagrangian Coherent Structures (LCS) is becoming very important in several disciplines, including cardiovascular engineering, aerodynamics, and geophysical fluid dynamics. From the computational point of view, the extraction of LCS consists of two main steps: the flowmap computation and the resolution of Finite Time Lyapunov Exponents (FTLE). In this work, we focus on the design, implementation, and parallelization of the FTLE resolution. We offer an in-depth analysis of this procedure, as well as an open source C implementation (UVaFTLE) parallelized using OpenMP directives to attain a fair parallel efficiency in shared-memory environments. We have also implemented CUDA kernels that allow UVaFTLE to leverage as many NVIDIA GPU devices as desired in order to reach the best parallel efficiency. For the sake of reproducibility and in order to contribute to open science, our code is publicly available through GitHub. Moreover, we also provide Docker containers to ease its usage.
Year 2021
Título: Leveraging teaching on demand: Approaching HPC to undergrads
Author(s): Sandra Catalán, Rocío Carratalá-Sáez, Sergio Iserte
Journal: Journal of Parallel and Distributed Computing
JCR quartile: Q1
DOI: https://doi.org/10.1016/j.jpdc.2021.05.015
Abstract: High Performance Computing (HPC) is a highly demanded discipline in companies and institutions. However, as students and also afterwards as professors, we observed a lack of HPC related content in the engineering degrees at our university, including Computer Science. Thus, we designed and offered the engineering students a non-mandatory course entitled “Build your own cluster employing Raspberry Pi” to provide the students with HPC skills. With this course, we covered the basics of supercomputing (hardware, networking, software tools, performance evaluation, cluster management, etc.). This was possible thanks to leveraging the flexibility and versatility of Raspberry Pi devices, and the students’ motivation that arose from the hands-on experience. Moreover, the course included a “Teaching on demand” component to let the attendees choose a field to explore, based on their own interests. In this paper, we offer all the details to let anyone fully reproduce the course. Besides, we analyze and evaluate the methodology that let us fulfill our objectives: increase the students’ HPC skills and knowledge in such a way that they feel capable of utilizing it in their mid-term professional career.
Year 2019
Título: Exploiting Nested Task-Parallelism in the H-LU Factorization
Author(s): Rocío Carratalá-Sáez, Sven Christophersen, José I. Aliaga, Vicenç Beltran, Steffen Börm, Enrique S. Quintana-Ortí
Journal: Journal of Computational Science
JCR quartile: Q2
DOI: https://doi.org/10.1016/j.jocs.2019.02.004
Abstract: We address the parallelization of the LU factorization of hierarchical matrices (ℋ-matrices) arising from boundary element methods. Our approach exploits task-parallelism via the OmpSs programming model and runtime, which discovers the data-flow parallelism intrinsic to the operation at execution time, via the analysis of data dependencies based on the memory addresses of the tasks’ operands. This is especially challenging for ℋ-matrices, as the structures containing the data vary in dimension during the execution. We tackle this issue by decoupling the data structure from that used to detect dependencies. Furthermore, we leverage the support for weak operands and early release of dependencies, recently introduced in OmpSs-2, to accelerate the execution of parallel codes with nested task-parallelism and fine-grain tasks. As a result, we obtain a significant improvement in the parallel performance with respect to our previous work.
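The idea of detecting dependencies on a structure separate from the data can be shown in miniature. The sketch below is only an illustration under strong assumptions: it uses standard OpenMP `task`/`depend` clauses instead of OmpSs or OmpSs-2 (no weak operands, no early release, no nesting), and it factorizes a dense tiled matrix without pivoting rather than an ℋ-matrix. Each task declares its dependencies on a hypothetical array of one-byte tokens, one per tile, so the runtime never inspects the addresses or sizes of the tile data itself. The names `tiled_lu`, `TILE`, and the helper kernels are invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: nt x nt tiles of bs x bs doubles, each tile stored
 * contiguously in row-major order. */
#define TILE(A, i, j, nt, bs) ((A) + ((size_t)(i) * (nt) + (j)) * (bs) * (bs))

/* In-place LU of one tile without pivoting: L (unit lower) and U overwrite a. */
static void lu_tile(double *a, int bs)
{
    for (int k = 0; k < bs; k++)
        for (int i = k + 1; i < bs; i++) {
            a[i * bs + k] /= a[k * bs + k];
            for (int j = k + 1; j < bs; j++)
                a[i * bs + j] -= a[i * bs + k] * a[k * bs + j];
        }
}

/* b := L^{-1} b, with L the unit lower triangle stored in a. */
static void trsm_l(const double *a, double *b, int bs)
{
    for (int k = 0; k < bs; k++)
        for (int i = k + 1; i < bs; i++)
            for (int j = 0; j < bs; j++)
                b[i * bs + j] -= a[i * bs + k] * b[k * bs + j];
}

/* b := b U^{-1}, with U the upper triangle stored in a. */
static void trsm_u(const double *a, double *b, int bs)
{
    for (int k = 0; k < bs; k++)
        for (int i = 0; i < bs; i++) {
            b[i * bs + k] /= a[k * bs + k];
            for (int j = k + 1; j < bs; j++)
                b[i * bs + j] -= b[i * bs + k] * a[k * bs + j];
        }
}

/* c := c - a * b */
static void gemm_sub(const double *a, const double *b, double *c, int bs)
{
    for (int i = 0; i < bs; i++)
        for (int k = 0; k < bs; k++)
            for (int j = 0; j < bs; j++)
                c[i * bs + j] -= a[i * bs + k] * b[k * bs + j];
}

/* Right-looking tiled LU. Dependencies are declared on tok[], one token per
 * tile, decoupled from the tile data. Returns 0 on success, -1 if the token
 * array cannot be allocated. */
int tiled_lu(double *A, int nt, int bs)
{
    char *tok = calloc((size_t)nt * nt, 1);
    if (tok == NULL)
        return -1;
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < nt; k++) {
        double *akk = TILE(A, k, k, nt, bs);
        #pragma omp task depend(inout: tok[k * nt + k])
        lu_tile(akk, bs);
        for (int j = k + 1; j < nt; j++) {
            double *akj = TILE(A, k, j, nt, bs);
            #pragma omp task depend(in: tok[k * nt + k]) depend(inout: tok[k * nt + j])
            trsm_l(akk, akj, bs);
        }
        for (int i = k + 1; i < nt; i++) {
            double *aik = TILE(A, i, k, nt, bs);
            #pragma omp task depend(in: tok[k * nt + k]) depend(inout: tok[i * nt + k])
            trsm_u(akk, aik, bs);
        }
        for (int i = k + 1; i < nt; i++)
            for (int j = k + 1; j < nt; j++) {
                double *aik = TILE(A, i, k, nt, bs);
                double *akj = TILE(A, k, j, nt, bs);
                double *aij = TILE(A, i, j, nt, bs);
                #pragma omp task depend(in: tok[i * nt + k], tok[k * nt + j]) \
                                 depend(inout: tok[i * nt + j])
                gemm_sub(aik, akj, aij, bs);
            }
    }
    free(tok);
    return 0;
}
```

Compiled without OpenMP, the pragmas are ignored and the tasks run in program order, which is already a valid schedule. In a real ℋ-matrix code, the token array would stay fixed while the block structures change size, which is the situation the abstract describes.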
Year 2018
Título: Dynamic look-ahead in the reduction to band form for the singular value decomposition
Author(s): Andrés E. Tomás, Rafael Rodríguez-Sánchez, Sandra Catalán, Rocío Carratalá-Sáez, Enrique S. Quintana-Ortí
Journal: Parallel Computing
JCR quartile: Q3
DOI: https://doi.org/10.1016/j.parco.2018.11.001
Abstract: We investigate the introduction of look-ahead in two-stage algorithms for the singular value decomposition (SVD). Our approach relies on a specialized reduction for the first stage that produces a band matrix with the same upper and lower bandwidth instead of the conventional upper triangular-band matrix. In the case of a CPU-GPU server, this alternative form accommodates a static look-ahead into the algorithm in order to overlap the reduction of the “next” panel on the CPU and the “current” trailing update on the GPU. For multicore processors, we leverage the same compact form to formulate a version of the algorithm that advances the reduction of “future” panels, yielding a dynamic look-ahead that overcomes the performance bottleneck that the sequential panel factorization represents.
Year 2017
Título: Parallel Solution of Hierarchical Symmetric Positive Definite Linear Systems
Author(s): José I. Aliaga, Rocío Carratalá-Sáez, Enrique S. Quintana-Ortí
Journal: Applied Mathematics and Nonlinear Sciences
DOI: https://doi.org/10.21042/AMNS.2017.1.00017
Abstract: We present a prototype task-parallel algorithm for the solution of hierarchical symmetric positive definite linear systems via the ℋ-Cholesky factorization that builds upon the parallel programming standards and associated runtimes for OpenMP and OmpSs. In contrast with previous efforts, our proposal decouples the numerical aspects of the linear algebra operation from the complexities associated with high performance computing. Our experiments make an exhaustive analysis of the efficiency attained by different parallelization approaches that exploit either task-parallelism or loop-parallelism via a runtime. Alternatively, we also evaluate a solution that leverages multi-threaded parallelism via the parallel implementation of the Basic Linear Algebra Subprograms (BLAS) in Intel MKL.