This repository contains examples and explanations of different parallel computing techniques and frameworks, including OpenMP, CUDA, Java Parallel Executor Fork/Join, and Python multiprocessing.
OpenMP is an API for parallel programming in shared-memory systems, primarily used with C, C++, and Fortran. It provides a set of compiler directives, library routines, and environment variables that enable parallelism in loops, sections, and tasks.
Here's a simple "Hello, World!" program using OpenMP in C++:
```cpp
#include <iostream>
#include <omp.h>

int main() {
    // Spawn a team of threads; each thread executes this block
    #pragma omp parallel
    {
        // Get the total number of threads in the team
        int num_threads = omp_get_num_threads();
        // Get this thread's ID
        int thread_id = omp_get_thread_num();
        // Print "Hello, World!" with thread information
        std::cout << "Hello, World! Thread ID: " << thread_id
                  << " Total threads: " << num_threads << std::endl;
    }
    return 0;
}
```
Compile with OpenMP enabled and run:

```shell
g++ -o hello_world hello_world.cpp -fopenmp
./hello_world
```
Sample output on a machine with 12 hardware threads. Note that the lines are interleaved: all threads write to `std::cout` concurrently without synchronization, so their output can mix arbitrarily:

```text
Hello, World! Thread ID: Hello, World! Thread ID: 8 Total threads: 12Hello, World! Thread ID: 11 Total threads: 12
Hello, World! Thread ID: 1 Total threads: 12
Hello, World! Thread ID: 2 Total threads: 12
Hello, World! Thread ID: 9 Total threads: 12
Hello, World! Thread ID: 5 Total threads: 12
Hello, World! Thread ID: 7 Total threads: 12
4 Total threads: 12
Hello, World! Thread ID: 3 Total threads: Hello, World! Thread ID: 6 Total threads: 12Hello, World! Thread ID: 0 Total threads: 12
12
Hello, World! Thread ID: 10 Total threads: 12
```
Common use cases for OpenMP:
- Parallelizing computationally intensive loops such as matrix multiplication, image processing, or numerical simulations.
- Exploiting multicore CPUs to accelerate applications that can be parallelized.
CUDA is a parallel computing platform and programming model developed by NVIDIA for accelerating computations on NVIDIA GPUs. It allows developers to write C/C++ code and execute it on GPU devices, enabling massively parallel processing.
- Accelerating scientific simulations, deep learning training, and data analytics tasks by offloading compute-intensive portions to the GPU.
- Real-time processing of large datasets, such as image and video processing.
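A minimal vector-addition sketch illustrating the CUDA kernel/launch model (assumes a CUDA-capable GPU and the `nvcc` compiler; the kernel and variable names are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Unified memory is accessible from both the host and the device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int blocks = (n + 255) / 256;
    vecAdd<<<blocks, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The grid/block launch configuration (`<<<blocks, 256>>>`) is what maps the single-element kernel onto a million elements in parallel.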
Java's Fork/Join framework, introduced in Java 7 as part of `java.util.concurrent`, is an `ExecutorService` for parallel programming. It provides a simple API for parallelizing tasks by recursively dividing them into smaller subtasks and combining their results, scheduled by a work-stealing thread pool.
- Recursive algorithms like quicksort, mergesort, and tree traversals can be parallelized effectively using Fork/Join.
- Performing parallel processing of large collections or arrays in data-intensive applications.
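The recursive divide-and-combine pattern can be sketched with a `RecursiveTask` that sums an array; the class and threshold below are illustrative, not from the repository:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Splits the range in half until it is small enough to sum
// directly, then combines the partial results.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] array;
    private final int lo, hi;

    SumTask(long[] array, int lo, int hi) {
        this.array = array; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += array[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(array, lo, mid);
        SumTask right = new SumTask(array, mid, hi);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }
}

public class ForkJoinSum {
    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = ForkJoinPool.commonPool()
                .invoke(new SumTask(data, 0, data.length));
        System.out.println("Sum: " + sum); // 1 + 2 + ... + 100000 = 5000050000
    }
}
```

Forking one half and computing the other in the current thread (rather than forking both) avoids leaving the current worker idle.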
Python's `multiprocessing` module, part of the standard library, supports parallel execution of tasks using processes. Because each worker is a separate process with its own interpreter, it sidesteps the Global Interpreter Lock (GIL) and lets Python programs utilize multiple CPU cores for concurrent processing.
- Parallelizing CPU-bound tasks such as numerical computations, data processing, and machine learning training.
- Isolating I/O-heavy tasks such as web scraping and file processing in separate processes (though threads or `asyncio` are often sufficient for purely I/O-bound work).
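A hedged sketch of a CPU-bound workload spread across worker processes with `multiprocessing.Pool`; the prime-counting helper and chunk sizes are illustrative:

```python
import math
from multiprocessing import Pool

def count_primes(bounds):
    """CPU-bound helper: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Split the range into chunks and count primes in parallel,
    # one chunk per worker process (each process has its own GIL).
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(f"Primes below 100000: {total}")
```

The `if __name__ == "__main__":` guard is required on platforms that start workers by importing the module (e.g. Windows and macOS with the default `spawn` method).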