CURRENT
Post-petascale programming tools and applications, CREST, JST
Parallel GPU accelerated discrete optimization (Traveling Salesman Problem)
Logo - High Performance Parallel GPU-based TSP Solver (Screenshot)
GTC 2013 Slides
Previous Slides
Download latest source code (OSX, Linux; CUDA/OpenCL/SSE/AVX/Xeon Phi support), v. 0.62 (03/04/2013)
Download Windows binary (CUDA/OpenCL, Multi-GPU support) (Video), based on the Linux code - (17/03/2013)
Source code is also available here: http://code.google.com/p/logo-tsp-solver/
Previous versions
CUDA/OpenCL/SSE/AVX/Xeon Phi - v. 0.61, v. 0.6
CUDA/OpenCL - v. 0.52, v. 0.51, v. 0.5, v. 0.4, v. 0.31, v. 0.3
CUDA + OpenGL Windows demo application (as presented at SC12)
CPU and GPU Benchmarks (v. 0.5) - I use exactly the same code for CPU and GPU!
Left: 2 x Tesla K20c, 2 x GTX 680, GTX 690 + 2 x Radeon 7970; Right: 2 x Radeon 6990 + Radeon 5970
sw24798.tsp 24h Test:
GPU Performance Comparison
usa13509.tsp on TSUBAME 2.0 with MPI inter-node communication:
FlopsCUDA - CUDA Benchmarking Tool (with Kepler GPU support)
Download : Linux source (Screenshot)
FlopsCL - OpenCL Benchmarking Tool
Download : Windows executable (Screenshot), Linux source (Screenshot)
PAST
ULP-HPC: Ultra Low-Power, High-Performance Computing via Modeling and Optimization of Next Generation HPC Technologies, CREST, JST
Ph.D. Course
Implementation and analysis of large scale parallel tree searching algorithms on GPU - Minimax, MCTS (Monte Carlo Tree Search) using CUDA and MPI. Analysis of power-usage related issues.
This work was motivated by the emergence of GPU-based supercomputer systems, which combine high computational potential with relatively low power usage compared to CPUs. As the problem to be solved I chose a GPU-based AI agent for the game of Reversi (Othello), which provides a sufficiently complex tree-search problem with a non-uniform structure.
The research covered areas such as: Artificial Intelligence, Tree Search, Monte Carlo/Random methods, Parallel processing, General Purpose GPU Programming (GPGPU)
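As a rough illustration of the tree-search side of this work, here is a minimal sequential minimax sketch over a toy tree with non-uniform depth and width. It is not the GPU or MPI implementation; the nested-list tree representation and the names `Tree`, `minimax`, and `toy_tree` are assumptions made only for this example.

```python
from typing import List, Union

# Hypothetical toy representation: a leaf is an integer score,
# an inner node is a list of child subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Return the minimax value of a toy game tree."""
    if isinstance(node, int):
        # Leaf: the score is taken as given.
        return node
    # Inner node: evaluate children with the roles of the players swapped.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Non-uniform tree: branches differ in both depth and width.
toy_tree = [[3, [5, 1]], [2], [[4, 6], 0]]
print(minimax(toy_tree))
```

In a real Reversi agent the leaves would come from an evaluation function or, for MCTS, from random playouts; the branches of such a tree are independent of each other, which is what makes the search a candidate for parallel execution.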
M.Sc. Course
Study on real-time image processing. Implementation of several algorithms on the CUDA platform (edge detection, histogram calculation, segmentation, noise reduction) and their integration with OpenCV. The main goal of the work was to improve the performance of feature extraction from a video signal while keeping the existing interface untouched.
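To make the histogram calculation step concrete, the sketch below computes an intensity histogram by building partial histograms over chunks of pixels and then merging them. This mirrors a common way of splitting the work across parallel blocks, but it is plain sequential Python written for illustration, not the CUDA code from this work; the function names and the `chunk` parameter are assumptions.

```python
from typing import List, Sequence


def partial_histogram(pixels: Sequence[int], bins: int = 256) -> List[int]:
    """Count intensity values for one chunk of pixels."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts


def histogram(pixels: Sequence[int], chunk: int = 4, bins: int = 256) -> List[int]:
    """Merge per-chunk partial histograms into one global histogram."""
    totals = [0] * bins
    for start in range(0, len(pixels), chunk):
        part = partial_histogram(pixels[start:start + chunk], bins)
        # Element-wise sum of the partial result into the running totals.
        totals = [t + p for t, p in zip(totals, part)]
    return totals


image = [0, 0, 1, 255, 1, 1, 7]  # toy 8-bit grayscale pixel values
h = histogram(image)
print(h[0], h[1], h[7], h[255])
```

Each chunk only writes to its own partial histogram, so the chunks could be processed independently and combined in a final reduction.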
B.Sc. Course
Implementation and integration of an ART neural network class in C++ for use with the OpenCV library. The purpose of the work was an application able to recognize and locate simple objects based on their color and shape. The features obtained by OpenCV's built-in routines were processed by the neural network. The application was part of a larger servomechanism system that controlled the movement of a robot arm based on the object's location.











