2016 - 2018      | Large Scale Content Addressable Memory
2015 - 2016      | Recurrent Memory Array Structures
2015 - 2016      | DARPA Saccadic Vision Project
2013 - 2014      | Hierarchical Temporal Memory
2012 - 2013      | Parallel Stochastic Local Optimization
2008 - 2011      | Large Scale Parallel Monte Carlo Tree Search
2006 - 2008      | Real-time data processing on GPU
2005 - 2007      | Robotic Arm Manipulation using ARTMAP neural network
2018 - | Nintendo Learning Environment
http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/nintendo/
New environment for accelerating research in AI, targeting transfer and meta learning
Fastest game emulator in the world (CPU, GPU and FPGA versions), with support for over 1000 games in total
Visualization of the 3D manifold created by running Mario and Tetris through a Variational Autoencoder (a fully unsupervised approach). The Nintendo environment is designed to run experiments on multiple games at once
Tetris game reconstructed from the internal world representation (as in Ha et al. 2018); this enables 'artificial imagination'
2018 - | GAMEBOY Supercomputer
Featured on the main page of VICE Motherboard on Dec 19th, 2018
A $2M FPGA-based supercomputer, 1296 nodes connected in a 3D mesh, co-designed HW + SW
1B+ frames per second aggregate, over 200x speedup vs Xeon CPU
Demonstrated practical use with an existing RL algorithm, the distributed asynchronous advantage actor-critic (A3C)
Trained using the system: Pacman (LEFT), Mario (RIGHT)
2017 - | AISC - Adaptive Instruction Set Computer
A RISC-V CPU architecture has been modified to give the CPU the ability to shape its own architecture, based on the reward collected from the environment (correct and fast execution)
10x faster than Xeon at 1/100th the power and footprint
Architecture evolves from thousands of runs
Self-evolved architecture is capable of learning simple algorithmic tasks (copy, sort)
2016 - 2018 | Large Scale Content Addressable Memory
High-dimensional data is mapped into physical 3D memory space (GAN-based) and used with a memory-augmented neural network in order to provide very fast key-value lookups
An illustration of real-time encoding and decoding data location in latent 3D space
2016 | L2P2 - Low Level Parallel Primitives
Auto generation of machine code based on high level structures and LLVM backends
A programmer specifies a general description (LEFT), and the program is executed millions of times to find the best-performing version (RIGHT: auto-tuning)
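The auto-tuning loop described above can be sketched in a few lines. This is only an illustration of the idea (run several candidate versions, keep the fastest one that still gives the right answer), not the L2P2 implementation: the two variants, the `autotune` helper and the use of `timeit` are all assumptions made for the example, and no machine code or LLVM backend is involved.

```python
import timeit

# Two hypothetical variants of the same high-level operation (summing a list).
def sum_loop(data):
    total = 0
    for x in data:
        total += x
    return total

def sum_builtin(data):
    return sum(data)

VARIANTS = [("loop", sum_loop), ("builtin", sum_builtin)]

def autotune(variants, data, repeats=3, number=10):
    """Time every variant on the same input and return (best_name, timings).

    All variants must agree with the first one, so a fast but wrong
    version can never be selected.
    """
    reference = variants[0][1](data)
    timings = {}
    for name, fn in variants:
        assert fn(data) == reference, f"variant {name!r} gives a wrong result"
        # Best of several repeats, to reduce the effect of timing noise.
        timings[name] = min(timeit.repeat(lambda: fn(data), repeat=repeats, number=number))
    best = min(timings, key=timings.get)
    return best, timings

if __name__ == "__main__":
    best, timings = autotune(VARIANTS, list(range(10_000)))
    print("fastest variant:", best)
```

Which variant wins depends on the machine and the input size, which is the reason for measuring rather than guessing.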
2015-2016 | Recurrent Memory Array
Improvements to existing models: stochastic memory array (LEFT), surprisal-driven zoneout (RIGHT)
Link to the paper shown at NIPS 2016
Illustration of zoneout driven by misprediction. LEFT: information content (surprisal) while predicting Wikipedia text. RIGHT: high surprisal is used to gate the memory cell, possibly sparsifying usage during predictable passages
Visualization of activity in the LSTM network, showing induced sparsity in activations: no zoneout (LEFT), with zoneout (RIGHT)
Hidden state
Memory cell
Also works by objective measures, improving on the state of the art (SOTA)
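The surprisal-driven gating mentioned in the captions above can be sketched as follows. Surprisal is the standard information content, -log p, of the symbol that actually occurred. The hard threshold and the names `surprisal`, `gated_update` and `threshold` are assumptions for illustration; the rule used in the paper may differ (for example, it may be soft or learned).

```python
import math

def surprisal(probs, target):
    """Information content, in nats, of the observed symbol under the prediction."""
    return -math.log(probs[target])

def gated_update(prev_cell, candidate_cell, s, threshold=1.0):
    """Gate the memory cell by surprisal (hypothetical hard-threshold rule).

    Low surprisal means the input was predictable, so the previous cell
    state is carried over unchanged (zoned out). High surprisal lets the
    new candidate state through.
    """
    if s < threshold:
        return list(prev_cell)
    return list(candidate_cell)

if __name__ == "__main__":
    prev, cand = [0.1, 0.2], [0.7, -0.3]
    confident = surprisal([0.9, 0.1], 0)  # predicted symbol occurred
    surprised = surprisal([0.9, 0.1], 1)  # unlikely symbol occurred
    print(gated_update(prev, cand, confident))  # keeps prev
    print(gated_update(prev, cand, surprised))  # takes cand
```

With a rule like this the cell is only rewritten on mispredicted inputs, which is one way the sparser activity in the zoneout panels could arise.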
2015-2016 | DARPA Saccadic Vision Project
Image recognition using sequences of 2D patches as inputs
Observation trajectory depends on the task (Yarbus et al. 1967)
Unsupervised learning of spatial pattern and image saliency map
Unsupervised training using RBM and LSTM builds a predictive model (marked as SDR/memory sequence in the animation), followed by RL-based fine-tuning for classification (probs)
2013-2014 | Hierarchical Temporal Memory
This is a demo of my implementation of HTM running over a million neurons and tens of millions of synapses. Left: prediction task, Right: visualization of activations in real time
My work has been used as a flagship project of the Machine Intelligence Group at IBM Research
2012-2013 | Parallel Stochastic Local Optimization
Traveling Salesman Problem (TSP) is a known NP-hard problem in combinatorial optimization
I built the fastest GPU-based TSP Solver
Presented my work at multiple venues, including the Supercomputing conference, IPDPS, GECCO and GTC
I published an open source version of the solver, which runs on CPU and GPU (GitHub link)
Proposed many algorithmic innovations applicable to a wide range of irregular problems; a poster from Supercomputing 2012:
An issue of TSUBAME journal with my work
HPC Wire article on my research
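As a small illustration of local optimization on the TSP, here is a sequential sketch of a 2-opt pass. Choosing 2-opt as the local move is an assumption made for this example, since the text above does not name the move the solver uses, and this plain Python version says nothing about how the work is parallelized on a GPU. The function names are hypothetical.

```python
import math

def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))

def two_opt(points, tour):
    """Repeatedly reverse tour segments while doing so shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a city, nothing to swap
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    crossing = [0, 2, 1, 3]  # visits the corners along both diagonals
    better = two_opt(square, crossing)
    print(tour_length(square, crossing), "->", tour_length(square, better))
```

Each candidate pair (i, j) is evaluated independently of the others, which is what makes this kind of search a natural fit for massively parallel hardware.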
2008-2011 | Large Scale Parallel Monte Carlo Tree Search
1. A multi-GPU enhanced version of highly irregular MCTS
2. Used TSUBAME2 supercomputer (2048 CPUs + 256 GPUs, 3M threads)
3. Self-play based learning
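For readers unfamiliar with MCTS, the selection step at each tree node is commonly done with the UCB1 rule, sketched below. Whether this project used UCB1 is an assumption; the list above does not specify the selection policy, and the function names and the exploration constant are illustrative only.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus for rarely tried moves."""
    if visits == 0:
        return math.inf  # always try an unvisited child first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """Pick the index of the child with the highest UCB1 score.

    children is a list of (total_reward, visits) pairs, one per move.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1(reward, visits, parent_visits) for reward, visits in children]
    return scores.index(max(scores))

if __name__ == "__main__":
    print(select_child([(5, 10), (0, 0)]))   # unvisited move is explored first
    print(select_child([(9, 10), (1, 10)]))  # equal visits: higher average wins
```

The number of simulations needed under a given node is not known in advance, which is one source of the irregular workload mentioned in item 1.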
2006-2008 | Real-time data processing on GPU
Repurposed OpenGL graphics pipeline for general-purpose computation (2 years before NVIDIA CUDA)
1. Edge Detection
2. Segmentation
3. Denoising
2005-2007 | Robotic Arm Manipulation using ARTMAP neural network
Used an ARTMAP fuzzy neural net to guide a robotic arm, mapping pixels to torque
1. Tracking
2. Robustness
3. Recognition Heatmaps