| The recent shift toward multi-core chips has moved the burden of extracting performance onto the programmer. With every new processor generation, programmers must uncover more coarse-grained parallelism, or the performance of their applications will stagnate or even degrade. Unfortunately, parallel programming remains hard and error-prone, which has driven the emergence of many new programming models that aim to make it easier. Transactional Memory is an attractive alternative among these models. It simplifies parallel programming by relieving programmers of the burden of correctly synchronizing threads that access shared data. Programmers write parallel code as transactions, and the runtime system guarantees that each transaction executes atomically and in isolation, even when transactions race on the same data. Although removing the burden of synchronization is an important simplification, the programmer is still left with the tasks of thread scheduling and orchestration. These tasks are naturally handled by skeleton (or pattern-based) programming, which expresses parallel programs as specialized instances of generic communication and computation patterns. The programmer then only has to implement the operations specific to the problem at hand, so this approach eliminates some of the major challenges of parallel programming, namely thread communication and orchestration. Besides simplifying the programming task, skeletons are also amenable to performance optimizations. I am currently working on a new skeleton framework that selects and applies performance optimizations in transactional worklist applications. It uses a novel hierarchical autotuning mechanism that dynamically selects the most suitable set of optimizations for each application and adjusts them accordingly.
Additionally, I am investigating the performance impact of existing system-level optimizations on transactional worklist applications. |
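To make the division of labor concrete, here is a minimal sketch of a transactional worklist skeleton, assuming a simple model in which the programmer supplies only a per-item `operation` that may return new work items, and the skeleton handles thread creation, scheduling, and termination. The names `run_worklist` and `atomic` are hypothetical and are not the API of the framework described above. Real Transactional Memory runs transactions optimistically and aborts them on conflict; here `atomic` is a stand-in that emulates atomicity and isolation with a single global lock, so it shows the programming model but none of TM's concurrency.

```python
import threading
from collections import deque
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for a TM runtime: one global lock provides atomicity
# and isolation, but serializes all "transactions" (no optimistic execution).
_tm_lock = threading.Lock()


def atomic(fn, *args):
    """Run fn(*args) as a single 'transaction' (global-lock emulation)."""
    with _tm_lock:
        return fn(*args)


def run_worklist(initial: Iterable[T],
                 operation: Callable[[T], List[T]],
                 num_threads: int = 4) -> None:
    """Skeleton: worker threads pop items, apply the user's operation
    transactionally, and push back any new items it returns."""
    work = deque(initial)
    cond = threading.Condition()
    active = 0  # items currently being processed by some worker

    def worker():
        nonlocal active
        while True:
            with cond:
                # Wait while the list is empty but in-flight items may add more.
                while not work and active > 0:
                    cond.wait()
                if not work:  # empty and nothing in flight: all work is done
                    cond.notify_all()
                    return
                item = work.popleft()
                active += 1
            new_items: List[T] = []
            try:
                new_items = atomic(operation, item)
            finally:
                # Always publish results and release the item, even on error,
                # so that other workers cannot wait forever.
                with cond:
                    work.extend(new_items)
                    active -= 1
                    cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Usage: the programmer writes only the operation; each item n adds n to a
# shared total and spawns item n - 1 until it reaches 1.
total = {"sum": 0}


def op(n: int) -> List[int]:
    total["sum"] += n  # shared read-modify-write, safe because op runs atomically
    return [n - 1] if n > 1 else []


run_worklist([5, 3], op, num_threads=4)
print(total["sum"])  # prints 21 (5+4+3+2+1 + 3+2+1)
```

The operation never mentions threads or locks; swapping the global-lock `atomic` for a real TM runtime, or changing how the skeleton schedules items, would not require changing `op`. That separation is what lets a skeleton framework apply and tune optimizations behind the programmer's back.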