Final Work Product Submission Report (Google Summer of Code 2020)

Project Details

Student: Hidayat Ullah Khan
GitHub: @masterchef2209
Project Title: SIMDe
Organisation: Open Bioinformatics Foundation
LinkedIn: Hidayat Khan
Mentors: Evan Nemerson, Michael R Crusoe, Jun Aruga

Project Link: https://summerofcode.withgoogle.com/projects/#5333259434590208

About SIMDe

“The SIMDe header-only library provides fast, portable implementations of SIMD intrinsics on hardware which doesn’t natively support them, such as calling SSE functions on ARM. There is no performance penalty if the hardware supports the native implementation (e.g., SSE/AVX runs at full speed on x86, NEON on ARM, etc.).” (From the SIMDe README)

Summary of Work

I started my work by writing portable implementations of AVX-512F/BW intrinsics, implementing over 100 intrinsics and generating test cases for them. AVX-512 introduces many new intrinsics that make up for shortcomings of earlier extensions, including mask and maskz variants of many operations: the mask variant copies unselected lanes from a source operand, while the maskz variant zeroes them. AVX-512 also operates on 512-bit registers, doubling the register width of its predecessors. This can accelerate workloads such as scientific simulations, AI/deep learning, and audio/video processing. (issue#104, issue#101)
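To make the mask/maskz distinction concrete, here is a minimal scalar sketch of the semantics of `_mm512_mask_add_epi32` and `_mm512_maskz_add_epi32`. The vector type `vec512_i32` and the `sketch_*` function names are hypothetical stand-ins invented for illustration; they are not SIMDe's actual types or API, and a real implementation would dispatch to native instructions when available.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 512-bit vector: 16 lanes of int32 (not SIMDe's type). */
typedef struct { int32_t i32[16]; } vec512_i32;

/* Hypothetical sketch of _mm512_mask_add_epi32(src, k, a, b):
 * lanes whose bit in k is set receive a + b (wrapping), other lanes come from src. */
static vec512_i32 sketch_mask_add_epi32(vec512_i32 src, uint16_t k,
                                        vec512_i32 a, vec512_i32 b) {
  vec512_i32 r;
  for (int i = 0; i < 16; i++) {
    r.i32[i] = ((k >> i) & 1)
                   ? (int32_t)((uint32_t)a.i32[i] + (uint32_t)b.i32[i])
                   : src.i32[i];
  }
  return r;
}

/* Hypothetical sketch of _mm512_maskz_add_epi32(k, a, b):
 * same as the mask form, but unselected lanes are zeroed instead of merged. */
static vec512_i32 sketch_maskz_add_epi32(uint16_t k, vec512_i32 a, vec512_i32 b) {
  vec512_i32 zero = {{0}};
  return sketch_mask_add_epi32(zero, k, a, b);
}
```

Expressing maskz in terms of mask with an all-zero source keeps the two variants consistent, which is also convenient when generating test cases for both.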

After that, I worked on implementations of SSE4.2 intrinsics. SSE4.2 contains instructions for string and text processing, which can be used to accelerate string library functions, XML processing, etc., along with CRC32 intrinsics. The complete details of my SSE4.2 work can be found in this blog post. (issue#7)
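The SSE4.2 CRC intrinsics compute CRC-32C (the Castagnoli polynomial), not the zlib CRC-32. As a rough illustration of what a portable fallback has to reproduce, here is a bitwise sketch of the semantics of `_mm_crc32_u8`; the function names are hypothetical and this is not the SIMDe code, which may use faster table-driven or hardware-specific approaches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitwise sketch of _mm_crc32_u8(crc, v): one CRC-32C step over
 * a single byte, using the reflected polynomial 0x82F63B78. Like the
 * intrinsic, it applies no initial or final inversion. */
static uint32_t sketch_crc32_u8(uint32_t crc, uint8_t v) {
  crc ^= v;
  for (int bit = 0; bit < 8; bit++) {
    /* Shift right; if the bit shifted out was 1, xor in the polynomial. */
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

/* Conventional CRC-32C over a buffer: start from all ones, invert at the end. */
static uint32_t sketch_crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) crc = sketch_crc32_u8(crc, data[i]);
  return crc ^ 0xFFFFFFFFu;
}
```

The pre/post inversion lives in the caller because the intrinsic itself leaves it to software, so a fallback for `_mm_crc32_u8` should not add it.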

Following that, I worked on portable implementations of AVX2 intrinsics and completed the implementations of all the remaining ones. (issue#9)

I spent the last 5 weeks of my GSoC work period writing NEON fallbacks for SSE, SSE2, SSE4.1, and SSE4.2 x86 SIMD intrinsics. After finishing most of the NEON fallbacks, I moved on to WebAssembly (WASM) implementations of x86 intrinsics. To test the NEON implementations, I used QEMU, which is available in Debian, to emulate ARM and run NEON intrinsics on an Intel machine. The complete details about testing NEON code can be found in this blog post. For WASM, I tested using Emscripten. (issue#73, issue#86)
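The general shape of a NEON fallback is a compile-time branch: use a NEON intrinsic when the target has NEON, otherwise fall back to portable scalar code. The sketch below shows that pattern for the semantics of `_mm_add_epi32` using the real NEON intrinsic `vaddq_s32`; the union type and function name are hypothetical simplifications, not SIMDe's actual definitions. Built natively on x86 it takes the scalar branch; cross-compiled for AArch64 and run under QEMU user-mode emulation, it exercises the NEON branch.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical 128-bit vector type; SIMDe's real types are more elaborate. */
typedef union {
  int32_t i32[4];
#if defined(__ARM_NEON)
  int32x4_t neon_i32;
#endif
} sketch_m128i;

/* Hypothetical sketch of _mm_add_epi32: lane-wise 32-bit add with wrap-around. */
static sketch_m128i sketch_mm_add_epi32(sketch_m128i a, sketch_m128i b) {
  sketch_m128i r;
#if defined(__ARM_NEON)
  /* NEON path: a single vector add. */
  r.neon_i32 = vaddq_s32(a.neon_i32, b.neon_i32);
#else
  /* Portable scalar fallback for targets without NEON. */
  for (int i = 0; i < 4; i++)
    r.i32[i] = (int32_t)((uint32_t)a.i32[i] + (uint32_t)b.i32[i]);
#endif
  return r;
}
```

Running the same tests on both branches is what makes emulation useful here: the NEON path must produce bit-identical results to the scalar definition.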

Blogs:

Guide to Intel SSE4.2 CRC intrinsics (+ implementation for SIMDe)

Optimizing horizontal operation (h{add,sub}{,s}) intrinsics for SIMDe

Introduction to ARM NEON SIMD Intrinsics (+ guide for SIMDe NEON impls.)

Code Links

Pull Requests created by me in SIMDe.

Commits pushed by me that have been merged into the master branch of SIMDe.

Sample Codes

mm_cmpistrz intrinsic used for string manipulation (SSE4.2)

Deinterleave operations for optimizing horizontal operations

Generating test cases for mm_broadcastq_epi64
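The deinterleave idea behind the horizontal-operation optimization can be illustrated with `_mm_hadd_epi32(a, b)`, which returns `{a0+a1, a2+a3, b0+b1, b2+b3}`. Splitting the pair `(a, b)` into its even lanes `{a0, a2, b0, b2}` and odd lanes `{a1, a3, b1, b3}` turns the horizontal add into a single vertical add; on AArch64 the two splits correspond to `vuzp1q_s32` and `vuzp2q_s32`. This is a scalar sketch with a hypothetical function name, written to show the data movement rather than to reproduce the code from the sample.

```c
#include <stdint.h>

/* Hypothetical sketch of _mm_hadd_epi32 via deinterleaving.
 * even = {a0, a2, b0, b2}  (what vuzp1q_s32(a, b) yields on AArch64)
 * odd  = {a1, a3, b1, b3}  (what vuzp2q_s32(a, b) yields on AArch64)
 * result = even + odd, lane by lane. */
static void sketch_hadd_epi32(const int32_t a[4], const int32_t b[4], int32_t r[4]) {
  int32_t even[4], odd[4];
  for (int i = 0; i < 2; i++) {
    even[i]     = a[2 * i];
    odd[i]      = a[2 * i + 1];
    even[i + 2] = b[2 * i];
    odd[i + 2]  = b[2 * i + 1];
  }
  /* One vertical add replaces four separate horizontal pair sums. */
  for (int i = 0; i < 4; i++)
    r[i] = (int32_t)((uint32_t)even[i] + (uint32_t)odd[i]);
}
```

The same split also serves `hsub` (subtract odd from even) and the saturating `s` forms, which is why one deinterleave step can be shared across the h{add,sub}{,s} family.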

What is left to be done?

Currently, 9 SSE4.2 intrinsics have complete implementations, but the intrinsics of the same class as mm_cmpestra in SSE4.2 are yet to be done. I brainstormed on this extensively with my mentor, but I was not able to complete these intrinsics, partly because I could not fully understand Intel’s documentation for them and partly because they are very hard in general. If I get a hint on how to approach them in the future, I will definitely come back to this.
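For contrast with that harder family, some of the flag-returning string intrinsics have simple semantics. For example, `_mm_cmpistrz(a, b, imm8)` returns 1 if `b` contains a null element and 0 otherwise, where bit 0 of `imm8` selects 8-bit or 16-bit elements; the operand `a` does not affect this flag. The sketch below is a hypothetical scalar rendering of that rule (the name and the byte-array signature are assumptions, not SIMDe's API).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of _mm_cmpistrz semantics. Only b matters for this flag,
 * so a is omitted. imm8 bit 0 = 0: sixteen 8-bit elements; = 1: eight 16-bit elements. */
static int sketch_cmpistrz(const uint8_t b[16], int imm8) {
  if (imm8 & 1) {
    for (int i = 0; i < 8; i++) {
      uint16_t w;
      memcpy(&w, b + 2 * i, sizeof w); /* avoids alignment/aliasing issues */
      if (w == 0) return 1;            /* zero test is endianness-independent */
    }
  } else {
    for (int i = 0; i < 16; i++)
      if (b[i] == 0) return 1;
  }
  return 0;
}
```

The explicit-length mm_cmpestra family is harder because its flags depend on the full aggregation and polarity logic controlled by the rest of imm8, not just on a null scan.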

While working on NEON implementations, I completed the NEON implementations for all remaining SSE3 and SSSE3 intrinsics, but some SSE, SSE2, SSE4.1, and SSE4.2 intrinsics still lack NEON implementations. This is either because there is simply no efficient NEON intrinsic available (for example, mm_cvttpd_epi32 lacks an equivalent NEON vcvt operation for f64_s32) or because the intrinsic is complex enough to make a NEON version very hard (for example, mm_shuffle_ps has many cases to handle depending on the value of imm8). A complete list of x86 intrinsics that are yet to have NEON implementations is also available. If I get ideas for these NEON implementations in the future, I will try to complete the remaining list.
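The scalar definition of `_mm_shuffle_ps(a, b, imm8)` is short: each 2-bit field of `imm8` picks a lane, with the low two result lanes taken from `a` and the high two from `b`. The difficulty on NEON is that there are 256 possible `imm8` values, and an efficient version has to map each pattern to a good instruction sequence rather than a generic lane-by-lane copy. This hypothetical sketch (the function name is invented) shows only the reference semantics.

```c
/* Hypothetical scalar reference for _mm_shuffle_ps(a, b, imm8).
 * imm8 bits [1:0] and [3:2] index into a; bits [5:4] and [7:6] index into b. */
static void sketch_shuffle_ps(const float a[4], const float b[4], int imm8, float r[4]) {
  r[0] = a[(imm8 >> 0) & 3];
  r[1] = a[(imm8 >> 2) & 3];
  r[2] = b[(imm8 >> 4) & 3];
  r[3] = b[(imm8 >> 6) & 3];
}
```

A reference like this is useful when writing the NEON cases: each special-cased `imm8` pattern can be checked against it.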

Final Thoughts

GSoC has made this summer the best of my undergraduate journey, and I have learned the most in these months. Apart from a deeper knowledge of SIMD vector operations and different architectures, this program has improved my other skills as well, whether diving into complex documentation, reading someone else’s code, working with other developers in a team, or my soft skills. I first started contributing to the project around the end of February or the start of March, and I am glad to have had this 6-month journey with the project so far. I am very thankful to all my mentors, and especially to Evan Nemerson for clearing up all my doubts in weekly video calls on Jitsi :).

Currently, I am working on WASM implementations of x86 intrinsics. I hope to keep contributing to the project in the future, as well as help new contributors get started with SIMDe.

Thank You

Published by masterchef2209
