The last RISC-V core announced by SiFive was the U8-Series out-of-order RISC-V Core IP that aims to compete against the Arm Cortex-A72 core. But in its latest announcement, the company built upon the 64-bit RISC-V U7-series with the SiFive Intelligence X280, a multi-core, Linux-capable RISC-V processor that adds vector extensions and SiFive Intelligence Extensions, and is optimized for AI/ML compute at the edge. SiFive Intelligence X280 key features:
- 64-bit RISC-V ISA with 8-stage dual-issue in-order pipeline, coherent multi-core, Linux capable, based on U7 series core
- SiFive Intelligence Extensions for ML workloads – BF16/FP16/FP32/FP64, int8 to 64 fixed-point data types
- 512-bit vector register length – Variable-length operations, up to 512 bits of data per cycle
- High-performance vector memory subsystem
- Memory parallelism provides cache miss tolerance
- Virtual memory support with precise exceptions
- Up to 48-bit addressing
SiFive Intelligence includes software solutions to leverage the X280’s features and provide “great AI inference performance” using TensorFlow Lite. No […]
Raspberry Pi SBC Now Supports OpenVX 1.3 Computer Vision API
OpenVX is an open, royalty-free API standard for cross-platform acceleration of computer vision applications developed by The Khronos Group, which also manages the popular OpenGL ES, Vulkan, and OpenCL standards. After OpenGL ES 3.1 conformance for Raspberry Pi 4, and good progress on the Vulkan implementation, the Raspberry Pi Foundation has now announced that both Raspberry Pi 3 and 4 Model B SBCs have achieved OpenVX 1.3 conformance (somehow dated 2020-07-23). The Raspberry Pi OpenVX open-source sample implementation passes the Vision, Enhanced Vision, & Neural Net conformance profiles specified in the OpenVX 1.3 standard. However, it is NOT intended to be a reference implementation, as it is not optimized, not production-ready, and not actively maintained publicly by Khronos. The sample can be built on multiple operating systems (Windows, Linux, Android) using either CMake or Concerto. Detailed instructions are provided for Ubuntu 18.04 64-bit x86 and Raspberry Pi SBC. Here’s the list of commands to […]
Optimizing JPEG Transformations on Qualcomm Centriq Arm Servers with NEON Instructions
Arm servers are already deployed in some datacenters, but they are pretty new compared to their Intel counterparts, so at this stage software may not always be optimized as well on Arm as on Intel. Vlad Krasnow, who works for Cloudflare, found one of those unoptimized cases when testing out Jpegtran – a utility performing lossless transformation of JPEG files – on one of their Xeon Silver 4116 servers:
    vlad@xeon:~$ time ./jpegtran -outfile /dev/null -progressive -optimise -copy none test.jpg
    real 0m2.305s
    user 0m2.059s
    sys 0m0.252s
and comparing it to one based on Qualcomm Centriq 2400 Arm SoC:
    vlad@arm:~$ time ./jpegtran -outfile /dev/null -progressive -optimise -copy none test.jpg
    real 0m8.654s
    user 0m8.433s
    sys 0m0.225s
Nearly four times slower on a single core. Not so good, as the company aims for at least 50% of the performance since the Arm processor has double the number of cores. Vlad had done some optimization on the Intel processor using SSE instructions before, so he decided to look into optimizing the Arm code with NEON instructions instead. The first step was to check which functions may slow down the […]
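The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch in C, not code from the original post: the function names are hypothetical, and the only inputs are the two "real" times quoted in the listings above.

```c
#include <assert.h>
#include <stdio.h>

/* How many times slower the candidate run is than the baseline run. */
double slowdown_ratio(double baseline_s, double candidate_s) {
    assert(baseline_s > 0.0);
    return candidate_s / baseline_s;
}

/* Candidate speed as a fraction of baseline speed (1.0 means parity). */
double relative_performance(double baseline_s, double candidate_s) {
    assert(candidate_s > 0.0);
    return baseline_s / candidate_s;
}

/* Hypothetical helper: print the comparison for the two "real" times
 * quoted above (Xeon Silver 4116 vs Centriq 2400, single core). */
void print_comparison(void) {
    const double xeon_real_s = 2.305;
    const double centriq_real_s = 8.654;

    printf("slowdown: %.2fx\n",
           slowdown_ratio(xeon_real_s, centriq_real_s));
    printf("relative single-core performance: %.0f%%\n",
           100.0 * relative_performance(xeon_real_s, centriq_real_s));
}
```

Calling `print_comparison()` from a `main` function shows a slowdown of roughly 3.75x, which corresponds to about 27% of the Xeon's single-core performance, well short of the 50% target mentioned above.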
How ARM Nerfed NEON Permute Instructions in ARMv8
This is a guest post by blu about an issue he found with a specific instruction in ARMv8 NEON. He previously wrote an article about OpenGL ES development on Ubuntu Touch, and one or two other posts. This is not a happy-ending story. But as with most unhappy-ending stories, this is a story with a certain moral for the reader. So read on if you appreciate a good moral. Once upon a time there was a very well-devised SIMD instruction set. Its name was NEON, or formally ARM Advanced SIMD, ASIMD for short (most people still called it NEON). It was so nice that veteran coders versed in multiple SIMD ISAs often wished other SIMD ISAs were more like NEON. NEON had originated as part of the larger ARM ISA version 7, or ARMv7 for short. After much success in the mobile and embedded domains, ARMv7 was superseded by […]
Open Source ARM Compute Library Released with NEON and OpenCL Accelerated Functions for Computer Vision, Machine Learning
GPU compute promises to deliver much better performance compared to CPU compute for applications such as computer vision and machine learning, but the problem is that many developers may not have the right skills or time to leverage APIs such as OpenCL. So ARM decided to write their own ARM Compute library and has now released it under an MIT license. The functions found in the library include:
- Basic arithmetic, mathematical, and binary operator functions
- Color manipulation (conversion, channel extraction, and more)
- Convolution filters (Sobel, Gaussian, and more)
- Canny Edge, Harris corners, optical flow, and more
- Pyramids (such as Laplacians)
- HOG (Histogram of Oriented Gradients)
- SVM (Support Vector Machines)
- H/SGEMM (Half and Single precision General Matrix Multiply)
- Convolutional Neural Networks building blocks (Activation, Convolution, Fully connected, Locally connected, Normalization, Pooling, Soft-max)
The library works on Linux, Android or bare metal on armv7a (32bit) or arm64-v8a (64bit) architecture, and makes use […]
Linaro Connect Hong Kong 2015 Schedule and Demos
Linaro Connect Hong Kong 2015 will take place on February 9 – 13, 2015 in Hong Kong, and the organization has released the schedule for the five-day event with keynotes, sessions, and demos. Each day will start with a keynote, with speakers such as:
- George Grey, Linaro CEO, who will welcome attendees to Linaro Connect, and provide an update on the latest Linaro developments
- Jon Masters, Chief ARM Architect, Red Hat, who will present a Red Hat update and the latest ARMv8-A demonstrations
- Dejan Milojicic, Senior Researcher & Manager, HP Labs
- Bob Monkman, Enterprise Segment Marketing Manager, ARM, who will discuss the impact of ARM in next generation cloud and communication network infrastructure
- Greg Kroah-Hartman, Linux Foundation Fellow, who will introduce the Greybus Project (Linux for Project Ara modular phones)
- Warren Rehman, Android Partner Engineering Manager, Google
The agenda also features sessions covering Android, ARMv8-A, Automation & Validation, Digital Home, Enterprise Servers, LAVA, Linux […]
Linaro 13.08 Release With Linux Kernel 3.11 and Android 4.3
Linaro 13.08 has been released with Linux Kernel 3.11-rc6 (stating), Kernel 3.10.9 (LSK – beta), and Android 4.3. This month is the first release based on Android 4.3, which was only pushed to AOSP at the end of last month. I can also see work on new SoCs/hardware this month with the Texas Instruments Keystone II ARM Cortex A15+DSP SoC and the Fujitsu AA9 board (I could not find out which processor it uses). A lot of work also appears to have gone into OpenEmbedded, further optimizations have gone into the NEON optimized AES encryption in OpenSSL, and more. It’s also the first time I can see an Ubuntu Raring engineering build image for HighBank (Calxeda Energycore). Here are the highlights of this release:
Android Engineering
- Android stack was tuned to achieve 100% CTS pass result on Android 4.3
- Analyzing the UEFI EDK II boot loader for Android completed, implementation of fastboot application and USB […]
ARM Releases Ne10: An Open Source Library with NEON Optimized Functions
The Advanced SIMD extension (aka NEON or “MPE” Media Processing Engine) is a combined 64- and 128-bit single instruction multiple data (SIMD) instruction set that provides standardized acceleration for media and signal processing applications on ARM Cortex-A (ARMv7) processors. The goal of these instructions is similar to that of the MMX, SSE, and 3DNow! extensions for x86 processors. Since early 2011, ARM has been working internally on a project codenamed Snappy to develop common functions accelerated by NEON. They have now released the first version of Snappy, now called the Ne10 library, which is available on GitHub at https://github.com/projectNe10/Ne10 . The code has been developed in C and assembly, and tested on Ubuntu on ARM (Linaro). A Makefile is also included to build it for Android (AOSP). The current functions include vector and matrix operations accelerated by NEON instructions. Since the library is open source, ARM hopes developers will make use of the Ne10 […]
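To give an idea of what a NEON-accelerated vector operation can look like, here is a minimal sketch in C of an element-wise float addition. It is not taken from Ne10 and does not use the Ne10 API: the function name `add_f32` is hypothetical, and the code only relies on the standard `arm_neon.h` intrinsics `vld1q_f32`, `vaddq_f32`, and `vst1q_f32`. On targets without NEON, the preprocessor guard leaves only the scalar loop, so the same file should build anywhere.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not the Ne10 API):
 * dst[i] = a[i] + b[i] for every i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Handle four floats per iteration using one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load a[i..i+3] */
        float32x4_t vb = vld1q_f32(b + i);      /* load b[i..i+3] */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* store the four sums */
    }
#endif

    /* Scalar tail for the remaining elements; this is also the whole
     * loop when NEON is not available. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The four-wide loop corresponds to the 128-bit form of the instruction set described above (four 32-bit floats per register), and the scalar tail covers lengths that are not a multiple of four.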