Communications of the ACM

Blogroll


Computing the UTF-8 size of a Latin 1 string quickly (AVX edition)
From Daniel Lemire's Blog

Computers represent strings using bytes. Most often, we use the Unicode standard to represent characters in bytes. The universal format to exchange strings online...

Science and Technology links (February 12 2023)
From Daniel Lemire's Blog

Kenny finds that the returns due to education are declining. Rich countries are spending more on education, with comparatively weaker test results. It costs more...

Bit Hacking (with Go code)
From Daniel Lemire's Blog

At a fundamental level, a programmer needs to manipulate bits. Modern processors operate over data by loading in ‘registers’ and not individual bits. Thus a programmer...

Serializing IPs quickly in C++
From Daniel Lemire's Blog

On the Internet, we often use 32-bit addresses which we serialize as strings such as 192.128.0.1. The string corresponds to the integer address 0xc0800001 (3229614081...

Move or copy your strings? Possible performance impacts
From Daniel Lemire's Blog

You sometimes want to add a string to an existing data structure. For example, the C++17 template ‘std::optional’ may be used to represent a possible string value...

International domain names: where does https://meßagefactory.ca lead you?
From Daniel Lemire's Blog

Originally, the domain part of a web address was all ASCII (so no accents, no emojis, no Chinese characters). This was extended a long time ago thanks to something...

Year 2022: Scientific progress
From Daniel Lemire's Blog

The year 2022 is over. As with every year that passes, we have made some scientific progress. I found the following achievements interesting: Diluting the blood...

Science and technology links (January 15 2022)
From Daniel Lemire's Blog

For under $600, one can buy a 20-terabyte disk on Amazon. Unless you work professionally in multimedia, it is more storage than you need. However, having much storage...

Care is needed to use C++ std::optional with non-trivial objects
From Daniel Lemire's Blog

We often have to represent in software a value that might be missing. Different programming languages have abstraction for this purpose. A recent version of C++...

Transcoding Unicode with AVX-512: AMD Zen 4 vs. Intel Ice Lake
From Daniel Lemire's Blog

Most systems today rely on Unicode strings. However, we have two popular Unicode formats: UTF-8 and UTF-16. We often need to convert from one format to the other...

Emojis in domain names, punycode and performance
From Daniel Lemire's Blog

Most domain names are encoded using ASCII (e.g., yahoo.com). However, you can register domain names with almost any character in them. For example, there is a web...

Quickly checking that a string belongs to a small set
From Daniel Lemire's Blog

Suppose that I give you a set of reference strings (“ftp”, “file”, “http”, “https”, “ws”, “wss”). Given a new string, you want to quickly tell whether it is part...

Science and Technology links (December 25 2022)
From Daniel Lemire's Blog

One of Elon Musk’s ventures, OpenAI, made public a new tool called ChatGPT. It is widely regarded as a practical breakthrough in artificial intelligence. Given...

Fast base16 encoding
From Daniel Lemire's Blog

Given binary data, we often need to encode it as ASCII text. Email and much of the web effectively works in this manner. A popular format for this purpose is base64...

The size of things in bytes
From Daniel Lemire's Blog

storing 1 GiB/month on the cloud: 0.02$US; web site of my twitter profile (@lemire), HTML alone: 296 KiB; web site of my twitter profile (@lemire), all data: 3.9 MiB...

Checking for the absence of a string, naive AVX-512 edition
From Daniel Lemire's Blog

Suppose you would like to check that a string is not present in a large document. In C, you might do the following using the standard function strstr: bool is_present...

What is the memory usage of a small array in C++?
From Daniel Lemire's Blog

In an earlier blog post, I reported that the memory usage of a small byte array in Java (e.g., an array containing 4 bytes) was about 24 bytes. In other words...

Science and Technology links (December 11 2022)
From Daniel Lemire's Blog

As we focus on some types of unfortunate discrimination (race, gender), we may become blind to other types of discrimination. For example, we tend to discriminate against...

Fast midpoint between two integers without overflow
From Daniel Lemire's Blog

Let us say that I ask you to find the number I am thinking about between -1000 and 1000, by repeatedly guessing a number. With each guess, I tell you whether your...

Optimizing compilers reload vector constants needlessly
From Daniel Lemire's Blog

Modern processors have powerful vector instructions which allow you to load several values at once, and operate (in one instruction) on all these values. Similarly...