
Leading items

Welcome to the LWN.net Weekly Edition for March 16, 2023

This edition contains the following feature content:

  • Rules as code for more responsive governance: Pia Andrews on treating policy as testable, executable code.
  • An EEVDF CPU scheduler for Linux: a possible successor to the completely fair scheduler.
  • Heuristics for software-interrupt processing: another attempt to mitigate the longstanding softirq problem.
  • Interview: the FreeCAD Project Association: how and why the FreeCAD project created its non-profit.
  • Zephyr: a modular OS for resource-constrained devices: a look at the Zephyr realtime operating system seven years on.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (1 posted)

Rules as code for more responsive governance

By Jake Edge
March 15, 2023
Everything Open

Using rules as code to help bridge the gaps between policy creation, its implementation, and its, often unintended, effects on people was the subject of a talk by Pia Andrews on the first day of the inaugural Everything Open conference in Melbourne, Australia. She has long been exploring the space of open government, and her talk was a report on what she and others have been working on over the last seven years. Everything Open is the successor to the long-running, well-regarded linux.conf.au (LCA); Andrews (then Pia Waugh) gave the opening keynote at LCA 2017 in Hobart, Tasmania, and helped organize the 2007 event in Sydney.

Andrews said that she has a dream of a world where government policy is built in a way that is accountable, participatory, humane, adaptive, and accessible. Those who are affected by these policies should be able to easily understand, apply, and question them; policies should not be written in some ivory tower, but should be created in conjunction with those who must follow them. She dreams of policies that are based on human values, rather than only on what is good for the economy, since relying solely on the latter has not worked out so well, she said; "make what's good for people and then the economy will follow". Rather than just writing policy once and "throwing it out in the ether and hoping", it should be iterated upon, so that even bad policy has a chance to become good based on looking at its actual impact on people. That description of her public-policy dream was met with a good bit of applause.

[Pia Andrews]

She knows that many in the audience are not "nearly as nerdy as I am when it comes to the public sector", so her slide deck was meant to be sort of a primer on the topic, with way more material than she would be able to cover and lots of links for further information. It takes a fundamental shift in thinking to decide that the government should be working for the people, rather than against them in some sense. Australia, unlike most other countries, was founded as a penal colony, so its government was punitive by its nature; Australians should keep that in mind.

The starting point is to recognize that the public sector can be changed to reflect the values and attributes from her dream. So many people are "cynical, frustrated, and skeptical", but maintaining that low expectation effectively means that "reality continues to deliver" to those expectations. Her hope for the talk is that attendees change their expectations and "help me change the world for the better".

Policy

"Policy is this weird word that means literally pretty much anything", she said with a chuckle. For some people it is a few bullet points on a napkin, for others it is a promise from the government of the day, for still others it is legislation. In fact, it is all those things, but for a technical audience, she likens it to an operating system for society: it determines the rules, what resources are available, what the support mechanisms are, and so on. It is a set of libraries that we draw upon every day without even noticing, for example when we drive our cars, take public transport, use the health and education systems, etc. There are lots of policies with regards to each of those things, which is "why it is so critical to get involved in it".

She put up a slide with a comic that was her "very cheeky way" of describing the policy process. In the first panel, the policy team is celebrating the completion of a policy and looking on to the next one, while the policy instructions get shoved into the "black hole of policy intent". The policy team wishes it could explore the policy future and its implementation, but the process in Australia and lots of other countries is not structured that way.

The policy instructions then get interpreted by one or more service-delivery teams in order to implement them, but the purpose behind the policy has been stripped away by the black hole. The team might be at a company that needs to implement the regulations or at a social-services organization that is creating a new public service; it is basically anyone who needs to follow the rules or to see them followed. Unfortunately, the policy instructions may not be clear and the policy team is off on the next thing, so the service-delivery team has to interpret the instructions as best it can; "there's a lot of guessing" going on, she said.

But people who are affected by the policy can have their lives ruined by errors and misapplications of the rules, as the recent Royal Commission formed to investigate the Robodebt scheme has found. There is a "black hole of policy impact" because no one knows what the actual impact of the policy is; those who are measuring the effects only look at the expected outcomes, not the unintended consequences. Eventually, some kind of full evaluation may be done, by an evaluation team—or a Royal Commission—which may find there was harm created by the policy.

In her mind, the policies should be regularly evaluated and the results would get fed back into the process to produce better policy. In the tech world, it is common to continuously test the code and to feed those results back to the developers, but that is not true in the policy world. So there is a "gap between policy and delivery" and it is quite large. But every policy today is being implemented using technology, which means that policy-development teams need to include technologists along with multiple other disciplines if the policies are going to do what is intended and minimize or eliminate the harm.

The policies should not just be looked at by an evaluation team, Andrews said—the public should be able to report bugs. In her dream world, the policy development team, the implementation teams, and the evaluators (including those affected) work together to co-design the policy instructions and to determine the success criteria; the human impacts could be measured right from the start. Everyone will be working from the same policy infrastructure "where there is policy as code", so that "when things go wrong, which they will", those problems will be noticed and they can be addressed. "People don't need to have their lives ruined", she said.

The process today is far different. Policies are conceived, then announced well before any actual work has gone into defining them. Then the slow policy development, legislative drafting, and parliamentary process phases are done before the policy even gets published. But at that point, no one who is not closely following the process even knows that the policy has been enacted. Even then, it requires interpretation by those affected, who are never told when they are right, only penalized when they get it wrong. So there are multiple interpretations and implementations without being sure that any one of them is right. It is "so far from good it is not even funny", she said.

But by using tools and frameworks for doing policy as code, policy drafts can be modeled and tested before making any announcements. New legislation can be drafted as human- and machine-readable simultaneously, so that reference implementations can be built, tested, and verified—all before the messy parliamentary process, which there is no getting around ("politicians will be politicians"). This leads to dramatic improvements in the quality of the legislation, its integrity, and the consistency with which it is applied.

Rules as code

Andrews said that her origin story around all of this came from when she was working for AUSTRAC, which is the Australian agency responsible for detecting and handling abuse of the financial system for things like money laundering. She recognized a number of problems soon after she started working there, which led her toward rules as code. She briefly described some of the problems she saw.

There is a lot of angst and a lot of stress because each entity needs to interpret the financial crimes laws itself. There is no one to tell anyone that their interpretation is correct, instead there are simply enforcement actions when they get it wrong. So there is a need for better rules, that can be modeled and implemented more easily. We need to recognize that all of these rules are going to make their way into digital systems at some point, so that should be planned from the outset. Legislation generally assumes that it will be shaping human behavior, but that is increasingly not the case. Machines do not care about legal, financial, or criminal penalties, which needs to be taken into account.

There are also key goals that get left behind because some secondary goals take precedence. For example, one of the key goals for AUSTRAC was to strengthen the financial system against abuse, but there is a secondary goal to detect and disrupt financial crime. However, all of the measures of success when she was there were about the number of people and/or institutions prosecuted for criminal activity. The incentives then run counter to the goal of strengthening the system, since strengthening it would naturally decrease the number of violators and thus reduce "success".

Trust is a big problem for government agencies, as well. People are far more willing to place trust in social media or other organizations for a fairly simple reason, she said: those entities do not have the state's monopoly on violence. Those other organizations cannot arrest people or take their children from them, though "they can do some pretty awful things". Breakdown in trust leads to instability, so it is important for governments to earn and keep the trust of the people.

There are generally two types of rules that come up for policies and legislation: prescriptive and judgment-based rules. The prescriptive rules are the ones that need to be consistently applied, such as eligibility criteria, which makes them a good fit for rules as code. The judgment-based rules are things like a "good character" requirement for citizenship. Social and taxation policies often have many prescriptive rules that lend themselves well to being turned into code.

A common misconception is that the rules simply need to be put into some kind of structured content, such as XML, but that is not the same as something that is executable by the system. The idea is that the rules are described in a way that can be run on the computer, but also understood by humans, so that they can be verified. There are lots of companies selling tools, but many are still black boxes; you cannot see the rules and you "still cannot trace the decisions back to the actual legislation to see if it is actually lawful—gah!", she said in a bit of a mini-rant.

Case studies

She looked at a New Zealand policy for a tax reduction aimed at retirees if they were the owner and occupier of their home. The intent was to help those people stay in their homes longer—though no one is actually measuring whether that is occurring, Andrews said in an aside. She quoted from the legislation, which had a moderately complex formula for determining how much someone would receive, which was further complicated by the stilted legalese that was used to express it. But, at its core, it was just a set of rules that could easily be turned into code; in fact, it was simply a quadratic equation, "which is cool" because she is a "maths nerd" as well.
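
To make the idea concrete, here is a minimal sketch of what a prescriptive rule looks like once it is expressed as executable code. It is written in C for illustration only; the eligibility test, the formula, and every number in it are hypothetical placeholders rather than the actual New Zealand legislation (which, as described above, used a quadratic formula). The point is simply that such a rule can be compiled, run, and tested, instead of living only in legalese.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical parameters; real legislation's numbers differ and change
       over time, which is why they are kept separate from the formula. */
    #define BASE_REBATE      600.0   /* maximum reduction, in dollars */
    #define INCOME_THRESHOLD 26000.0 /* income above this reduces the rebate */
    #define TAPER            8.0     /* each $8 of excess income removes $1 */

    /* A prescriptive eligibility rule: owner, occupier, and retired. */
    static bool eligible(bool owner, bool occupier, bool retired)
    {
        return owner && occupier && retired;
    }

    /* The amount rule: a formula a policy team could test and iterate on. */
    static double rebate(double income, bool owner, bool occupier, bool retired)
    {
        if (!eligible(owner, occupier, retired))
            return 0.0;

        double reduction = income > INCOME_THRESHOLD
                ? (income - INCOME_THRESHOLD) / TAPER : 0.0;
        double amount = BASE_REBATE - reduction;

        return amount > 0.0 ? amount : 0.0;
    }

    int main(void)
    {
        printf("rebate: %.2f\n", rebate(30000.0, true, true, true));
        return 0;
    }

Once a rule exists in this form, it can be exercised with test cases, traced back to the clause it implements, and run over a whole population to see who gains and who loses, which is what the OpenFisca work described next does at scale.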

She and her colleagues used OpenFisca to write the policy as code. She loves OpenFisca, though it is difficult to get to know. It allows modeling changes over the whole population, which is something she has now done in three different countries. You can change various parts of the equation and see what the impacts are on everyone in the jurisdiction. It shows who will be better off and who will be worse off, depending on how the rules change. The changes to those parameters over time, such as income eligibility and percentage of reduction, can also be maintained in the model, so that trends over time can be tracked and analyzed.

She demonstrated a web site that had been created to help people determine their eligibility for the reduction, which calls out to OpenFisca to execute the rules. This allows people to easily decide whether to go through the somewhat laborious application process or not. Rather than the usual government trick of "if you finally jump through all of our hoops you might get something", this helped thousands of retirees see whether that "something" was even worth pursuing.

She showed a similar web site for calculating the "accommodation supplement" for people who are out of work. "It is almost impossible to figure out what your accommodation supplement is in New Zealand", because it is based on many different factors, such as location, dependents, former income, and so on. This project was done for a small citizens group, which allowed her to include an email template for those who were not receiving what the program said they should expect; they could use that template to challenge the amount calculated by the government. That was met with laughter and applause; "I am not planting grenades at all", Andrews said with a chuckle.

As she was wrapping up, she went through a bit of a rapid-fire summary of the ideas. A test-driven approach using code that is simultaneously readable (and usable) by humans and computers leads to better rules via a better process. Being able to provide a reference implementation as an API on a government web site is the dream, she said. The idea is to "start drawing a line in the sand" so that we start to get better rules that are "consumable, actionable, appealable, understandable by people more broadly".

There is a "terrifying" amount of automation using artificial intelligence going on in government right now. Andrews strongly believes that legislation is needed as a pre-condition to the lawful use of AI in government. It is really important to be able to ensure that the system provides the correct results in a way that can be questioned and challenged. It is also important that people get consistent results over time even if the system is continuing to learn. She has recently co-authored a paper about a trust framework for the use of AI in government.

Andrews answered several audience questions, largely about how to help make more of "rules as code" happen in governments. Those interested should seek out the video once it becomes available. [Update: It is out.] One important takeaway is that there is a need for "great techy people with innovative minds and open-source mentalities to work in government, come play!", she said with a chuckle.

For the most part, once the various "factions" start to use the tools and see what they can do, they rapidly become interested in using them more. In her experience, tax agencies pretty much everywhere already have a miniature version of her collaborative model where all of the groups co-design the policies; perhaps that is because of the influence businesses tend to have in government, but it helps provide evidence that the process works.

Some of the ideas Andrews presented may seem rather utopian, but she obviously sees them as both desirable and achievable. The picture she painted of dysfunctional government run amok is probably not surprising, but she noted that the people in the trenches are often those who are most interested in seeing things change. It is the people in the upper administrative levels who are most resistant, which perhaps is also not surprising. One hopes that many positive changes will be coming from the adoption of a more test-driven approach to our rules and regulations. Time will tell.

[ Thanks to LWN subscribers for supporting my travel to Melbourne for Everything Open. ]

Comments (53 posted)

An EEVDF CPU scheduler for Linux

By Jonathan Corbet
March 9, 2023
The kernel's completely fair scheduler (CFS) has the job of managing the allocation of CPU time for most of the processes running on most Linux systems. CFS was merged for the 2.6.23 release in 2007 and has, with numerous ongoing tweaks, handled the job reasonably well ever since. CFS is not perfect, though, and there are some situations it does not handle as well as it should. The EEVDF scheduler, posted by Peter Zijlstra, offers the possibility of improving on CFS while reducing its dependence on often-fragile heuristics.

CFS and scheduling constraints

One of the key design goals of CFS was, as might be understood from its name, fairness — ensuring that every process in the system gets its fair share of CPU time. This goal is achieved by tracking how much time each process has received and running those that have gotten less CPU time than the others, with each process's run time scaled by its "nice" priority. CFS is, in other words, a weighted fair queuing scheduler at its core.
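
As a rough illustration of that core idea, the sketch below shows weighted virtual-runtime bookkeeping in ordinary user-space C. It is not the kernel's code (the real implementation in kernel/sched/fair.c uses fixed-point arithmetic, a red-black tree, and a table mapping nice values to weights); the weights used here are only approximations of that table.

    #include <stdio.h>

    struct task {
        const char *name;
        double vruntime;   /* weighted CPU time received so far */
        double weight;     /* derived from the nice value; higher = more CPU */
    };

    /* Charge a task for delta nanoseconds of actual execution time. */
    static void account_runtime(struct task *t, double delta_ns)
    {
        /* Higher-weight (lower-nice) tasks accumulate vruntime more slowly,
           so they are selected to run more often. */
        t->vruntime += delta_ns / t->weight;
    }

    /* Run the task that has received the least weighted CPU time so far. */
    static struct task *pick_next(struct task *tasks, int n)
    {
        struct task *best = &tasks[0];

        for (int i = 1; i < n; i++)
            if (tasks[i].vruntime < best->vruntime)
                best = &tasks[i];
        return best;
    }

    int main(void)
    {
        struct task tasks[] = {
            { "nice 0", 0.0, 1024.0 },   /* the default weight */
            { "nice 5", 0.0,  335.0 },   /* roughly the weight for nice 5 */
        };

        /* Hand out ten 1ms slices; the heavier task gets about three of
           every four of them. */
        for (int i = 0; i < 10; i++) {
            struct task *t = pick_next(tasks, 2);

            printf("running %s\n", t->name);
            account_runtime(t, 1000000.0);
        }
        return 0;
    }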

Fairness, it turns out, is enough to solve many CPU-scheduling problems. There are, however, many constraints beyond the fair allocation of CPU time that are placed on the scheduler. It should, for example, maximize the benefit of the system's memory caches, which requires minimizing the movement of processes between CPUs. At the same time, though, it should keep all CPUs busy if there is work for them to do. Power management is a complication as well; sometimes the optimal decisions for system throughput must take a back seat to preserving battery life. Hybrid systems (where not all CPUs are the same) add more complications. And so on.

One place where there is a desire for improvement is in the handling of latency requirements. Some processes may not need a lot of CPU time but, when they do need that time, they need it quickly. Others might need more CPU time but can wait for it if need be. CFS does not give processes a way to express their latency requirements; nice values (priorities) can be used to give a process more CPU time, but that is not the same thing. The realtime scheduling classes can be used for latency-sensitive work, but running in a realtime class is a privileged operation, and realtime processes can badly affect the operation of the rest of the system.

What is lacking is a way to ensure that some processes can get access to a CPU quickly without necessarily giving those processes the ability to obtain more than their fair share of CPU time. The latency nice patches have been circulating for some time as an attempt to solve this problem; they allow CFS processes with tight latency requirements to jump the queue for the CPU when they want to run. These patches appear to work, but Zijlstra thinks that there might be a better approach to the problem.

Introducing EEVDF

The "Earliest Eligible Virtual Deadline First" (EEVDF) scheduling algorithm is not new; it was described in this 1995 paper by Ion Stoica and Hussein Abdel-Wahab. Its name suggests something similar to the Earliest Deadline First algorithm used by the kernel's deadline scheduler but, unlike that scheduler, EEVDF is not a realtime scheduler, so it works in different ways. Understanding EEVDF requires getting a handle on a few (relatively) simple concepts.

Like CFS, EEVDF tries to divide the available CPU time fairly among the processes that are contending for it. If, for example, there are five processes trying to run on a single CPU, each of those processes should get 20% of the available time. A given process's nice value can be used to adjust the calculation of what its fair time is; a process with a lower nice value (and thus a higher priority) is entitled to more CPU time at the expense of those with higher nice values. To this point, there is nothing new here.

Imagine a time period of one second; during that time, in our five-process scenario, each process should have gotten 200ms of CPU time. For a number of reasons, things never turn out exactly that way; some processes will have gotten too much time, while others will have been shortchanged. For each process, EEVDF calculates the difference between the time that process should have gotten and how much it actually got; that difference is called "lag". A process with a positive lag value has not received its fair share and should be scheduled sooner than one with a negative lag value.

In fact, a process is deemed to be "eligible" if — and only if — its calculated lag is greater than or equal to zero; any process with a negative lag will not be eligible to run. For any ineligible process, there will be a time in the future where the time it is entitled to catches up to the time it has actually gotten and it will become eligible again; that time is deemed the "eligible time".

The calculation of lag is, thus, a key part of the EEVDF scheduler, and much of the patch set is dedicated to finding this value correctly. Even in the absence of the full EEVDF algorithm, a process's lag can be used to place it fairly in the run queue; processes with higher lag should be run first in an attempt to even out lag values across the system.

The other factor that comes into play is the "virtual deadline", which is the earliest time by which a process should have received its due CPU time. This deadline is calculated by adding a process's allocated time slice to its eligible time. A process with a 10ms time slice, and whose eligible time is 20ms in the future, will have a virtual deadline that is 30ms in the future.

The core of EEVDF, as can be seen in its name, is that it will run the process with the earliest virtual deadline first. The scheduling choice is thus driven by a combination of fairness (the lag value that is used to calculate the eligible time) and the amount of time that each process currently has due to it.
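
The selection rule can be captured in a few lines. What follows is a deliberately simplified, user-space sketch of the algorithm as just described, with made-up task names and numbers; Zijlstra's actual patches track these quantities in weighted, scaled form inside the CFS code and handle many cases (sleeping tasks, migrations, and so on) that are ignored here.

    #include <stdio.h>

    /* All times are in milliseconds to keep the numbers readable. */
    struct task {
        const char *name;
        double entitled;   /* CPU time the task should have received by now */
        double received;   /* CPU time it actually received */
        double slice;      /* its allocated time slice */
    };

    /* Lag: positive means the task has been shortchanged so far. */
    static double lag(const struct task *t)
    {
        return t->entitled - t->received;
    }

    /* Virtual deadline: the eligible time plus the time slice.  A task that
       is already eligible has an eligible time of "now"; ineligible tasks
       are simply skipped here, since they cannot be picked anyway. */
    static double virtual_deadline(const struct task *t, double now)
    {
        return now + t->slice;
    }

    /* Earliest eligible virtual deadline first. */
    static const struct task *pick_next(const struct task *tasks, int n,
                                        double now)
    {
        const struct task *best = NULL;

        for (int i = 0; i < n; i++) {
            if (lag(&tasks[i]) < 0.0)
                continue;          /* negative lag: not eligible to run */
            if (!best || virtual_deadline(&tasks[i], now) <
                         virtual_deadline(best, now))
                best = &tasks[i];
        }
        return best;
    }

    int main(void)
    {
        const struct task tasks[] = {
            /* name        entitled  received  slice              */
            { "editor",      200.0,    195.0,    3.0 },  /* lag  +5 */
            { "compiler",    200.0,    190.0,   30.0 },  /* lag +10 */
            { "spinner",     200.0,    215.0,   30.0 },  /* lag -15 */
        };
        const struct task *next = pick_next(tasks, 3, 0.0);

        printf("next to run: %s\n", next ? next->name : "(nothing eligible)");
        return 0;
    }

In this example, "compiler" has the most lag, but "editor" has the shorter slice and therefore the earlier virtual deadline, so it runs first; "spinner" has negative lag and is not considered at all.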

Addressing the latency problem

With this framework in place, the implementation of quicker access for latency-sensitive processes happens naturally. When the scheduler is calculating the time slice for each process, it factors in that process's assigned latency-nice value; a process with a lower latency-nice setting (and, thus, tighter latency requirements) will get a shorter time slice. Processes that are relatively indifferent to latency will receive longer slices. Note that the amount of CPU time given to any two processes (with the same nice value) will be the same, but the low-latency process will get it in a larger number of shorter slices.

Remember that the virtual deadline is calculated by adding the time slice to the eligible time. That will cause processes with shorter time slices to have closer virtual deadlines and, as a result, to be executed first. Latency-sensitive processes, which normally don't need large amounts of CPU time, will be able to respond quickly to events, while processes without latency requirements will be given longer time slices, which can help to improve throughput. No tricky scheduler heuristics are needed to get this result.
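
A tiny simulation makes the trade-off visible. In the hypothetical schedule below, two equally weighted tasks differ only in their slice length (as a latency-nice setting would arrange): the short-slice task is chosen whenever it is eligible, so it runs often and briefly, yet both end up with exactly the same total CPU time.

    #include <stdio.h>

    struct task {
        const char *name;
        double entitled, received, slice;   /* all in milliseconds */
    };

    static double lag(const struct task *t) { return t->entitled - t->received; }

    int main(void)
    {
        /* Equal weight: each task is entitled to half of the CPU. */
        struct task tasks[] = {
            { "low-latency", 0.0, 0.0, 2.0 },   /* short slice */
            { "throughput",  0.0, 0.0, 6.0 },   /* longer slice */
        };
        double now = 0.0;

        while (now < 24.0) {
            struct task *next = NULL;

            for (int i = 0; i < 2; i++) {
                if (lag(&tasks[i]) < 0.0)
                    continue;               /* negative lag: not eligible */
                /* Both eligible times are "now", so the earlier virtual
                   deadline belongs to the shorter slice. */
                if (!next || tasks[i].slice < next->slice)
                    next = &tasks[i];
            }
            printf("t=%4.1fms: run %s for %.0fms\n", now, next->name, next->slice);
            next->received += next->slice;
            for (int i = 0; i < 2; i++)     /* both earn their fair share */
                tasks[i].entitled += next->slice / 2.0;
            now += next->slice;
        }
        for (int i = 0; i < 2; i++)
            printf("%s received %.0fms in total\n",
                   tasks[i].name, tasks[i].received);
        return 0;
    }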

There is a big distance, though, between an academic paper and an implementation that can perform well in the Linux kernel. Zijlstra has only begun to run benchmarks on his EEVDF scheduler; his initial conclusion is that "there's a bunch of wins and losses, but nothing that indicates a total fail". Some of the results, he said, "seem to indicate EEVDF schedules a lot more consistently than CFS and has a bunch of latency wins".

While this is clearly a reasonable starting point, Zijlstra acknowledges that there is still quite a bit of work to be done. But, he said, "if we can pull this off we can delete a whole [bunch] of icky heuristics code", replacing it with a better-defined policy. This is not a small change, he added: "It completely reworks the base scheduler, placement, preemption, picking -- everything. The only thing they have in common is that they're both a virtual time based scheduler."

Needless to say, such a fundamental change is unlikely to be rushed into the kernel. Helpfully, the current patches implement EEVDF as an option alongside CFS, which will enable wider testing without actually replacing the current scheduler. The CPU scheduler has to do the right thing for almost any conceivable workload on the wide range of systems supported by the kernel; that leaves a lot of room for unwelcome regressions resulting from even small changes — which this is not. So a lot of that testing will have to happen before consideration might be given to replacing CFS with EEVDF; there is no deadline, virtual or otherwise, for when that might happen.

Comments (23 posted)

Heuristics for software-interrupt processing

By Jonathan Corbet
March 13, 2023
The kernel's software-interrupt ("softirq") mechanism was added prior to the 1.0 kernel release, but it implements a design seen in systems that were already old when Linux was born. For much of that time, softirqs have been an impediment to the kernel community's scalability and response-time goals, but they have proved resistant to removal. A recent discussion on a proposed new heuristic to mitigate a softirq-related performance problem may have reinvigorated interest in doing something about this subsystem as a whole rather than just tweaking the parameters of how it operates.

Hardware interrupts are generated when some component of the system needs the CPU's attention to, for example, deal with a completed I/O operation. The processing of hardware interrupts is one of the highest-priority tasks in the kernel; an interrupt will preempt almost anything else that might be running, so the amount of work done in interrupt handlers must be kept to a minimum to avoid adversely affecting the rest of the system. The softirq mechanism was designed to allow hardware-interrupt handlers to set aside work to be done urgently — but not quite as urgently as hardware-interrupt processing.

The subsystems that use software interrupts include networking, timers, the block subsystem, read-copy-update (RCU), and tasklets. When one of these subsystems has work to delegate to a softirq handler, it will "raise" a softirq by setting a bit in a special mask. When the kernel is in a position to handle software interrupts — usually either at the end of hardware-interrupt processing or on return from a system call — it will make a pass over the raised softirqs and call the appropriate handler for each.

In practice, a softirq raised out of a hardware-interrupt handler will often be run immediately after that hardware handler finishes, but that is not necessarily the case. Softirqs can also be raised out of any (kernel) context, not just while responding to hardware interrupts; the RCU softirq, for example, is not tied to any hardware interrupt at all.
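
As a rough model of that flow, the sketch below implements a raise-and-dispatch mechanism on top of a pending bitmask in plain, user-space C. It is a toy, not the kernel's code (the real __do_softirq() deals with per-CPU state, interrupt masking, and preemption), but the shape is the same: raising a softirq is just setting a bit, and the actual work happens later, when the mask is scanned.

    #include <stdio.h>

    /* Softirq numbers in priority order, loosely following the kernel's list. */
    enum {
        HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
        BLOCK_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ, RCU_SOFTIRQ,
        NR_SOFTIRQS
    };

    static unsigned int pending;                  /* one bit per softirq */
    static void (*handlers[NR_SOFTIRQS])(void);   /* registered handlers */

    /* Raising a softirq is cheap: just set the bit and return. */
    static void raise_softirq(int nr)
    {
        pending |= 1u << nr;
    }

    /* Called when the kernel is in a position to handle software interrupts. */
    static void do_softirq(void)
    {
        unsigned int active = pending;

        pending = 0;
        for (int nr = 0; nr < NR_SOFTIRQS; nr++)
            if ((active & (1u << nr)) && handlers[nr])
                handlers[nr]();
    }

    static void net_rx_handler(void) { printf("processing received packets\n"); }
    static void rcu_handler(void)    { printf("running RCU callbacks\n"); }

    int main(void)
    {
        handlers[NET_RX_SOFTIRQ] = net_rx_handler;
        handlers[RCU_SOFTIRQ] = rcu_handler;

        raise_softirq(NET_RX_SOFTIRQ);   /* as a NIC interrupt handler might */
        raise_softirq(RCU_SOFTIRQ);
        do_softirq();                    /* as done on return from an interrupt */
        return 0;
    }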

The problem here is that there may be a lot of work for the softirq handlers to do. They are invoked for, among other things, packets received from the network and RCU callbacks and, by the time the handlers run, there may be thousands of each waiting. So softirq processing can go on for a long time, to the detriment of the rest of the work the system is meant to be doing. This problem gets worse if, as can easily happen, more work shows up while softirq handling is happening.

Managing softirq handling

To avoid overwhelming the system with softirq processing, a number of heuristic mechanisms have been added to the kernel over time. These include:

  • The function that normally processes software interrupts (__do_softirq()) will pass over all of the raised softirqs and process them. Once that is done, it checks if more softirqs are pending; should that be the case, it will go back to the beginning — but only for a maximum of ten times. If that count is exceeded, the kernel stops processing softirqs and, instead, wakes the per-CPU ksoftirqd kernel thread to continue the work.
  • Similarly, if softirq processing continues for more than (approximately) 2ms, the remaining work will be punted to ksoftirqd.
  • Whenever the ksoftirqd thread is running on a given CPU, the kernel will not even try to process software interrupts there; it will just leave them for the thread to handle.
  • When the kernel is processing software interrupts, it will occasionally check the current process (the one that was preempted to handle softirqs) for the TIF_NEED_RESCHED flag, which indicates that a higher-priority process is ready to run. In that case, it will stop processing softirqs to defer to that process.

Some of the subsystems using software interrupts also implement limits in their own handlers, independent of anything that the central softirq code manages (or even knows about).
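
To make those limits concrete (see the list above), here is a caricature, in user-space C, of the control flow. It is not the kernel's __do_softirq(), and the passage of time is simulated with a counter, but it shows how the restart count, the time budget, and the need-resched check each cause the remaining work to be handed off to the ksoftirqd thread.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_SOFTIRQ_RESTART 10   /* maximum passes over the pending work */
    #define MAX_SOFTIRQ_TIME_MS  2   /* rough processing-time budget */

    /* Stand-ins for kernel state in this toy model. */
    static unsigned int pending = 0x5;   /* some softirqs have been raised */
    static bool need_resched;            /* the TIF_NEED_RESCHED flag */

    /* Handle one batch; return any new work raised while handlers ran. */
    static unsigned int handle_pending(unsigned int mask, long now_ms)
    {
        printf("%ldms: handling pending mask %#x\n", now_ms, mask);
        return now_ms < 3 ? 0x1 : 0;     /* pretend more work keeps arriving */
    }

    static void wakeup_ksoftirqd(void)
    {
        printf("deferring the rest to ksoftirqd\n");
    }

    /* A caricature of __do_softirq()'s control flow, not the real thing. */
    static void do_softirq(long now_ms)
    {
        long deadline_ms = now_ms + MAX_SOFTIRQ_TIME_MS;
        int restart = MAX_SOFTIRQ_RESTART;

        while (pending) {
            unsigned int mask = pending;

            pending = 0;
            pending |= handle_pending(mask, now_ms);
            now_ms++;                    /* pretend each pass takes 1ms */

            if (!pending)
                break;
            /* The heuristics: too many passes, an exhausted time budget, or
               a higher-priority task waiting all punt the rest to ksoftirqd. */
            if (--restart == 0 || now_ms >= deadline_ms || need_resched) {
                wakeup_ksoftirqd();
                break;
            }
        }
    }

    int main(void)
    {
        do_softirq(0);
        return 0;
    }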

Toward the end of December, Jakub Kicinski posted a patch set (under the title "uncontroversial change") addressing a problem that has been encountered with the last heuristic listed above. The deferral of softirq processing when the kernel wants to reschedule was meant to allow low-latency processing of events. If some audio data comes in, for example, a recorder application will want to be able to run and to grab it quickly. But if the process that runs after the reschedule is not quick — if it holds onto the CPU for a long time — it will block all softirq processing for that long time. That can cause undesirable behavior like networking stalls, TCP retransmissions, and more.

To fix the problem, Kicinski has proposed the addition of another heuristic. Once the rescheduling deferral happens, the kernel will only wait 2ms before it starts handling softirqs again, regardless of whether ksoftirqd is running; this will keep a long-running process from blocking softirq processing for too long. A similar timeout applies to deferrals caused by an overload of softirqs — the first two items in the list above. In that case, the kernel will restart handling softirqs after 100ms. These changes, he said, result in a tenfold reduction in networking stalls and a 50% drop in TCP retransmissions.
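
Reduced to its essence, the new heuristic is just a pair of timeouts. The sketch below is a toy model of the behavior as described here, not the code from Kicinski's patches (which, among other things, track the deferral time in jiffies); it only shows the decision of whether a previously deferred batch of softirqs should be processed inline again.

    #include <stdbool.h>
    #include <stdio.h>

    /* Rough limits, as described above. */
    #define RESCHED_DEFER_LIMIT_MS    2
    #define OVERLOAD_DEFER_LIMIT_MS 100

    enum defer_reason { DEFER_NONE, DEFER_RESCHED, DEFER_OVERLOAD };

    /* Has a deferral gone on long enough that softirqs should be handled
       inline again, even if ksoftirqd has not gotten around to them? */
    static bool softirq_defer_expired(enum defer_reason why, long deferred_ms)
    {
        switch (why) {
        case DEFER_RESCHED:    /* a runnable task preempted the processing */
            return deferred_ms >= RESCHED_DEFER_LIMIT_MS;
        case DEFER_OVERLOAD:   /* too many passes or too much time spent */
            return deferred_ms >= OVERLOAD_DEFER_LIMIT_MS;
        default:
            return true;       /* nothing was deferred; process as usual */
        }
    }

    int main(void)
    {
        printf("resched deferral after 1ms:   %d\n",
               softirq_defer_expired(DEFER_RESCHED, 1));    /* still waits */
        printf("resched deferral after 3ms:   %d\n",
               softirq_defer_expired(DEFER_RESCHED, 3));    /* process again */
        printf("overload deferral after 50ms: %d\n",
               softirq_defer_expired(DEFER_OVERLOAD, 50));  /* still waits */
        return 0;
    }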

Addressing the real problem

Thomas Gleixner seemed willing to accept the new timeouts, though he added that he is "not a big fan of heuristical duct tape". He pointed out some problems with the timekeeping; it is using the jiffies time variable, which has a resolution of 1-10ms, for small-millisecond values. That can lead to fairly widely varying results — a problem that, he later realized, exists in current kernels as well. But it didn't take much longer before he complained about the whole approach which, he said, just makes the overall softirq problem worse.

Softirqs, he said, "are just the proliferation of an at least 50 years old OS design paradigm". They exist to allow certain code to circumvent whatever resource management (CPU scheduling, for example) is in place and, as a result, make it impossible to fully control the operation of the system. He grumbled that the usual approach when problems inevitably turn up is to add more heuristics and knobs. "Can we please stop twiddling a parameter here and there and go and fix this whole problem space properly?", he asked. He also acknowledged that he did not have a complete solution in mind, but promised to think about it in the near future.

Kicinski said that the networking developers are trying to move processing out of software interrupts, but that it is a long, slow process. Frederic Weisbecker, instead, pointed out the real problem with softirqs: only one of them can run at a time, and there is kernel code that depends on that level of exclusion. Nobody really knows, after all these years, which softirq-handling code can safely run concurrently with other handlers, and which cannot. So disabling of software-interrupt processing has taken on a role similar to that of the unlamented big kernel lock and, like the big kernel lock, softirqs are hard to safely get rid of.

The solution, he suggested, is a pushdown effort similar to what was done for the big kernel lock. All softirq processing could be pushed into kernel threads, which would then be scheduled in the normal way, but they would all start by running with the equivalent of the softirq-disable big lock held. As the handlers for specific subsystems were proven safe, they could release that lock until, someday, it would wither away entirely. The task, he said, is not for the faint of heart:

Of course this is going to be a tremendous amount of work but it has the advantage of being iterative and it will pay in the long run. Also I'm confident that the hottest places will be handled quickly. And most of them are likely to be in core networking code.

All that is needed is somebody to actually do all of that work.

Matthew Garrett once said that "'heuristic' is an ancient African word meaning 'maybe bonghits will make this problem more tractable'". Certainly software interrupts have been the sort of intractable problem that can drive developers to such a remedy. Perhaps, someday, this ancient subsystem will be cleaned out, urgent processing will be done in a controlled manner, and the heuristics will no longer be needed. For now, though, they are still necessary, and Kicinski's patch may be the sort of bandage that makes them work a little better for a little longer while the real problem is being solved. After all, even though the development community will surely deal with softirqs for real this time, this work is likely to take a while to come to fruition.

Comments (48 posted)

Interview: the FreeCAD Project Association

March 10, 2023

This article was contributed by Alexandre Prokoudine

The sustainability of free software continues to be mostly uncharted waters. No team is the same as any other, so copying, say, the Blender Foundation’s approach to governance will, most likely, not work for other projects. But there is value in understanding how various non-commercial organizations operate in order to make informed decisions for the governance of new ones. In late 2021, the FreeCAD team launched the FreeCAD Project Association (FPA) to handle the various assets that belong to this free 3D CAD project. In this interview, Yorik van Havre, a longtime FreeCAD developer — and current president of the Association — guides us through the process of starting and managing the FPA.

The genesis of the FPA

Alex: Let’s start with the basics: what is the FreeCAD Project Association, where is its headquarters, and who are the Association's members?

Yorik: The FPA is a non-profit association aimed at supporting the FreeCAD project and its developers. It is based in Brussels, Belgium, where I live (in fact, its address is my home address). It is composed of about 15 veteran members and developers of the FreeCAD community, of which four are administrators. I am currently one of these four administrators as well as the FPA president. We are elected by the FPA members for a period of two years.

FPA members are currently the founding members plus a few more that came along afterwards. We have intentionally not decided yet how new people could become members, because we still have no clear idea of what would be efficient.

Alex: How did the team arrive at the decision to create the Association?

Yorik: There are two main reasons that led us to creating the FPA. One was the growing desire of the community to contribute financially to FreeCAD and to "do something with money". It might sound easy, once people are ready to donate money, but it's not. Unless you live in a fiscal haven, anyone receiving money, be it donation or work revenue, is subject to taxes. So we couldn't just receive money. It would mean the person receiving it would have a lot of taxes to pay, even if that person would only redistribute it afterwards. So we needed a structure that is able to hold that money and is not tied to any of us. Also, spending money properly and responsibly involves some work.

There is an increasing number of platforms and organizations, such as OpenCollective, Liberapay, or the Software Freedom Conservancy, that can do that part for you. So this reason alone would probably not be good enough to create a non-profit for an open-source project. Of course, there are downsides to these platforms: some will require transferring the code ownership to them (I personally would advise against that), most will take a percentage of earnings, and using the money is often a bit more restrictive than using your own bank account. Nevertheless, these platforms are certainly a great way to start.

The second reason is that, in the past years, we have seen a growing number of misuses or misappropriations of FreeCAD. We have concerns that it might get "stolen" from us by any company willing to put up some money to obtain things like trademarks, web domains, etc. Of course, community-developed open-source code like FreeCAD itself cannot be stolen, but we wouldn't like to see all the effort we've put into making it known be taken from us. In short, we wanted a way to be able to say: "this is the real, official FreeCAD".

Alex: Team members in the FreeCAD project come from all over the world. Why Belgium?

Yorik: Since I was the one willing to put some effort in creating that non-profit, and I live in Belgium, we naturally looked at what Belgium offered. The Belgian law offers several types of non-profit structures: foundations, classical (national) associations, and international associations. When we saw that last one, we thought: that's clearly for us!

It's worth mentioning that almost every EU country offers similar structures, with similar benefits and hassles. The Belgian international association type does not even require any of the members to be Belgian or actually live in Belgium, so this looked perfect to us. A foundation inherently means "people wanting to put money into a project", while association means "people wanting to associate to pursue a project", and the latter seemed to suit better what we wanted to do. There were a lot of things we wanted to do, for which the association could be useful: organizing courses, working with companies and institutions, etc.

The FPA and the FreeCAD community

Alex: Was there any reluctance to start the Association? Or fears among developers?

Yorik: Not really from the members themselves, we were all overwhelmingly on the same page. But the FreeCAD community is nowadays very large. A process like creating an association would never have succeeded if we had done that by opening a wide discussion with the entire community. Each and every person would have different ideas over the form that this association should take and, like many similar cases, it would never happen. So, a few developers and I explained to the community our intent to create such a structure, but we basically did it between us and presented the results to the community afterwards.

Alex: What was their response?

Yorik: Some in the community of course felt hurt they weren't included. But we took great care to design the FPA to not be a "ruling body" over FreeCAD. FreeCAD is a community-developed project, where anybody has their say, and all of us thought the FPA should actually help to protect that and not overrule anything. So in the FPA statutes, which are a kind of constitution for a non-profit, we introduced clear barriers that would prevent the FPA from ruling over anything in FreeCAD. We wanted it to be a practical, helper structure to the project that all the community could use and benefit from, and not a ruling body. I think the community understood the idea well, and I believe everybody is basically happy with it now.

Alex: Who is currently managing the FPA?

Yorik: Myself and three other administrators do most of the work, and the rest of the members vote. We are still setting things up, though. We are only beginning to actually spend money. I expect things to ramp up this year, and there will probably be work for more people.

Alex: What kind of work are you thinking of?

Yorik: Organize projects where money could be spent. That can include organizing events, FreeCAD presentations, or things like courses. Most of all, we would like to spend money on coding. But you cannot just throw money and have code magically being written. You need to go after interested people, write proper work proposals, verify the results, etc. We are busy writing guidelines for all this, but basically the FPA is open to just any proposal coming from the community.

Alex: You said earlier that not everybody was happy with the FPA, at least initially. What’s the relationship dynamics between the FPA and the FreeCAD community?

Yorik: The FPA is a side body, a helper structure, designed to help and fund the project and its developers, and also hold things in the name of the community, like trademarks, domain names, etc. At the moment that is the only thing it does.

We would like to make the FPA an official representative of the FreeCAD community, so that the Association would be able to speak on behalf of the entire community, for example, to universities, companies, etc. But we don't want to impose that, I think it should come naturally, it's the community who need to "elect" the FPA as its representative. We take great care to be fully transparent and open, and encourage the community to make use of the FPA's resources, so it's a mutual building of trust. Like all things in FreeCAD, it goes slowly but surely.

Running the FPA

Alex: Let’s roll back a bit. How much time did it take you to register the FPA and do the related paperwork?

Yorik: About a year. Setting up an international non-profit association requires a notarial act and registering with several government bodies. We basically drafted the association statutes ourselves and hired a notary to do the rest. As a result, thanks to being in Belgium, we have a nice, impressive document, signed by the King, that gave birth to the FPA.

Alex: What kind of regular paperwork do you do for the FPA and how much time does it take?

Yorik: I do a weekly recap and check all accounts for issues, and then a monthly report that goes public. Another administrator keeps a more detailed track of accounting, and we are now looking for an accountant to help us with the official yearly report. We are a very small association though. Requirements are not complex and we can do most of the work ourselves. I would say I spend myself between half a day and a day per week working on the FPA.

Alex: What do you think are FreeCAD’s most important challenges, technical and otherwise? How do you think FPA can help dealing with those?

Yorik: My own opinion (others in the FPA might have different opinions) is that we are more and more at risk to lose or weaken the community aspect of the development. FreeCAD is growing larger, interests companies more and more, and there is a push to "professionalize" the development. I'm against what that idea imposes, because we as a community are perfectly capable of producing a professional tool, in a professional way.

A good, open, transparent and fair open-source workflow is often (always?) way more professional than what happens behind the closed doors of companies. For example, from one standpoint, taking a lot of time to roll out certain features is deemed unprofessional. But from another standpoint, it is actually very professional and responsible.

We don't merge changes just because there is pressure to do so. Things take time because we want FreeCAD and the team around it to be stable. So we take great care to do things right, without disruptions. We make sure that other developers will be able to integrate changes without introducing bugs down the line. The community is a very strict and strong quality assessor.

With the FPA, we are trying to help people from outside the community to understand all this. Turn all this into a conscious, documented, written, transparent, open workflow. I think that's the big path where we can professionalize things. Make outside developers aware of how things work in the FreeCAD project. And we'll also work to keep the community-based ecosystem flourishing.

Another issue I find worrisome is the lack of diversity in the FreeCAD community. It is still very much a white male club. We already thought a lot about ways to change that, and obviously there is no magic recipe. I believe the FPA also has a role to play there by setting guidelines and mechanisms so that everybody would feel welcome and protected.

Alex: What kind of decisions do you deal with in the FPA and how do you make them? Do you do meetings, minutes, and suchlike?

Yorik: That's another advantage of working with an open-source project. We know how to keep track of changes, don't we? Basically we work the same way as we are used to: we open issues, we discuss them, we mark them as solved. Voting happens in a private room we have, as needed. We do a weekly 30-minute video chat with administrators only, and a video chat all together once in a while, but mostly for fun. Other than that, we have no formal meetings.

When we built the FPA, we decided that members should be free to vote or not, but also that a minimum of 50% of the members should vote for a vote to be valid. This quickly led to the FPA being blocked, as not enough members were voting. We ended up introducing an active/inactive system where members who didn't vote for a long time become inactive until they ask to be made active again, which solved all the problems. Things work really well now and voting is easy.

Alex: How do you decide what projects get grants and for how much?

Yorik: We take the slow lane here. None of us thought we would get so much money right away. We are still trying to scale up. So we decided on a project to give USD $1,000 grants. Just to be careful and not overspend. So far it has been easy, everybody who asked for it got the grant. So we'll actually need to scale that up. Surprisingly, it's harder to spend the money than to earn it.

Sponsorship

Alex: According to the latest annual report, FPA has over 60K euro in total. Where does the money come from mostly? Individuals? Companies?

Yorik: I'd say about 90% is coming from donations by individuals and 10% from donations by companies.

Alex: Can you see regular corporate sponsoring happening, similar to what Blender Foundation has?

Yorik: Yes, absolutely. But we also want to take great care in not becoming dependent on big companies' money and be responsible with it. So we are not really begging for it just yet, we would rather try to come up with proposals and counterparts first to make the team and the companies more like equal partners.

I want to stress this: FreeCAD is made by people, for people. It would be best if people continued considering FreeCAD is 100% theirs, not something driven by commercial players. Quite the opposite: I'd rather see the FreeCAD community being able to tell the commercial world how they would like things to be done.

Alex: What would you say are the highlights of the first year of FPA?

Yorik: Being still alive might be a good result already, but having attracted so much more money than we thought, without much guarantee yet that we would spend that well, makes me pretty happy that there is so much trust in it from the community. We also learned a lot about working together in a different, more "serious" way — being more responsible, accountable, transparent and fair at deciding for others. I think we have passed the test, and that's really thrilling.

Managing the FPA takes a lot of time, mine and others', more than we expected. And deciding what to do with the money is actually much more complicated than it looks. Most people work on FreeCAD for fun, they are not so much interested in making money there. I think most actually enjoy working on a project where no money or other commercial considerations are involved. We don't want to change anything there, we think it's precious.

But it is also a great opportunity, you know, to learn how to do this in a good, correct, responsible way that does not harm the people or the project.

Alex: The latest news is that Brad Colette, a long-time contributor to FreeCAD, started a public benefit company, Ondsel, with Open Core Ventures to create cloud services based on FreeCAD. Brad is also on the FPA board. What do you expect the collaboration between Ondsel, FPA, and FreeCAD to be?

Yorik: I don't have a clear picture of how the collaboration will be yet. But I trust Brad a lot. He is a responsible person, he cares a lot about the FreeCAD project and its community, so I'm sure the collaboration will be fruitful.

Our thanks to Yorik van Havre for taking the time to respond in detail about the operation of the FreeCAD Project Association.

Comments (7 posted)

Zephyr: a modular OS for resource-constrained devices

March 14, 2023

This article was contributed by Koen Vervloesem

Writing applications for devices with a lot of resource constraints, such as a small amount of RAM or no memory-management unit (MMU), poses some challenges. Running a Linux distribution often isn't an option on these devices, but there are operating systems that try to bridge the gap between running a Linux distribution and using bare-metal development. One of these is Zephyr, a real-time operating system (RTOS) launched by the Linux Foundation in 2016. LWN looked in on Zephyr at its four-year anniversary as well. Seven years after its announcement, Zephyr has made lots of progress and now has an active ecosystem surrounding it.

Zephyr is an RTOS for connected, resource-constrained devices, such as Internet of Things (IoT) sensors, Bluetooth trackers, heart rate monitors, smartwatches, and embedded controllers. A typical device running Zephyr has a microcontroller with a clock frequency below 100MHz, no MMU, 32KB to 256KB of static RAM, and 512KB or less of on-chip flash memory.

Most of Zephyr's code is published under the Apache 2.0 license, except for drivers from Nordic Semiconductor and NXP Semiconductors, which use the 3-Clause BSD license. Some of the build tools are covered by the GPLv2. In its FAQ, the project explains that it chose the Apache 2.0 license because its permissive nature enables the project to expand participation and membership to organizations that have not typically participated in open-source development in the past.

Modular operating system

Zephyr's targets are often single-purpose devices, running just one application. This application is compiled together with the operating system, any external libraries, and perhaps even a shell, all of which is statically linked into a single executable file. This firmware binary can then be flashed to the device and will run on the microcontroller in a single address space. Zephyr's build system is based on CMake. Each Zephyr application has a CMakeLists.txt file that tells the build system where to find the required code files and how to link the application.

Developers can use CMake directly to build their application, but Zephyr's default build tool is west. A west build command, run in a Zephyr application directory, calls CMake behind the scenes. The Zephyr developers describe west as a "meta-tool": it is not only used to build applications, but also to flash them to devices and even to manage repositories.

The Zephyr project consists of a lot of subsystems that can be enabled or disabled individually for the firmware image. Thanks to this modular nature, the RTOS is able to run on devices with a wide range of capabilities, from the smallest microcontrollers with 8KB of RAM (only using the bare minimum of subsystems) to development boards that are powerful enough to run Linux.

To choose the subsystems and to set other configuration options, Zephyr uses Kconfig, the Linux kernel's configuration system. There are options to enable and disable subsystems (such as Bluetooth, IPv6, and I²C) or to configure these subsystems in detail. Architecture-specific options, drivers, and libraries are all configured in the same way.

A build configuration can also be split into several files. For this purpose, Zephyr has the concept of an overlay configuration file. This file has extra configuration options that are added to the default configuration file. Overlay configuration files are useful to build a debug version of the firmware, for example, or a minimal version and a full version.

Hardware abstraction layer

Like the Linux kernel, Zephyr uses devicetree, which is a hierarchical structure that describes the hardware. For each supported development board, there's a devicetree that describes what hardware is present, such as a universal asynchronous receiver-transmitter (UART), a Serial Peripheral Interface (SPI) bus, general-purpose input/output (GPIO) pins, SRAM, flash, and so on. When building an application, the developer specifies the target board, and the build system then chooses the appropriate devicetree for that board and generates a C header file.

The device drivers and the application can include this generated header to access the hardware. Zephyr has a generic model for device drivers: each type of driver (for example UART, I²C, SPI) has a generic API. So application developers usually don't have to look up how to use that specific driver for each device. Several hardware manufacturers, such as Nordic Semiconductor, STMicroelectronics and NXP Semiconductors, maintain their own hardware abstraction layer (HAL) for Zephyr. This should give application developers some confidence in the manufacturer's support for Zephyr.

An existing devicetree can be customized with a devicetree overlay. For example, when building an application for a development board extended with some external sensors, the developer would create an overlay describing the additional sensors. Zephyr's build system automatically picks up the overlay if the file name consists of the name of the board and the file extension .overlay. It is then merged with the devicetree of the standard board without the sensors. As a result, the application code is able to access the sensors.
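
To give a feel for what this looks like from the application side, here is a minimal LED-toggling program in the style of Zephyr's Blinky sample. It assumes a board whose devicetree defines an led0 alias; exact header paths and helper macros have shifted between Zephyr releases, so treat it as a sketch rather than as copy-and-paste code for any particular version.

    #include <zephyr/kernel.h>
    #include <zephyr/drivers/gpio.h>

    /* Resolved from the board's devicetree: the GPIO behind the led0 alias. */
    static const struct gpio_dt_spec led =
            GPIO_DT_SPEC_GET(DT_ALIAS(led0), gpios);

    int main(void)
    {
        if (!gpio_is_ready_dt(&led))
            return 0;

        /* The same generic GPIO API works on any supported board; only the
           devicetree changes. */
        gpio_pin_configure_dt(&led, GPIO_OUTPUT_ACTIVE);

        while (1) {
            gpio_pin_toggle_dt(&led);
            k_msleep(1000);         /* blink once per second */
        }
        return 0;
    }

Building it for a different board is a matter of passing a different board name to west build; if the hardware needs extra description, that goes into a devicetree overlay rather than into the C code.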

Communication

Zephyr supports a lot of protocols and hardware standards, from low-level hardware standards and protocols to higher-level network and application protocols. All of them are subsystems that can be enabled individually if the application needs them.

For communication between the microcontroller and sensors or other components on the board, Zephyr supports I²C, SPI, I²S audio, and serial communication using a UART. Of course, the latter can also be used for communication with the outside world if the board exposes UART pins for logging or a shell. USB and Controller Area Network (CAN), which is used in automotive environments and industrial automation, are two other supported mechanisms for external communication.

Since Zephyr is targeted at IoT devices, support for networking and wireless-communication protocols is essential. In the lower layers of the Open Systems Interconnection (OSI) model, these are Ethernet, Point-to-Point Protocol (PPP), LoRa, and IEEE 802.15.4. A bit higher in the stack, there's support for Thread, a low-power IPv6-based wireless mesh-networking technology for home-automation applications, which is based on the OpenThread implementation. Zephyr's Bluetooth protocol stack, which provides a complete Bluetooth Low Energy (BLE) stack as well as portions of Classic Bluetooth, is also a popular choice. For example, Nordic Semiconductor, a popular manufacturer of BLE chips, has based its nRF Connect SDK on Zephyr.

Zephyr's network stack has a special focus on application protocols that are used in IoT environments. This includes Constrained Application Protocol (CoAP), which offers a lightweight version of a representational state transfer (REST) API. On top of CoAP, Lightweight M2M (LwM2M) can be used for device management, reporting, and control. Furthermore, Zephyr has a Message Queuing Telemetry Transport (MQTT) client library with support for MQTT 3.1.0 and 3.1.1; the client can communicate directly over TCP or TLS, or on top of a WebSocket connection.

Filesystems and device management

Zephyr's Virtual Filesystem Switch (VFS) offers a generic API for filesystems. Currently FAT and littlefs (a fail-safe filesystem designed for microcontrollers) are supported. Storage can be on internal or external flash memory, on an SD card, or in RAM. Applications can mount multiple filesystems at different mount points.

For remote management of Zephyr-based devices, the operating system has a subsystem based on the open-source project MCUmgr. The mcumgr command-line tool allows users to manage their devices via BLE, a UART serial connection, or UDP over IP. Users can view, delete, and upload system images, download and upload arbitrary files from and to the filesystem, reboot the device, and open a shell.

The Device Firmware Upgrade (DFU) subsystem integrates with MCUboot, which is a hardware- and OS-independent bootloader for 32-bit microcontrollers. On a device that has this bootloader, Zephyr applications can be flashed via MCUboot's serial recovery mode. MCUboot also is able to verify firmware images. A public/private key pair can be created and then used to sign firmware images with the private key; the bootloader only accepts an update when it can verify the signature with the corresponding public key.

Quality, security, and Long-Term Support

The Zephyr project follows a release process loosely based on the Linux kernel, with a roughly four-month release cycle. There's a lot of focus on quality and security in this process. A security working group handles vulnerabilities that have been found, but also works on a secure architecture for the operating system. The security measures taken by the developers are documented. In 2019, Zephyr was one of the three open-source projects to receive the "gold" badge for Best Practices from the Core Infrastructure Initiative. Currently, 15 projects have achieved this status and Zephyr is still among them. With this badge, the project has demonstrated that it is following the security best practices in its development.

Long-Term Support (LTS) releases are published with a roughly two-year release cycle. They are maintained independently from the main tree for at least two and a half years. An LTS release does not get new features or substantial improvements, but it does get fixes for serious or security-related bugs. This gives companies a stable base to build on for their products.

The first LTS release was Zephyr 1.14 in April 2019, with 160 supported board configurations for eight architectures. This was followed in October 2021 by the second LTS release, Zephyr 2.7, supporting 400 board configurations for 12 architectures. The release milestone dates in the project's wiki currently target Zephyr 3.6 for the third LTS release, which is expected in February 2024.

Sample applications

While it is difficult to come up to speed on Zephyr and the development documentation can be overwhelming, the project offers a lot of sample applications, from the simple Blinky application that blinks an LED to complex networking applications and Bluetooth examples. Some of the sample applications are for a specific board or a specific sensor, while another example shows how to use an external library. Each of them comes with documentation explaining what the code does, how to build the firmware, and what to expect from it.

Many of the sample applications make extensive use of devicetree overlays and overlay configuration files to make them as generic as possible. For example, the Socket Echo Server supports OpenThread, IEEE 802.15.4, Bluetooth Internet Protocol Support Profile (IPSP), PPP, TLS, or IP tunneling, all using their own overlay configuration file that adds to the generic configuration.

When starting development on a new application, it's easy to start from one of the sample applications and then build on that. All sample code can be found in a Zephyr repository on GitHub. There's also a generic example application that can be used as a template for new Zephyr applications.

Board support and Zephyr in the wild

Developers are continuously adding support for new development boards. Currently, Zephyr supports more than 450 boards from various architectures. Most of them are 32-bit ARM boards, but there's also support for the Arm64 architecture, Xtensa (including the popular ESP32 microcontrollers), and RISC-V. Support for specific features varies from board to board or architecture to architecture. For example, the ESP32 microcontrollers' support for Wi-Fi and Bluetooth requires Espressif's HAL, which uses binary blobs.

Zephyr can be found in a couple of open-source projects, for example in the ZMK firmware for keyboards. There's also Zephyr firmware for the PineTime smartwatch, and the ZSWatch, an open-hardware and open-firmware smartwatch built on Zephyr. Commercial hardware manufacturers, on the other hand, are usually reluctant to give too much information about how they build their firmware, but Zephyr's news releases regularly showcase products based on the operating system.

It's not always about IoT products, though. Embedded controllers, responsible for battery charging, USB power delivery, keyboard handling, and other low-level tasks, are also a good fit for Zephyr. In July 2021, Google decided to move its Chromebook embedded controllers to Zephyr; this simplifies the customization steps that Chromebook manufacturers need for their devices. Intel also created Zephyr-based embedded-controller firmware, which OEMs can use as a base for the embedded controller in the PCs they manufacture. They still need to combine this with a board-support package (BSP) and HAL for their specific hardware.

Conclusion

Zephyr is a flexible and modular real-time operating system for devices that are not powerful enough for Linux. A lot of functionality is included by default and the license is friendly for commercial use. Moreover, in seven years the project has managed to build a thriving ecosystem in which numerous hardware manufacturers also play a central role by maintaining their own HAL and contributing code. Be it for an embedded controller or a wireless IoT device, there are many good reasons to consider Zephyr.

Comments (4 posted)

Page editor: Jonathan Corbet


Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds