I'll start with some belated thoughts on the Open Source AI Definition that was released last week. At IndiaFOSS in September this year, I spoke about the OSAID and urged people to give feedback on it. I was concerned that people weren't following a very important development closely enough. A version of the talk, published on the OSI blog, sparked some discussion on the OSI forum, but that wasn't a conversation I wanted to casually dip into. I am sharing some more considered reflections here.
Some background- Tattle's work is open source and has always had a machine learning (AI) component to it. While the open source AI rhetoric reached a crescendo with the release of large language models, it has been background chatter for as far back as I can remember. ML developers thinking about licenses have had to do the math of openness based on the components they've used. These are questions we've asked ourselves in the context of Feluda: is it open source if it relies on ResNet? On BERT? What if we use the Google Cloud Vision API as one layer of data processing? These were also questions we had to answer when submitting Feluda to the DPGA registry. And then again, for Uli.
I am not privy to the events that led OSI to start the consultation process. Some blogs implied that it was Meta's misuse of the term open source to describe Llama. But from my perspective, there had been plenty of small-scale (mis)use of open source language in AI predating the genAI boom; it just didn't reach the scale of media scrutiny and public debate. The curse of the success of open source is that it has become a common noun for a whole range of things it wasn't originally intended to describe. 'Open source → good' has been used as a discursive technique to sidestep constitutional oversight of public infrastructure in India. Perhaps that makes me more sensitive to misuse of the term. Even if I didn't want to call out other projects for what I saw as wrong use, I surely didn't want Tattle to lower the signal-to-noise ratio by calling something open source when it wasn't clear what that meant. I have welcomed clarity on what open source AI means.
To be clear- that the definition allows developers to not open data makes me uncomfortable. I don't trust research that doesn't publish its data. It is also harder to understand research without its data. Ten minutes with a CSV dump is worth more than two hours on a dataset paper. The aspirational position that the Software Freedom Conservancy put out on the use of GenAI in programming is inspiring. There is a world in which the pressure to open source everything along the AI value chain will result in more responsible data collection, and maybe even alternative models of AI development. But pragmatically, we can't reverse the last decade of AI development trajectory. The data-guzzling drive is some time from abating. And we can't open data for all domains- not on individuals' reproductive health, not on individual spending patterns.
The choices for an open source AI definition weren't great: be maximalist about openness on all fronts and leave out a whole range of AI applications, or compromise on openness of data and dilute the four freedoms. But a definition means standing on solid ground rather than shifting sands. AI is a different technical artifact from software, making it difficult to come up with a clean definition, but it derives from (open source) software innovation. Entities- and not just Meta- were using open source to describe their work even in the absence of a definition. Open source licenses are also mental shortcuts for understanding something important about a software project. Without a shared definition, any invocation of open source in AI would confuse rather than clarify.
The flaws of the definition aside, I am relieved that we can now (for the most part) objectively evaluate claims. Even if the OSAID 'fails' in the long term, I think the process has been a success. Here are two possible 'failure' outcomes, which to me are still good outcomes:
1. Over time we realize that the OSAID doesn't imply the same goodness as the Open Source Definition (OSD). Some other process results in another definition, and over the years it gathers the same social support as the OSD. It may or may not be called the open source AI definition but, to quote Galileo, the essence of things comes first. Names come after.
2. The rhetoric of openness in AI loses weight. People find other, better ways of describing goodness and responsibility in AI. We all just give up on saying anything about open source in AI (and call out the ones who do). I don't think we could get to this point without having tried our hand at a definition.
For someone who wasn't in FOSS in the early 2000s, it is hard to know whether this process was more or less heated than the OSD consultation. Reading all the blogs and forums, however, has reminded me of all that I love about the FOSS community. Working on online harms means that I am used to staring at the worst of human discourse. I don't take people disagreeing passionately yet respectfully for granted. At present, AI appears to operate under the strong centripetal force of a few large corporations. But I trust the FOSS community to passionately push for a bigger space for public interest, and to get us to a better definition if this one falls short. For now, we're ready to work with it.