January 8, 2024

Vital Lessons Burgeoning Technologies Can Learn From the Open Source Movement

Dave Stokes

(Yury Zap/Shutterstock)

The media landscape has been overwhelmed of late with headlines about artificial intelligence, quantum computing, the metaverse, and other emerging technologies poised to cause significant disruption to our society. Whenever a new category of technology takes off, there is invariably a din of doomsayers and enthusiasts alike: two very vocal camps arriving at diametrically opposed conclusions about the impending impact of said technology.

As is often the case, neither camp is completely correct. And while predictions about the trajectory of novel technologies are often flawed, these tech sectors would be wise to take some important life lessons from their open source forebears, to ensure they continue to offer society a net benefit, rather than net harm.

Leading Voices Sound the Alarm on Consolidation

At last year’s Open Source Summit North America, Booz Allen Hamilton’s lead scientist, Stella Biderman, urged her keynote audience to apply the lessons learned from decades of experience in open source to the uncharted waters of artificial intelligence.

“…[T]here’s a lot of issues that the AI community has been struggling with, that the open source community has been working on for years, if not decades,” she said. “And there’s a lot of room, I think, for us [working with AI] to learn from the lessons that you’ve learned [in open source]…to build more accessible and more widely available technologies.”

(ImageFlow/Shutterstock)

One of the most pressing concerns expressed by Biderman and others is the gargantuan barrier to entry standing between all but a select few organizations and these emerging technologies. For AI in particular, astronomical compute costs and the need for incredibly scarce, high-cost talent make entry into the space exceedingly difficult for startups and even most mid-sized businesses. As a result, we are quickly finding ourselves in an AI landscape dominated by a small handful of megacorporations, whose hegemonic rule over the space will surface only what they have trained into their systems, to the exclusion of divergent ideas and other possible opportunities.

The most important question, then, is: at what cost? What would the end result look like if hyper-consolidation in today’s most influential technological fields were to continue? While it’s impossible to say with certainty, odds are this path will lead to stalled development, unequal access, ballooning costs, a worrying lack of transparency, and perhaps even worse outcomes that we can’t currently predict.

Meta Takes the Road Less Traveled

Thankfully, one of those megacorps has taken a different tack entirely. In July 2023, Meta formally announced that it would be “open sourcing” its large language model (LLM), LLaMA, making it free for personal or commercial use to any user or organization with fewer than 700 million unique monthly users.

(Image courtesy Meta)

While this stipulation was presumably put in place to prevent Meta’s nearest competitors, such as Snapchat and TikTok, from capitalizing on its work, it’s worth noting that this means LLaMA is not truly open source, which is certainly problematic in its own ways. However, all things considered, it’s a strong step in the right direction for today’s AI industry.

You would be hard pressed to overstate the significance of this move from Meta. While practically every other Big Tech player is scrambling to establish a moat around their technologies, Meta has chosen to make its IP freely available to (almost) anyone and everyone. By making its model open to enthusiasts, researchers, and entrepreneurs, Meta is working to ensure its continued relevance in the artificial intelligence space. In fact, it took less than two weeks after its release for the community to introduce a LLaMA-based chatbot and a LLaMA-based personal assistant.

It’s important to note, however, that this is far from an altruistic act from Meta. On the contrary, it’s a shrewd business strategy that will help to ensure its model and standards enjoy widespread adoption and interoperability, and will help to safeguard its relevance in the artificial intelligence race for many years to come.

Why It Matters: How Open Source Invariably Leads to a Better Tomorrow

With decades of experience in the non-traditional licensing of software, the open source community has proven itself to be invaluable in the development and distribution of novel technologies. And there is no reason to believe that the open source model would not do the same for any number of emerging technologies. The benefits of open source are manifold and in the context of potentially society-altering technologies such as AI and virtual reality, open source can quite possibly be the difference between them ultimately helping or harming society.

One of open source’s most compelling benefits is its ability to dramatically accelerate the pace of innovation in a given field. No matter how large a budget a single corporation may have for development, it can never compete with a global community of highly talented enthusiasts advancing the state of the art free of charge. Moreover, because each incremental advancement is shared widely with the community, the largest possible number of minds is able to operate at the bleeding edge of innovation.

(Wright-Studio/Shutterstock)

You needn’t look any further than what the open source community achieved with Meta’s LLaMA model in a few short weeks to see this phenomenon in action. By bringing the community’s collective energy to bear on these models, we’ve seen the technology improve by leaps and bounds in very little time. Training speeds have skyrocketed. At the same time, the model itself has become leaner, which in turn has made the development process far more accessible (users can train and run these variants on a simple gaming laptop, as opposed to a multi-million-dollar supercomputer).
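Much of that leanness comes from quantization: storing model weights in fewer bits so they fit in consumer-grade memory. As a rough illustration only (not Meta’s or the community’s actual pipeline, and using made-up weight values), here is a minimal sketch of 8-bit weight quantization in NumPy:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto int8 using a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 encoding."""
    return q.astype(np.float32) * scale

# Toy stand-in for one layer's weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)  # → 4  (int8 storage is a quarter of float32)
```

Community tools apply far more sophisticated schemes (per-channel scales, 4-bit formats), but the memory arithmetic is the same: each halving of bit width halves the hardware needed to hold the model.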

Which brings us to the last (and perhaps most important) benefit that open source can bring to these emerging technologies: democratization. By providing unrestricted access to the underlying code, open source works to ensure that these immensely powerful technologies are not owned and controlled exclusively by a shadowy cabal of private interests.

We’ve seen the very same thing come about with the development of the internet. Thanks to core open source technologies like Linux, MySQL, and PHP, the modern Internet remains open and accessible to innovation and input from a wide swathe of individuals and interests.

And this becomes even more critically important in the context of potentially society-altering technologies like AI. Without a large, strong, and vibrant open source community around these emerging technologies, we run the risk of these new technologies being developed and used in ways that do not align with the priorities and ethics of broader society. After all, if profit is the only motivating force behind technological development, society writ large is doomed to suffer the costs.

How To Promote Open Source’s Role in AI Development

Now that we’ve addressed the “why” behind open source, let’s touch on the all-important “how.” How can the average reader contribute to the continued relevance and success of the open source movement in relation to artificial intelligence and other emerging technologies? As always, advocacy goes a long way. Simply speaking up and promoting the open source philosophy within your communities and organizations can be a powerful way of advancing the movement.

(Andrey_Popov/Shutterstock)

However, not all technologies have the same requirements when it comes to promoting open source. In the case of AI, for example, for open source to be truly effective it must apply not only to knowledge sharing but also to resource sharing. This is vitally important in the world of AI because, unlike traditional software, AI models must be trained on immense volumes of carefully curated and annotated data.

The data pipeline represents one of AI’s most critical barriers to development, and in order to ensure a diverse ecosystem in the years to come, training data will need to be made available alongside open source models. Transparency around training data sets is also vital for identifying and correcting instances of bias in artificial intelligence. As many know, AI can exhibit problematic biases, and these are almost always a byproduct of the data it’s trained on.
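One low-cost audit that open training data makes possible is simply measuring how attributes are distributed across a dataset before training on it. A minimal sketch in Python, using toy, made-up records (a real audit would run over millions of rows and many attributes):

```python
from collections import Counter

def attribute_balance(records, attribute):
    """Return each attribute value's share of the dataset."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical training records for illustration only.
records = [
    {"label": "approved", "region": "urban"},
    {"label": "approved", "region": "urban"},
    {"label": "approved", "region": "urban"},
    {"label": "denied",   "region": "rural"},
]

print(attribute_balance(records, "region"))
# → {'urban': 0.75, 'rural': 0.25}
```

A skew like this one, where an attribute correlates heavily with an outcome, is exactly the kind of problem that closed training sets hide and open ones let reviewers catch.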

At the end of the day, if we want to see emerging technologies like AI serve the greater good, it is essential that they maintain a robust open source movement. If we hope to have safe, broadly accessible, interoperable ecosystems around these technologies, going open source is the only way.

About the author: Dave Stokes is a Technology Evangelist for Percona and the author of “MySQL & JSON – A Practical Programming Guide.” Dave has a passion for databases and teaching. He has worked for companies ranging alphabetically from the American Heart Association to Xerox, on projects ranging from anti-submarine warfare to web development.

Related Items:

Rethinking ‘Open’ for AI

Inside Pandata, the New Open-Source Analytics Stack Backed by Anaconda

Who’s Winning in Open Source Data Tech
