Thoughts on Open Source Software Development

28 10 2013

The last year I did a lot of work with some small Open Source projects (Nova, Honeyd, Neighbor-Cache Fingerprinter, Node.js Github Issue Bot…). I’ve also used Linux for all of my development and have used a lot of Open Source projects in that time. In some ways I’ve come out being more of on Open Source advocate than ever, and in other ways I’ve come out a bit jaded. What does Open Sourcing a project get you?

Good thing: free feedback on features and project direction

Unless you’re Steve Jobs, you probably don’t know what customers want. If you’re an engineer like most people reading this blog, you really probably don’t know what customers want. Open Sourcing the project can provide free user feedback. If you’re writing a business application, people will tell you they want pretty graphs generated for data that you never thought would be important. If you’re writing something with dependencies, users will tell you they want you to support multiple versions of potentially incompatible libraries that you would never have bothered with on your own.

If you’ve got an IRC channel, you’ll occasionally find a person who’s more than willing to chat about his or her opinions on the project and what features they think would be useful, in addition to the occasional issue tickets and emails.

The Open Source community can be your customers when you don’t have any real customers yet.

Good thing: free testing

Everyone who downloads and uses the project becomes someone that can help with the testing effort. All software has bugs, and if they’re annoying enough, people will report them. I’ve tried to make small contributions to bigger Open Source projects by reporting issues I’ve found in things like Node.js, Express, Backtrack, Gimp, cvv8… As a result, code becomes better tested and more stable.

Good thing: free marketing

Open Sourcing the project, at least in theory, means people will use it. They’ll talk about it to their friends, they’ll write articles and reviews about it, and if the project is actually useful it’ll start gaining popularity.

Misconception: you’ll get herds of developers willing to work on your project for free

I’ve reported dozens of bugs in big Open Source projects. I’ve modified the source code of Nmap and Apache for various data collection reasons. I’ve never submitted a patch bigger than about 3 lines of code to someone else’s Open Source project. That’s depressing to admit, but it’s the norm. People will file bug tickets, sometimes offer suggestions on features, but don’t expect a herd of developers working for free and flocking to your project to make it better. Even the most hardcore Open Source advocates have their own pet projects they would rather work on than fixing bugs or writing features into yours. Not only that, the effort to fix a bug in a foreign code base is significantly higher than the effort required for the original developer of the code to fix it. Why spend 3 hours setting up the development environment and trying to fix a bug, when you can file a ticket and the guy that wrote the code can probably fix it in 3 minutes?

There are large Open Source projects (Linux, Open Office, Apache…) that have a bunch of dedicated developers. They’re the exceptions. From what I’ve seen, most Open Source projects are run by one person or a small core group.

Misconception: the community will take over maintaining projects even if the core developer leaves

We used a Node.js library called Nowjs quite a lot. It’s a wonderful package that takes away all the tedium of manual AJAX or socket.io work and makes Javascript RPC amazingly easy. It has over 2,000 followers on Github, and probably ten times that many people using it. One day the developer decided to abandon the project to work on other things; not that unusual for a pet project. Sadly, that was the death of the project. Github makes it trivial to clone the project, with a single press of a button someone could make a copy of the repository and take over maintaining and extending it. Dozens of people initially made forks of the project in order to do that, and dozens more made forks to fix bugs they found.

What’s left? A mess consisting of dozens of Github forks of the project, all with different bugs being fixed or features added, and the “official” project left abandoned in such a way no one can figure out which fork they should use. There’s no one left to merge in patches or to make project direction decisions. New users can’t figure out which fork to use and old users that actually write patches don’t know where to submit them anymore.

The developer of Nowjs moved on to develop Bridge-js. Then Bridge-js got abandoned too.

Bridge is still open source but the engineers behind Bridge have returned to school.

This pattern is almost an epidemic in Node.js. Someone creates a really amazing module, publishes it to Github and NPM, and then abandons it. Dozens of people try to take over the development, but in the end all fail (partly because Github’s lack of specifying which fork of the project is “official”, and the Open Source problem that there is no “official” fork). A dozen new people create the same basic module from scratch, most of which never become popular, and most of which also become abandoned… You see the picture.

If you sense a hint of frustration, you’d be right. On multiple occasions I had to dig through dozens of half abandoned projects trying to figure out which library I wanted to use to do something as common as SQL in Node.js.

The reason it’s an epidemic with Node is because no one is really sure what they want yet, and projects haven’t become popular enough to have momentum to continue after abandonment by their original authors. Hopefully at some point projects will acquire a big enough user base and enough developers that they can sustain themselves.

Fork is a four letter word

Even the biggest projects aren’t immune to the anarchy of forks. Libreoffice and Openoffice, GNU Emacs vs XEmacs, the list goes on. For the end user of these software suits, this is mainly annoying. I’ve switched between LibreOffice and OpenOffice more than once now, because I keep finding bugs in one but not the other.

Sometimes forks break out for ridiculous reasons. The popular IM client Pidgin was forked into the Carrier project. Why?

As of version 2.4 and later, the ability to manually resize the text input box of conversations has been altered—Pidgin now automatically resizes between a number of lines set in ‘Preferences’ and 50% of the window depending on how much is typed. Some users find this an annoyance rather than a feature and find this solution unacceptable. The inability to manually resize the input area eventually led to a fork, Carrier (originally Funpidgin). – https://en.wikipedia.org/wiki/Pidgin_(software)

You can view the 300+ post argument about the issue on the original Pidgin ticket here.

The fact there’s no single “official” version of a project and the sometimes trivial reasons that forks break out cause a lot of inefficiency as bug are fixed in some forks but not others, and eventually code bases diverge so much that they also develop in one fork or another.

Misconception: people outside of software development understand Open Source

I once heard someone ask in confusion how Open Source software can possibly be secure, because can’t anyone upload backdoors into it? They thought Open Source projects were like Wikipedia, where anyone could edit the code and their changes would be somehow instantly applied without review. After all, people keep telling them, “Open Source is great, anyone can modify the code!”.

A half dozen times, ranging from family members to customers and business people, I’ve had to try and explain how Open Source security products can work even though the code is available. If people can see the code, they can figure out how to break and bypass it, right? Ugh…

And don’t even get me started on the people that will start comparing Open Source to communism.

Concluding thoughts

I believe Open Source software has plenty of advantages, but I also think there’s a lot of hype surrounding it. The vast majority of Open Source projects are hacked together as hobby projects and abandoned shortly after. A study of Sourceforge projects showed that less than 17% of projects actually become successful; the other 83% are abandoned in the early stages. Most projects only have a few core developers and the project won’t outlive their interest in it. The community may submit occasional patches, but are unlikely to do serious feature development.

Why release Open Source software then? I think the answer often becomes, “why not?”. Plenty of developers write code in their spare time. They don’t plan to make money directly from it: selling software is hard. They do it to sharpen their saws. They do it for fun, self improvement, learning, future career opportunities, and to release something into the world that just might be useful and make people’s lives better. If you’re writing code with no intention of making money off it, there’s really no reason not to release it as Open Source.

What if you do want to make money off it? Well, why not dual licence your code and have a free Open Source trial version along with an Enterprise version? You’ll get the advantages of free marketing, testing, and user feedback. There is the risk that someone will come along and extend the Open Source trial version into a program that has all of the same features, or even more features, as your Enterprise version, and this is something that needs to be considered. However, as I mentioned before, it’s hard to find people that will take over the development and maintenance of Open Source projects. I think it’s more likely that someone will steal your idea and create their own implementation than bother with trying to extend your trial version code, but I don’t have any proof or evidence of that.

Advertisements




NOVA: Network Antireconnaissance with Defensive Honeypots

7 06 2012

Knowledge is power, especially when regards to computer and information security. From the standpoint of a hacker, knowledge about the victim’s network is essential and the first step in any sort of attack is reconnaissance. Every little piece of seemingly innocent information can be gathered and combined to form a profile of the victim’s network, and each bit of information can help discover vulnerabilities that can be exploited to get in. What operating systems are being used? What services are running? What are the IP and MAC addresses of the machines on a network? How many machines are on the network? What firewalls and routers are in place? What’s the overall network architecture? What are the uptime statistics for the machines?

Since network reconnaissance is the first step in attacking, it follows that antireconnaissance should be the first line of defense against attacks. What can be done to prevent information gathering?

The first step in making the difficult to gather information is simply to not release it. This is the realm of authentication and firewalls, where data is restricted to subsets of authorized users and groups. This doesn’t stop the gathering of information that, by it’s nature, must be to some extent publicly available for things to function. Imagine the real life analogy of a license plate. The license plate number of the car you drive is a mostly harmless piece of information, but hiding it isn’t an option. It’s a unique identifier for your car who’s entire point is to be displayed to world. But how harmless is it really? Your license plate could be used for tracking your location: imagine a camera at a parking garage that keeps logs of all the cars that go in and out. What if someone makes a copy of your license plate for their car and uses it to get free parking at places you have authorized parking? What if someone copies the plate and uses it while speeding through red light cameras or committing other crimes? What if someone created a massive online database of every license plate they’ve ever seen, along with where they saw it and the car and driver’s information?

Although a piece of information may seem harmless by itself, it can be combined to get a more in depth picture of things and potentially be a source of exploitation.  Like a license plate, there any many things on a network that are required to be publicly accessible in order for the network to function. Since you can’t just block access to this information with a firewall, what’s the next step in preventing and slowing down reconnaissance? This is where NOVA comes in.

Since hiding information on a LAN isn’t an option, Datasoft’s NOVA (Network Obfuscation and Virtualized Anti-reconnaissance) instead tries to slow down and detect attackers by making them go threw huge amounts of fake information in the form of virtual honeypots (created with honeyd). Imagine an nmap scan on a typical corporate network. You might discover that there are 50 computers on the network, all running Windows XP and living on a single subnet. All of your attacks could then target Windows XP services and vulnerabilities. You might find a router and a printer on the network too, and spend a lot of time manually poking at them attempting to find a weakness. With NOVA and Honeyd running on the network, the same nmap scan could see hundreds of computers on the network with multiple operating systems, dozens of services running, and multiple routers. The attacker could spend hours or even days attempting to get into the decoy machines. Meanwhile, all of the traffic to these machines is being logged and analyzed by machine learning algorithms to determine if it appears hostile (matches hostile training data of past network scans, intrusion attempts, etc).

At the moment NOVA is still a bit rough around the edges, but it’s an open source C++ Linux project in a usable state that could really use some more users and contributors (shameless plug). There’s currently a QT GUI and a web interface (nodejs server with cvv8 to bind C++ to Javascript) that should provide rudimentary control of it. Given the lack of user input we’ve gotten, there are bound to be things that make perfect sense to us but are confusing to a new user, so if you download it feel free to jump on our IRC channel #nova on irc.oftc.net or post some issues on the github repository.