Thoughts on Open Source Software Development

28 10 2013

The last year I did a lot of work with some small Open Source projects (Nova, Honeyd, Neighbor-Cache Fingerprinter, Node.js Github Issue Bot…). I’ve also used Linux for all of my development and have used a lot of Open Source projects in that time. In some ways I’ve come out being more of on Open Source advocate than ever, and in other ways I’ve come out a bit jaded. What does Open Sourcing a project get you?

Good thing: free feedback on features and project direction

Unless you’re Steve Jobs, you probably don’t know what customers want. If you’re an engineer like most people reading this blog, you really probably don’t know what customers want. Open Sourcing the project can provide free user feedback. If you’re writing a business application, people will tell you they want pretty graphs generated for data that you never thought would be important. If you’re writing something with dependencies, users will tell you they want you to support multiple versions of potentially incompatible libraries that you would never have bothered with on your own.

If you’ve got an IRC channel, you’ll occasionally find a person who’s more than willing to chat about his or her opinions on the project and what features they think would be useful, in addition to the occasional issue tickets and emails.

The Open Source community can be your customers when you don’t have any real customers yet.

Good thing: free testing

Everyone who downloads and uses the project becomes someone that can help with the testing effort. All software has bugs, and if they’re annoying enough, people will report them. I’ve tried to make small contributions to bigger Open Source projects by reporting issues I’ve found in things like Node.js, Express, Backtrack, Gimp, cvv8… As a result, code becomes better tested and more stable.

Good thing: free marketing

Open Sourcing the project, at least in theory, means people will use it. They’ll talk about it to their friends, they’ll write articles and reviews about it, and if the project is actually useful it’ll start gaining popularity.

Misconception: you’ll get herds of developers willing to work on your project for free

I’ve reported dozens of bugs in big Open Source projects. I’ve modified the source code of Nmap and Apache for various data collection reasons. I’ve never submitted a patch bigger than about 3 lines of code to someone else’s Open Source project. That’s depressing to admit, but it’s the norm. People will file bug tickets, sometimes offer suggestions on features, but don’t expect a herd of developers working for free and flocking to your project to make it better. Even the most hardcore Open Source advocates have their own pet projects they would rather work on than fixing bugs or writing features into yours. Not only that, the effort to fix a bug in a foreign code base is significantly higher than the effort required for the original developer of the code to fix it. Why spend 3 hours setting up the development environment and trying to fix a bug, when you can file a ticket and the guy that wrote the code can probably fix it in 3 minutes?

There are large Open Source projects (Linux, Open Office, Apache…) that have a bunch of dedicated developers. They’re the exceptions. From what I’ve seen, most Open Source projects are run by one person or a small core group.

Misconception: the community will take over maintaining projects even if the core developer leaves

We used a Node.js library called Nowjs quite a lot. It’s a wonderful package that takes away all the tedium of manual AJAX or socket.io work and makes Javascript RPC amazingly easy. It has over 2,000 followers on Github, and probably ten times that many people using it. One day the developer decided to abandon the project to work on other things; not that unusual for a pet project. Sadly, that was the death of the project. Github makes it trivial to clone the project, with a single press of a button someone could make a copy of the repository and take over maintaining and extending it. Dozens of people initially made forks of the project in order to do that, and dozens more made forks to fix bugs they found.

What’s left? A mess consisting of dozens of Github forks of the project, all with different bugs being fixed or features added, and the “official” project left abandoned in such a way no one can figure out which fork they should use. There’s no one left to merge in patches or to make project direction decisions. New users can’t figure out which fork to use and old users that actually write patches don’t know where to submit them anymore.

The developer of Nowjs moved on to develop Bridge-js. Then Bridge-js got abandoned too.

Bridge is still open source but the engineers behind Bridge have returned to school.

This pattern is almost an epidemic in Node.js. Someone creates a really amazing module, publishes it to Github and NPM, and then abandons it. Dozens of people try to take over the development, but in the end all fail (partly because Github’s lack of specifying which fork of the project is “official”, and the Open Source problem that there is no “official” fork). A dozen new people create the same basic module from scratch, most of which never become popular, and most of which also become abandoned… You see the picture.

If you sense a hint of frustration, you’d be right. On multiple occasions I had to dig through dozens of half abandoned projects trying to figure out which library I wanted to use to do something as common as SQL in Node.js.

The reason it’s an epidemic with Node is because no one is really sure what they want yet, and projects haven’t become popular enough to have momentum to continue after abandonment by their original authors. Hopefully at some point projects will acquire a big enough user base and enough developers that they can sustain themselves.

Fork is a four letter word

Even the biggest projects aren’t immune to the anarchy of forks. Libreoffice and Openoffice, GNU Emacs vs XEmacs, the list goes on. For the end user of these software suits, this is mainly annoying. I’ve switched between LibreOffice and OpenOffice more than once now, because I keep finding bugs in one but not the other.

Sometimes forks break out for ridiculous reasons. The popular IM client Pidgin was forked into the Carrier project. Why?

As of version 2.4 and later, the ability to manually resize the text input box of conversations has been altered—Pidgin now automatically resizes between a number of lines set in ‘Preferences’ and 50% of the window depending on how much is typed. Some users find this an annoyance rather than a feature and find this solution unacceptable. The inability to manually resize the input area eventually led to a fork, Carrier (originally Funpidgin). – https://en.wikipedia.org/wiki/Pidgin_(software)

You can view the 300+ post argument about the issue on the original Pidgin ticket here.

The fact there’s no single “official” version of a project and the sometimes trivial reasons that forks break out cause a lot of inefficiency as bug are fixed in some forks but not others, and eventually code bases diverge so much that they also develop in one fork or another.

Misconception: people outside of software development understand Open Source

I once heard someone ask in confusion how Open Source software can possibly be secure, because can’t anyone upload backdoors into it? They thought Open Source projects were like Wikipedia, where anyone could edit the code and their changes would be somehow instantly applied without review. After all, people keep telling them, “Open Source is great, anyone can modify the code!”.

A half dozen times, ranging from family members to customers and business people, I’ve had to try and explain how Open Source security products can work even though the code is available. If people can see the code, they can figure out how to break and bypass it, right? Ugh…

And don’t even get me started on the people that will start comparing Open Source to communism.

Concluding thoughts

I believe Open Source software has plenty of advantages, but I also think there’s a lot of hype surrounding it. The vast majority of Open Source projects are hacked together as hobby projects and abandoned shortly after. A study of Sourceforge projects showed that less than 17% of projects actually become successful; the other 83% are abandoned in the early stages. Most projects only have a few core developers and the project won’t outlive their interest in it. The community may submit occasional patches, but are unlikely to do serious feature development.

Why release Open Source software then? I think the answer often becomes, “why not?”. Plenty of developers write code in their spare time. They don’t plan to make money directly from it: selling software is hard. They do it to sharpen their saws. They do it for fun, self improvement, learning, future career opportunities, and to release something into the world that just might be useful and make people’s lives better. If you’re writing code with no intention of making money off it, there’s really no reason not to release it as Open Source.

What if you do want to make money off it? Well, why not dual licence your code and have a free Open Source trial version along with an Enterprise version? You’ll get the advantages of free marketing, testing, and user feedback. There is the risk that someone will come along and extend the Open Source trial version into a program that has all of the same features, or even more features, as your Enterprise version, and this is something that needs to be considered. However, as I mentioned before, it’s hard to find people that will take over the development and maintenance of Open Source projects. I think it’s more likely that someone will steal your idea and create their own implementation than bother with trying to extend your trial version code, but I don’t have any proof or evidence of that.

Advertisements




How to excel in Computer Science or Engineering

13 01 2012

It’s official: I’m the proud new owner of a Bachelor’s of Engineering in Computer Systems Engineering, and managed to graduate with highest honors and a GPA that makes most people hate me. After nearly 5 years of intense procrastination (er, I mean intense studying), living on a diet of Monster energy drinks and ramen, and having a sleep schedule so erratic my friends call me a voluntary insomniac,  my time in college is over for the time being. Having theoretically gained at least some wisdom and experience over the last five years, I thought I would offer it to the Internet for future generations.

Learn to program before college

I learned HTML/CSS when I was 12, was using Linux by 15, and knew enough about security to hack into my school’s webserver by 16. I took up programming at maybe 17 by teaching myself Python from online tutorials (which I’ve entirely forgotten the syntax of, but many things are universal in programming languages). Am I an unusual ubergeek? Well, yes and no. The fact is that if you enter CSE101 not knowing how to program, you’ll probably be in the minority. Does this mean that you’re behind and won’t be able to do well in CS? Certainly not. But this does mean you don’t know,

  • If you’re going to like programming
  • If you’re going to do well at programming
  • If you’ll enjoy sitting at computers for hours learning stuff or go stark raving mad
  • If you’ll be frustrated  not knowing what you’re doing or just accept it and keep searching for the answers
  • If you’ve accepted the fact that you’re going to be a geek in every sense of the word (or at the very least, be surrounded by them)
  • If you can handle spending 30 minutes writing a program and 3 hours debugging it
How do you choose to devote 4 years to life into something if you haven’t tried it? The answer is that it’s a lot easier to do if you get your feet in the water early. Another advantage of learning to program before you start is that your first CS classes will be a breeze. You’ll probably be disappointed that they’re so slow and boring, but don’t be. Spend your first couple of semesters,
  • Surviving Calculus (I-III), Linear Algebra, and any other hard classes your college requires
  • Using CS classes as GPA boosters (a pile of A+’s early on is how I graduated with highest honors even though my grades slipped a bit in the end)
  • Getting rid of pesky required humanities on things like ancient mesoamerican rubber ball making cultures (true story)

Don’t be too hard on yourself as a new programmer

When I started programming, I thought I was a horrible programmer, and it scared me. Now, I do have one of those personalities that make job interview terrible because I’m always self deprecating when it comes to what I’ve done, but the truth is you’re going to start out being a terrible programmer. Take a look at something I wrote on this blog nearly 4 years ago.

Do I have the art of programming? Nothing I’ve written is very impressive, half finished and unmaintained projects clutter my hard drives. The most complicated thing I wrote was a nearly 3,000 line IRC bot, with a plethora of useless features. The architecture got so bad that I couldn’t even figure out how to fix it to make it connect to esper’s servers correctly after they changed to a new version.The !google feature also mysteriously broke. I admit, when I started writing it I knew a lot less than I did now, and I wouldn’t make a lot of the same mistakes (variables all over the global namespace, lack of comments, horribly inefficient algorithms that make it take up over 100MB of RAM when the log files are loaded) if I rewrote it from scratch, which is the only way to really salvage the project, and too much work to bother with. Meh.

On the other hand, I can’t conceive of any other career that I’d like to peruse with even half as much enthusiasm, so even if my fate is that of a mediocre code monkey, it seems better than the other possibilities.

Did I turn into a mediocre code monkey? No, I’d like to think I turned into a decent software developer. I’m still a newb in many fields and feel behind sometimes (mainly in all the new web development technology), but in general programming areas I feel rather confident. It just took a lot of practice, and frankly a lot of mistakes. That little IRC bot I wrote taught me a lot about big projects. I don’t use tons of global variables anymore, I try not to write programs that take 100MB of RAM. I once learned the hard way by taking down a big company’s production database for a night that making sure your database connection code handles disconnects properly is really important.You’re going to start out a terrible programmer; see above point about getting a head start before college.

Part of the problem that made me question my computer science talent was the stress on the science part of some of my classes. Part of me back then thought that algorithm development was really important, and without it you aren’t a good programmer, since half of what you did in classes was go out and implement merge sort or Red Black Trees while sitting around studying linear algebra and calculus. The truth is you can still be a good software developer even if tracing through dijkstra’s algorithm gives you a headache. In the real world you’re far more likely to use a library implementation of an algorithm than code it from scratch, and high level abstract understanding is far more important than detailed understanding of the implementation or the ability to come up with such an algorithm on your own.

Learning the IT stuff

Now that you’re hopefully convinced you should be getting a head start on CS stuff, how do you do it? Having a solid foundation of computer skills is essential, and something they won’t teach you in college. Go get yourself an old computer from a swapmeet or online and install Linux. Never seen hard drive jumpers or power supply connectors before? Crack that baby open and see what’s inside. Then, install Linux. The reason is not because Linux is better than Windows, it’s because Linux is more transparent and actually more difficult to use. It’ll fail to install correctly, and in the process you might figure out what a bootloader is. You’ll pick up some terms about make tools and shared libraries when you install solitaire. You’ll pick up new concepts like dependency hell and kernel panic. It will be a love hate relationship and the process may lead to shark attacks.

You could call Linux a sort of bootcamp for computer people. When I started, everything that could go wrong went wrong with it. I once by accident destroyed my windows partition. I found that my video drivers were buggy as hell, my wifi cards needed custom firmware loaded into them, and that trying to get sound to work in Linux is less fun than herding mutant cat mules. I once managed to destroy X because it was a package dependency for a Firefox upgrade and the AMD64 bit version was unstable (or incompatible with something, I never found out for sure since that was the day I gave up using Gentoo). I once spent 3 hours trying to track down and manually compile all of the dependencies for a side scrolling open source game that only provided 30 minutes of entertainment actually playing. My SSH log in was always being harassed by strange Chinese IP addresses and the server was screwed up once from SQL injections on something I wrote.

The point is: when things are working fine in Windows all day, you don’t learn anything. Linux is actually becoming easier to use than ever with distributions like Ubuntu: don’t be afraid to jump into the more “advanced” distributions. I suggest Slackware as a good middle ground between working out of the box and being a good learning experience, then pick up something like Gentoo. Remember: pain is weakness leaving the body. Sometimes the old people complaining you kids have it too easy are actually right. When I was a kid, I remember having to wander the filesystem in a MSDOS command prompt… now get off my lawn (I’m a college graduate, I have the right to say that now, right?)!

Back to the details. What do you do with Linux now that it’s working?

  • Learn to use the command line (+1 for learning shell scripting)
  • Learn Vim (+1 if you become a vim junky and install vim shortcut emulation plugins in your browser)
  • Get SSH working
  • Get a webserver working (http, ftp)
  • Did you pick up HTML yet? Go host your own webpage
  • Get Samba working
  • Play with cron jobs for doing backups and maintenance
  • Try different GUIs (KDE, Gnome, xfce, Fluxbox)
  • Try to recompile your kernel, explore start up scripts, and make it boot as fast as possible
  • Look up guides on how to keep it secure and maybe play with some offensive security (hacking) network tools on your own network

Learning the programming stuff

Alright, you’ve either gotten the basics of computer stuff down or skipped it since you want to get straight to programming that video game idea you’ve been dreaming about. Either way I would recommend that your first programming language is something high level with a decent GUI library built in. Personally I went with Python/Tk followed by TCL/Tk. Other options could be Ruby or PHP if you can find decent tutorials on the internet for them. The reason for not jumping straight into C/C++/Java is that a scripting language will probably let you get to fun stuff sooner. Using Tk you can throw together a little tick tack toe game in about 5 minutes once you know what you’re doing. The second advantage is that this will reduce you’re boredom in beginning CS classes where they will probably start with either Java or C++. The third advantage is that a scripting language is always useful as a go to language for a quick script to do something like text processing. Learn regular expressions, they’re useful even for everyday find/replace tasks on a decent text editor.

How to learn to program

Learning to program is a bit like learning a musical instrument. People can tell you how to do it 500 times and it won’t make you any better. You simply become a better programmer by programming. When you’re learning to use Linux, it generally screws up and you figure out how to fix it. When you move on to programming, you screw up and slowly learn from your mistakes until you see what to do and what not to. Some things might accelerate the learning path: seeing really good code, seeing really bad code, finding the built in library that does the exact thing you’ve spent 3 days programming from scratch, sitting through lectures on software design and algorithms. Overall the best way to learn to program IS to program. I absolutely can not stand reading tutorials on the syntax of languages. Learn the basics of a language and then move on to actually trying to program something. What you ask? What do people draw when they learn to paint? What do people play when they learn music? What kind of building does an architect practice design on? There’s no correct answer; be creative. If you can’t think of anything useful to make, you can always resort to classic games (tic tack toe, checkers, chess, blackack, rubik’s cube, tower defense game, etc etc). The point being it’s easier to study pages of dull documentation when you’re using that to work toward a working program rather than just trying to memorize stuff. Your program might have been programmed a hundred times before, but that won’t make it any less of an experience writing, and you could learn from other people’s implementations.

Keeping the passion and motivation

Everyone has highs and lows when it comes to interests in things. If you’ve been reading from the beginning of this you probably think that I sit around every night writing MMORPGs and hacking together Linux scripts that make my Linux box take input by voice command. The truth is that  my work ethic can best be summarized as spurts of obsessive genius followed by long stretches of laziness, and the vast majority of my nights are filled catching up on all the TV I somehow missed growing up (seen Buffy the Vampire Slayer? If not, go, watch, now). This occurs on both a daily and long term basis, and I don’t believe I’m alone in it. Something I saw in a post by Joel Spolsky resonated with me,

Sometimes I just can’t get anything done.

Sure, I come into the office, putter around, check my email every ten seconds, read the web, even do a few brainless tasks like paying the American Express bill. But getting back into the flow of writing code just doesn’t happen. These bouts of unproductiveness usually last for a day or two. But there have been times in my career as a developer when I went for weeks at a time without being able to get anything done. As they say, I’m not in flow. I’m not in the zone. I’m not anywhere.

For me, just getting started is the only hard thing. An object at rest tends to remain at rest. There’s something incredible heavy in my brain that is extremely hard to get up to speed, but once it’s rolling at full speed, it takes no effort to keep it going.

This is the same with me. Once you get in the middle of a programming project it isn’t a chore, it’s a rush, a feeling of zen that most people rarely find during their day job. However, finding the motivation to get started, especially without school or work deadlines pushing you can be the last thing you feel like doing. I don’t have any solution to this other than: think of something to program and force yourself to start. Sure, it might end up being a false start, I once tried to force myself into the mood by programming an Android time tracking application for people with type A personalities (or just plain OCD, doubt I would ever actually be able to consistently use such and application). It ended in nothing but a failed attempt to find the Ballmer Peak and a slightly more advanced version of the classic Hello World program to see if my SDK was set up.

You might find motivation in odd places too. Why did I take up exploring network security? Anger at someone who told me their page was secure after I said it was outdated and full of holes. I once programmed a puzzle solving game automation program to impress a girl (she was impressed, but in hindsight asking her out would have been a far more effective move). The IRC bot I wrote was partly inspired by another guy writing a bot and battling mine with massive kick/ban/flood wars until the IRC network banned both our IP addresses.  Good times. What you should take away from this is that programming game playing algorithms is not a good way to attract a girl.

If you manage to make it through college as a CS major without having at least one existential crisis, I’ll be amazed. At some point you’ll probably be sitting gloomily in front of your computer realizing that the last x years of your life have been focused around the perusal of degree which will enable you to spend 10x that number of years sitting in front of computers pressing buttons and making the patterns on the screen change. In fact, this was the hardest thing about college for me. It wasn’t the work, I could do the work easily if I applied myself. It was just trying to not fall into apathy, or worse, pure hate for school (keep reading and I’ll got to this).

How did I survive it without becoming a deranged alcoholic? It’s surprising how many software developers and IT people you can find in hole in the wall bars; you should be concerned I know this. There is no easy answer to this. My only advice is to find something to do, even if it isn’t programming. The worse period of my college experience was about 6-12 months ago when I simply had no motivation to touch any CS stuff outside of work and school. The classes I had were both boring and tedium filled, the capstone project I was looking forward to ended up being a huge wast of time, and I’d been both in school and at my internship long enough that absolutely nothing new or exciting was happening in my life. I managed to narrowly avoid failing Statistics for Engineers (pulled a C out of it from 2 days of cramming for the final). The only thing that pulled me out of it was the reality that this is my last semester and I really need to start doing things like planning for a career and not failing my last upper division humanity.

Thoughts on bad professors

An unfortunate fact of my college experience is that I would say the bad professors outnumbered the good professors by probably 2 to 1. For every lecture I went to, I had 2 others with a professor I found mediocre at best.  Is this conclusion because my expectations for professors are too high? Maybe: what makes a good professor?

  • Knowledge of the subject matter above and beyond the textbooks
  • Can be understood (doesn’t skip too many steps, can speak English, can explain things articulately, can write legibly)
  • Entertaining personality/ability to hold people’s attention (throwing in a joke or story instead of 75 minutes of straight monotone slide reading)
  • Fair grading scheme and tests
Once you get to upper division classes, you might be able to start picking classes from professors you know are good. I’d definitely recommend this, nothing is worse than being interested and excited by a topic and then losing all interest because you hate the class so much.

Graduating on time

College advisers always push graduating on time. I went the route of attempting a double major (in math), giving that up, and ended up being behind 1/2 to 1 year depending on how you count things. I don’t regret it at all, if you’re liking college, don’t be afraid to kill some time with interesting non-required classes. It’s easy to get internships when you’re in college that look good on a resume, and in the big scheme of things, who cares if you’re a year older when you enter the work force?

Exception: if you’re hating every moment of college, try and get out quick and not procrastinate. I put off a few classes I didn’t want to take and I would have done better on them earlier on before I lost a lot of my motivation. For those in pain, the quick bandage approach is far better than the slow procrastinating but loathing perma-college student that takes 8 years to finish their degree (I’ve seen it). But hey, 8 years if you’re having the time of your life might be okay, just depends on your situation.

Final thoughts and a link

Somewhere in the middle of college I stumbled across Joel Spolsky’s blog, Joel on Software; I highly recommend reading through his old posts. One of his posts offers some advice for computer science majors.  Go read it yourself, but here are his main points.

  • Learn how to write before graduating.
  • Learn C before graduating.
  • Learn microeconomics before graduating.
  • Don’t blow off non-CS classes just because they’re boring.
  • Take programming-intensive courses.
  • Stop worrying about all the jobs going to India.
  • No matter what you do, get a good summer internship.
A good point to end on: companies like having interns. Programming interviews are hard to properly conduct. Internships let companies have a nice trial period where they get to see what you’re capable of, and you get experience, resume fodder, and money. One problem I had was not knowing when I had enough skills to actually get an internship. Simple answer if you’re an ASU student, CSE310 (Data Structures and Algorithms) and CSE360 (Sofware Engineering/Design) will give you what you need to at least have a good change at surviving a technical interview. That doesn’t mean you couldn’t pick of the stuff you need before the classes or try to interview anyway, but in my experience interviewers focus on algorithmn (reverse a linked list, common algorithm time complexity, space complexity, hash tables, sorting) and OOP design (think through a first pass architecture of a program that does blah and scribble it on a white board) questions a lot, especially for college students that don’t really have much past experience to point to in order to show their skills.