Neighbor Cache Fingerprinter: Operating System Version Detection with ARP

30 12 2012

I’ve released the first prototype (written in C++) of an Open Source tool called the Neighbor Cache Fingerprinter on Github today. A few months ago, I was watching the output of a lightweight honeypot in a Wireshark capture and noticed that although it had the capability to fool nmap’s operating system scanner into thinking it was a certain operating system, there were subtle differences in the ARP behavior that could be detected. This gave me the idea to explore the possibility of doing OS version detection with nothing except ARP. The holidays provided a perfect time to destroy my sleep schedule and get some work done on this little side project (see commit punchcard, note best work done Sunday at 2:00am).

ncfpunchcard

The tool is currently capable of uniquely identifying the following operating systems,

Windows 7
Windows XP (fingerprint from Service Pack 3)
Linux 3.x (fingerprint from Ubuntu 12.10)
Linux 2.6 (fingerprint from Century Link Q1000 DSL Router)
Linux 2.6 (newer than 2.6.24) (fingerprint from Ubuntu 8.04)
Linux 2.6 (older than 2.6.24) (fingerprint from Knoppix 5)
Linux 2.4 (fingerprint from Damn Small Linux 4.4.10)
FreeBSD 9.0-RELEASE
Android 4.0.4
Android 3.2
Minix 3.2
ReactOS 0.3.13

More operating systems should follow as I get around to spinning up more installs on Virtual Machines and adding to the fingerprints file. Although it’s still a fairly early prototype, I believe it’s already a useful enough tool that it can be beneficial, so install it and let me know via the Github issues page if you find any bugs. There’s very little existing research on this; arp-fingerprint (a perl script that uses arp-scan) is the only thing remotely close, and it attempts to identify the OS only by looking at responses to ARP REQUEST packets. The Neighbor Cache Fingerprinter focuses on sending different types of ARP REPLY packets as well as analyzing several other behavioral quirks of ARP discussed in the notes below.

The main advantage of the Neighbor Cache Fingerprinter versus an Nmap OS scan is that the tool can do OS version detection on a machine that has all closed ports. The next big feature I’m working on is expanding the probe types to allow it to work on machines that respond to ICMP pings, OR have open TCP ports, OR have closed TCP ports, OR have closed UDP ports. The tool just needs the ability to elicit a reply from the target being scanned, and a pong, TCP/RST, TCP/ACK, or ICMP unreachable message will all provide that.

The following are my notes taken from the README file,

Introduction

What is the Neighbor Cache? The Neighbor Cache is an operating system’s mapping of network addresses to link layer addresses maintained and updated via the protocol ARP (Address Resolution Protocol) in IPv4 or NDP (Neighbor Discovery Protocol) in IPv6. The neighbor cache can be as simple as a lookup table updated every time an ARP or NDP reply is seen, to something as complex as a cache that has multiple timeout values for each entry, which are updated based on positive feedback from higher level protocols and usage characteristics of that entry by the operating system’s applications, along with restrictions on malformed or unsolicited update packets.

This tool provides a mechanism for remote operating system detection by extrapolating characteristics of the target system’s underlying Neighbor Cache and general ARP behavior. Given the non-existence of any standard specification for how the Neighbor Cache should behave, there several differences in operating system network stack implementations that can be used for unique identification.

Traditional operating system fingerprinting tools such as Nmap and Xprobe2 rely on creating fingerprints from higher level protocols such as TCP, UDP, and ICMP. The downside of these tools is that they usually require open TCP ports and responses to ICMP probes. This tool works by sending a TCP SYN packet to a port which can be either open or closed. The target machine will either respond with a SYN/ACK packet or a SYN/RST packet, but either way it must first discover the MAC address to send the reply to via queries to the ARP Neighbor Cache. This allows for fingerprinting on target machines that have nothing but closed TCP ports and give no ICMP responses.

The main disadvantage of this tool versus traditional fingerprinting is that because it’s based on a Layer 2 protocol instead of a Layer 3 protocol, the target machine that is being tested must reside on the same Ethernet broadcast domain (usually the same physical network). It also has the disadvantage of being fairly slow compared to other OS scanners (a scan can take ~5 minutes).

Fingerprint Technique: Number of ARP Requests

When an operating system performs an ARP query it will often resend the request multiple times in case the request or the reply was lost. A simple count of the number of requests that are sent can provide a fingerprint feature. In addition, there can be differences in the number of responses to open and closed ports due to multiple retries on the higher level protocols, and attempting to send a probe multiple times can result in different numbers of ARP requests (Android will initially send 2 ARP requests, but the second time it will only send 1).

For example,

Windows XP: Sends 1 request

Windows 7: Sends 3 if probe to closed port (9 if probe to open port)

Linux: Sends 3 requests

Android 3: Sends 2 requests the first probe, then 1 request after
A minimum and maximum number of requests seen is recorded in the fingerprint.

Fingerprint Technique: Timing of ARP Request Retries

On hosts that retry ARP requests, the timing values can be used to deduce more information. Linux hosts generally have a constant retry time of 1 second, while Windows hosts generally back off on the timing, sending their first retry after between 500ms and 1s, and their second retry after 1 second.

The fingerprint contains the minimum time difference between requests seen, maximum time difference, and a boolean value indicating if the time differences are constant or changing.

Fingerprint Technique: Time before cache entry expires

After a proper request/reply ARP exchange, the Neighbor Cache gets an entry put in it for the IP address and for a certain amount of time communication will continue without additional ARP requests. At some point, the operating system will decide the entry in the cache is stale and make an attempt to update it by sending a new ARP request.

To test this a SYN packet is sent, an ARP exchange happens, and then SYN packets are sent once per second until another ARP request is seen.

Operating system response examples,

Windows XP : Timeout after 10 minutes (if referred to)

Windows 7/Vista/Server 2008 : Timeout between 15 seconds and 45 seconds

Freebsd : Timeout after 20 minutes

Linux : Timeout usually around 30 seconds
More research needs to be done on the best way to capture the values of delay_first_probe_time and differences between stale timing and actually falling out of the table and being gc’ed in Linux.

Waiting 20 minutes to finish the OS scan is unfeasible in most cases, so the fingerprinting mode only waits about 60 seconds. This may be changed later to make it easier to detect an oddity in older windows targets where cache entries expire faster if they aren’t used (TODO).

Fingerprint Technique: Response to Gratuitous ARP Replies

A gratuitous or unsolicited ARP reply is an ARP reply for which there was no request. The usual use case for this is notification of machines on the network of IP changes or systems coming online. The problem for implementers is that several of the fields in the ARP packet no longer make much sense.

Who is the Target Protocol Address for the ARP packet? The broadcast address? Zero? The specification surprisingly says neither: the target Protocol address should be the same IP address as the Sender Protocol Address.

When there’s no specific target for the ARP packet, the Target Hardware Address also becomes a confusing field. The specification says it’s value shouldn’t matter, but should be set to zero. However, most implementations will use the Ethernet broadcast address of FF:FF:FF:FF:FF instead, because internally they have some function to send an ARP reply that only takes one argument for the destination MAC address (and is put in both the Ethernet frame destination and the ARP packet’s Target Hardware Address). We can also experiment with setting the Target Hardware Address to the same thing as the Sender Hardware Address (the same method the spec says to use for the Target Protocol field).

Even the ARP opcode becomes confusing in the case of unsolicited ARP packets. Is it a “request” for other machines to update their cache? Or is it a “reply”, even though it isn’t a reply to anything? Most operating systems will update their cache no matter the opcode.

There are several variations of the gratuitous ARP packet that can be generated by changing the following fields,

Ethernet Frame Destination Address : Bcast or the MAC of our target

ARP Target Hardware Address : 0, bcast, or the MAC of our target

ARP Target Protocol Address : 0 or the IP address of our target

ARP Opcode : REPLY or REQUEST
This results in 36 different gratuitous packet permutations.

Most operating systems have the interesting behavior that they will ignore gratuitous ARP packets if the sender is not in the Neighbor Cache already, but if the sender is in the Neighbor Cache, they will update the MAC address, and in some operating systems also update the timeouts.
The following sequence shows the testing technique for this feature,

Send ARP packet that is known to update most caches with srcmac = srcMacArg Send gratuitous ARP packet that is currently being tested with srcmac = srcMacArg + 1 Send probe packet with a source MAC address of srcMacArg in the Ethernet frame

The first packet attempts to get the cache entry into a known state: up to date and storing the source MAC address that is our default or the command line argument –srcmac. The following ARP packet is the actual probe permutation that’s being tested.

If the reply to the probe packet is to (srcMacArg + 1), then we know the gratuitous packet successfully updated the cache entry. If the reply to the probe is just (srcMacArg), then we know the cache was not updated and still contains the old value.

The reason the Ethernet frame source MAC address in the probe is set to the original srcMacArg is to ensure the target isn’t just replying to the MAC address it sees packets from and is really pulling the entry out of ARP.

Sometimes the Neighbor Cache entry will get into a state that makes it ignore gratuitous packets even though, given a normal state, it would accept them and update the entry. This can result in some timing related result changes. For now I haven’t made an attempt to fix this as it’s actually useful as a fingerprinting method in itself.

Fingerprint Technique: Can we get put into the cache with a gratuitous packet?

As mentioned in the last section, most operating systems won’t add a new entry to the cache given a gratuitous ARP packet, but they will update existing entries. One of the few differences between Windows XP and FreeBSD’s fingerprint is that we can place an entry in the cache by sending a certain gratuitous packet to a FreeBSD machine, and test if it was in the cache by seeing if a probe gets a response or not.

Fingerprint Technique: ARP Flood Prevention (Ignored rapid ARP replies)

RFC1122 (Requirements for Internet Hosts) states,

“A mechanism to prevent ARP flooding (repeatedly sending an ARP Request for the same IP address, at a high rate) MUST be included. The recommended maximum rate is 1 per second per destination.”

Linux will not only ignore duplicate REQUEST packets within a certain time, but also duplicate REPLY packets. We can test this by sending a set of unsolicited ARP replies within a short time range with difference MAC addresses being reported by each reply. Sending a probe will reveal in the probe response destination MAC address if the host responds to the first MAC address we ARPed or the last, indicating if it ignored the later rapid replies.

Fingerprint Technique: Correct Reply to RFC5227 ARP Probe

This test sends an “ARP Probe” as defined by RFC 5227 (IPv4 Address Conflict Detection) and checks the response to see if it confirms to the specification. The point of the ARP Probe is to check if an IP address is being used without the risk of accidentally causing someone’s ARP cache to update with your own MAC address when it sees your query. Given that you’re likely trying to tell if an IP address is being used because you want to claim it, you likely don’t have an IP address of your own yet, so the Sender Protocol Address field is set to 0 in the ARP REQUEST.

The RFC specifies the response as,

“(the probed host) MAY elect to attempt to defend its address by … broadcasting one single ARP Announcement, giving its own IP and hardware addresses as the sender addresses of the ARP, with the ‘target IP address’ set to its own IP address, and the ‘target hardware address’ set to all zeroes.”

But any Linux kernel older than 2.6.24 and some other operating systems will respond incorrectly, with a packet that has tpa == spa and tha == sha. Checking if tpa == 0 has proven sufficient for a boolean fingerprint feature.

TODO RESEARCH IN PROGRESS Fingerprint Technique

Feedback from higher protocols extending timeout values

Linux has the ability to extend timeout values if there’s positive feedback from higher level protocols, such as a 3 way TCP handshake. Need to write tests for this and do some source diving in the kernel to see what else counts besides a 3 way handshake for positive feedback.

TODO RESEARCH IN PROGRESS Fingerprint Technique

Infer Neighbor Cache size by flooding to cause entry dumping

Can we fill the ARP table with garbage entries in order for it to start dumping old ones? Can we reliably use this to infer the table size, even with Linux’s near random cache garbage collection rules? Can we do this on class A networks, or do we really need class B network subnets in order to make this a viable test?





All about network configuration in Ubuntu Server 12.04/12.10

1 11 2012

Network configuration in Linux can be confusing; this post traces hrough the layers from top to bottom in order to take away some of the confusion and provide some detailed insight into the network initialization and configuration process in Ubuntu Server 12.04 and 12.10.

Initial System Startup

In Ubuntu, upstart is gradually replacing traditional init scripts that start and stop based primarily on run levels. The upstart script for basic networking is located in /etc/init/networking.conf,

# networking – configure virtual network devices
#
# This task causes virtual network devices that do not have an associated
# kernel object to be started on boot.

description “configure virtual network devices”

emits static-network-up
emits net-device-up

start on (local-filesystems
and (stopped udevtrigger or container))

task

pre-start exec mkdir -p /run/network

exec ifup -a

The ‘local-filesystems’ event is triggered when all file systems have been finished being mounted, and the ‘stopped udevtrigger or container’ line is to ensure that the /run folder is ready to be used (which contains process IDs, locks, and other information programs want to temporarily store while they’re running). The task keyword tells upstart that this is a task that should end in a finite amount of time (rather than a service, which has daemon like behavior). The important thing to take away from this is that the command “ifup -a” is called when the system starts up.

Configuring ifup

The configuration file for ifup is located in /etc/network/interfaces and this is the file you’ll want to modify for a basic network configuration. The ifup tool allows configuring of your network interfaces and will attempt to serially go through and bring them up one at a time when ifconfig -a is called if the ‘auto’ keyword is specified. An example configuration follows,

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
address 192.168.0.42
netmask 255.255.255.0
network 192.168.0.0
broadcast 192.168.0.255
gateway 192.168.0.1
dns-nameservers 192.168.0.1 8.8.8.8

 

This configuration starts out with the loopback adapter, which should be there by default. The next entry for eth0 will attempt to use DHCP, and the entry for eth1 will use a static IP and configuration. Something to keep in mind is that this will ONLY be run when the system starts up. For a simple desktop machine that is always plugged into the network, this configuration will probably be all you need to do. But what happens if you’re unplugging and plugging things in a lot, such as on a laptop? You will run into two problems: the first is that if you have interfaces set to DHCP and they aren’t plugged in when you’re booting, you’ll likely have a  “waiting for network configuration” message followed by “waiting up to 60 more seconds for network configuration..” which can slow your boot time by several minutes. The second problem is that once the system is booted, plugging in an Ethernet cable won’t actually cause a DHCP request to be sent, since ifup -a is only called once when the system is booting. If you want to avoid these problems, you’ll have to use a tool like ifplugd.

Using ifplugd to handle interfaces that are unplugged a lot

From the man page,

ifplugd is a daemon which will automatically configure your ethernet device when a cable is plugged in and automatically unconfigure it if the cable is pulled. This is useful on laptops with on-board network adapters, since it will only configure the interface when a cable is really connected.

 

Installing and configuring ifplugd is easy. First, go into /etc/network/interfaces and change the ‘auto eth0’ settings to ‘allow-hotplug eth0’. Now ifup -a will not activate this interface, but will instead allow the ifplugd daemon to bring it up. The configuration information for the interface will still be used from /etc/network/interfaces. To install and configure ifplugd run the following,

 

sudo apt-get install ifplugd

sudo dpkg-reconfigure ifplugd

Enter the names of all the interfaces that you want ifplugd to configure when the link status changes and the reconfigure tool will update /etc/default/ifplugd. Instead of using upstart, ifplugd currently uses the old style init.d scripts, and is launched from /etc/init.d/ifplugd.

Now you should be able to unplug and plug in Ethernet cables and DHCP requests will be sent each time!

Note for Ubuntu Desktop Users

Ubuntu Desktop uses the Network Manager GNOME tool to configure the network, and most of the time you should be able to configure everything graphically using it. This is specifically for Ubuntu Server or an Ubuntu version without Network Manager.





NOVA: Network Antireconnaissance with Defensive Honeypots

7 06 2012

Knowledge is power, especially when regards to computer and information security. From the standpoint of a hacker, knowledge about the victim’s network is essential and the first step in any sort of attack is reconnaissance. Every little piece of seemingly innocent information can be gathered and combined to form a profile of the victim’s network, and each bit of information can help discover vulnerabilities that can be exploited to get in. What operating systems are being used? What services are running? What are the IP and MAC addresses of the machines on a network? How many machines are on the network? What firewalls and routers are in place? What’s the overall network architecture? What are the uptime statistics for the machines?

Since network reconnaissance is the first step in attacking, it follows that antireconnaissance should be the first line of defense against attacks. What can be done to prevent information gathering?

The first step in making the difficult to gather information is simply to not release it. This is the realm of authentication and firewalls, where data is restricted to subsets of authorized users and groups. This doesn’t stop the gathering of information that, by it’s nature, must be to some extent publicly available for things to function. Imagine the real life analogy of a license plate. The license plate number of the car you drive is a mostly harmless piece of information, but hiding it isn’t an option. It’s a unique identifier for your car who’s entire point is to be displayed to world. But how harmless is it really? Your license plate could be used for tracking your location: imagine a camera at a parking garage that keeps logs of all the cars that go in and out. What if someone makes a copy of your license plate for their car and uses it to get free parking at places you have authorized parking? What if someone copies the plate and uses it while speeding through red light cameras or committing other crimes? What if someone created a massive online database of every license plate they’ve ever seen, along with where they saw it and the car and driver’s information?

Although a piece of information may seem harmless by itself, it can be combined to get a more in depth picture of things and potentially be a source of exploitation.  Like a license plate, there any many things on a network that are required to be publicly accessible in order for the network to function. Since you can’t just block access to this information with a firewall, what’s the next step in preventing and slowing down reconnaissance? This is where NOVA comes in.

Since hiding information on a LAN isn’t an option, Datasoft’s NOVA (Network Obfuscation and Virtualized Anti-reconnaissance) instead tries to slow down and detect attackers by making them go threw huge amounts of fake information in the form of virtual honeypots (created with honeyd). Imagine an nmap scan on a typical corporate network. You might discover that there are 50 computers on the network, all running Windows XP and living on a single subnet. All of your attacks could then target Windows XP services and vulnerabilities. You might find a router and a printer on the network too, and spend a lot of time manually poking at them attempting to find a weakness. With NOVA and Honeyd running on the network, the same nmap scan could see hundreds of computers on the network with multiple operating systems, dozens of services running, and multiple routers. The attacker could spend hours or even days attempting to get into the decoy machines. Meanwhile, all of the traffic to these machines is being logged and analyzed by machine learning algorithms to determine if it appears hostile (matches hostile training data of past network scans, intrusion attempts, etc).

At the moment NOVA is still a bit rough around the edges, but it’s an open source C++ Linux project in a usable state that could really use some more users and contributors (shameless plug). There’s currently a QT GUI and a web interface (nodejs server with cvv8 to bind C++ to Javascript) that should provide rudimentary control of it. Given the lack of user input we’ve gotten, there are bound to be things that make perfect sense to us but are confusing to a new user, so if you download it feel free to jump on our IRC channel #nova on irc.oftc.net or post some issues on the github repository.





Xmonad Configuration for DVORAK

30 03 2012

I’ve been using Xmonad for a couple of months now, and I really quite like it for software development. I would say it’s most useful with large dual monitors, but I’ve even tried it on a netbook (with limited usability success). Before using Xmonad, I would constantly loose track of windows. In Linux it was terminal windows. Being a command line guru, I would pull up a terminal to do everything from edit a text file in vim to just using a terminal as a launcher to quickly type ‘firefox &’ or some other application. Having to alt+tab through the inevitable piles of terminals I would have up was annoyingly painful, and instead of finding the one I want I would likely just launch a new one and add to the mess.

Now I’ve gotten my Xmonad habits and workflow down.

Workspaces 1 and 2 are used on the first monitor

Workspaces 3 and 4+ are used on the second monitor

 

Workspace 1: Browser and a terminal on the bottom for quick trivial commands

Workspace 2: Eclipse or other IDE

Workspace 3: 2-3 terminals and IRC window (most terminal related work done here)

Workspace 4: Usually a full screen application I’m testing

Workspace 4+: Misc usages as needed

 

And, of course, I’ve got all of the keyboard shortcuts optimized for the DVORAK homerow. Here’s my xmonad.hs configuration file if anyone wants to try my shortcut scheme.

 

import XMonad
import XMonad.Config.Gnome
import XMonad.Hooks.ManageHelpers
import XMonad.Layout.Gaps
import XMonad.Actions.FloatKeys
import XMonad.Actions.CycleWS
import XMonad.Hooks.ManageDocks
import XMonad.Hooks.DynamicLog
import XMonad.Util.EZConfig
import XMonad.Util.Run
import XMonad.Layout.NoBorders
import XMonad.Layout.ResizableTile
import XMonad.Actions.DwmPromote
import System.Exit

import qualified System.IO.UTF8
import qualified XMonad.StackSet as W
import qualified Data.Map as M

myManageHook = composeAll (
[ manageHook gnomeConfig
, className =? “Unity-2d-panel” –> doIgnore
, className =? “Unity-2d-launcher” –> doIgnore
, className =? “Gimp” –> doFloat
, className =? “novagui” –> doFloat
, isFullscreen –> doFullFloat
])

myKeys = \c -> mkKeymap c $
[ (“M-S-<Return>”, spawn “gnome-terminal”)

— launch programs
, (“M-r f f”, spawn “firefox”)
, (“M-r M-c”, spawn “chromium-browser”)
, (“M-r M-r”, spawn “grun”)
, (“M-r h a l t”, spawn “sudo shutdown -h now”)
, (“M-r s s”, spawn “scrot”)
, (“M-r s S-s”, spawn “scrot -s”)
, (“M-r v”, spawn “gvim”)

— Rotate through the available layout algorithms
, (“M-<Space>”, sendMessage NextLayout)
, (“M-S-<Space>”, sendMessage FirstLayout)

— close focused window
, (“M-w”, kill)
— Resize viewed windows to the correct size
, (“M-S-r”, refresh)

— Sceen lock
, (“M-l”, spawn $ “gnome-screensaver-command -l”)

— Toggle float
, (“M-d”, withFocused $ windows . W.sink)

— These are all DVORAK optimized navigation keys

— Move window focus with right/left index fingers
, (“M-u”, windows W.focusDown)
, (“M-h”, windows W.focusUp )
, (“M-<Return>”, dwmpromote )
— Swap window
, (“M-S-u”, windows W.swapDown >> windows W.focusDown)
, (“M-S-h”, windows W.swapUp >> windows W.focusUp)

— Resize the master area with right/left middle fingers
, (“M-t”, sendMessage Expand)
, (“M-e”, sendMessage Shrink)
, (“M-S-e”, sendMessage MirrorShrink)
, (“M-S-t”, sendMessage MirrorExpand)

— Change windows in the master area with right/left ring fingers
, (“M-n”, sendMessage (IncMasterN 1))
, (“M-o”, sendMessage (IncMasterN (-1)))

, (“M-s”, nextScreen)
, (“M-a”, prevScreen)
, (“M-S-s”, shiftNextScreen >> nextScreen)
, (“M-S-a”, shiftPrevScreen >> prevScreen)

— Quit xmonad
, (“M-S-q”, io (exitWith ExitSuccess))

— Restart xmonad
, (“M-q”, restart “xmonad” True)
] ++
— mod-[1..9], Switch to workspace N
— mod-shift-[1..9], Move client to workspace N
[(m ++ (show k), windows $ f i)
| (i, k) <- zip (XMonad.workspaces c) [1 .. 9]
, (f, m) <- [(W.greedyView, “M-“), (W.shift, “M-S-“)]
] ++

— moving floating window with key
[(c ++ m ++ k, withFocused $ f (d x))
| (d, k) <- zip [\a->(a, 0), \a->(0, a), \a->(0-a, 0), \a->(0, 0-a)] [“<Right>”, “<Down>”, “<Left>”, “<Up>”]
, (f, m) <- zip [keysMoveWindow, \d -> keysResizeWindow d (0, 0)] [“M-“, “M-S-“]
, (c, x) <- zip [“”, “C-“] [20, 2]
]

myLayouts = gaps [(U, 24)] $ layoutHook gnomeConfig

main = xmonad gnomeConfig {
manageHook = myManageHook
, layoutHook = myLayouts
, borderWidth = 2
, terminal = “gnome-terminal”
, normalBorderColor = “#000099”
, focusedBorderColor = “#009900”
, modMask = mod4Mask
, keys = myKeys }





Samsung Fail

16 03 2012

Attempted to find the service number for a Samsung product I sent back for repair…

 





Linux Tip: Arrow keys not working for command input? RLFE to the rescue.

26 01 2012

I’m always wandering across tools in Linux that don’t support line history (up arrow) or the ability to edit lines/move around with the arrow keys. Side note: if you write a Linux tool that takes user commands, stop being lazy and just go link it to the GNU readline library so your command line interface doesn’t make people hate you. There’s nothing more annoying than trying to go back to fix a typo with your arrow keys and getting a pile of gibberish instead of a moving cursor. For example in tclsh,

% puts “stuff goes herr”^[[D^[[D^[[D^[[A^[[C^[[B^[[D <- (typo, right arrow, right arrow, RAAAGE)

The solution? rlfe: the read line front-end processor. It’s got a few bugs, but it works great for things like telnet and tclsh that by default don’t have line history and arrow key navigation.

$ sudo apt-get install rlfe
$ rlfe tclsh

Replace tclsh with practically any command line tool and get back to typing without fear of typos. Plus, you don’t have to keep retyping/copy pasting things when you want to run them again. The rlfe process will stick around after you close the application, so you really only need to run it once with rlfe.





How to excel in Computer Science or Engineering

13 01 2012

It’s official: I’m the proud new owner of a Bachelor’s of Engineering in Computer Systems Engineering, and managed to graduate with highest honors and a GPA that makes most people hate me. After nearly 5 years of intense procrastination (er, I mean intense studying), living on a diet of Monster energy drinks and ramen, and having a sleep schedule so erratic my friends call me a voluntary insomniac,  my time in college is over for the time being. Having theoretically gained at least some wisdom and experience over the last five years, I thought I would offer it to the Internet for future generations.

Learn to program before college

I learned HTML/CSS when I was 12, was using Linux by 15, and knew enough about security to hack into my school’s webserver by 16. I took up programming at maybe 17 by teaching myself Python from online tutorials (which I’ve entirely forgotten the syntax of, but many things are universal in programming languages). Am I an unusual ubergeek? Well, yes and no. The fact is that if you enter CSE101 not knowing how to program, you’ll probably be in the minority. Does this mean that you’re behind and won’t be able to do well in CS? Certainly not. But this does mean you don’t know,

  • If you’re going to like programming
  • If you’re going to do well at programming
  • If you’ll enjoy sitting at computers for hours learning stuff or go stark raving mad
  • If you’ll be frustrated  not knowing what you’re doing or just accept it and keep searching for the answers
  • If you’ve accepted the fact that you’re going to be a geek in every sense of the word (or at the very least, be surrounded by them)
  • If you can handle spending 30 minutes writing a program and 3 hours debugging it
How do you choose to devote 4 years to life into something if you haven’t tried it? The answer is that it’s a lot easier to do if you get your feet in the water early. Another advantage of learning to program before you start is that your first CS classes will be a breeze. You’ll probably be disappointed that they’re so slow and boring, but don’t be. Spend your first couple of semesters,
  • Surviving Calculus (I-III), Linear Algebra, and any other hard classes your college requires
  • Using CS classes as GPA boosters (a pile of A+’s early on is how I graduated with highest honors even though my grades slipped a bit in the end)
  • Getting rid of pesky required humanities on things like ancient mesoamerican rubber ball making cultures (true story)

Don’t be too hard on yourself as a new programmer

When I started programming, I thought I was a horrible programmer, and it scared me. Now, I do have one of those personalities that make job interview terrible because I’m always self deprecating when it comes to what I’ve done, but the truth is you’re going to start out being a terrible programmer. Take a look at something I wrote on this blog nearly 4 years ago.

Do I have the art of programming? Nothing I’ve written is very impressive, half finished and unmaintained projects clutter my hard drives. The most complicated thing I wrote was a nearly 3,000 line IRC bot, with a plethora of useless features. The architecture got so bad that I couldn’t even figure out how to fix it to make it connect to esper’s servers correctly after they changed to a new version.The !google feature also mysteriously broke. I admit, when I started writing it I knew a lot less than I did now, and I wouldn’t make a lot of the same mistakes (variables all over the global namespace, lack of comments, horribly inefficient algorithms that make it take up over 100MB of RAM when the log files are loaded) if I rewrote it from scratch, which is the only way to really salvage the project, and too much work to bother with. Meh.

On the other hand, I can’t conceive of any other career that I’d like to peruse with even half as much enthusiasm, so even if my fate is that of a mediocre code monkey, it seems better than the other possibilities.

Did I turn into a mediocre code monkey? No, I’d like to think I turned into a decent software developer. I’m still a newb in many fields and feel behind sometimes (mainly in all the new web development technology), but in general programming areas I feel rather confident. It just took a lot of practice, and frankly a lot of mistakes. That little IRC bot I wrote taught me a lot about big projects. I don’t use tons of global variables anymore, I try not to write programs that take 100MB of RAM. I once learned the hard way by taking down a big company’s production database for a night that making sure your database connection code handles disconnects properly is really important.You’re going to start out a terrible programmer; see above point about getting a head start before college.

Part of the problem that made me question my computer science talent was the stress on the science part of some of my classes. Part of me back then thought that algorithm development was really important, and without it you aren’t a good programmer, since half of what you did in classes was go out and implement merge sort or Red Black Trees while sitting around studying linear algebra and calculus. The truth is you can still be a good software developer even if tracing through dijkstra’s algorithm gives you a headache. In the real world you’re far more likely to use a library implementation of an algorithm than code it from scratch, and high level abstract understanding is far more important than detailed understanding of the implementation or the ability to come up with such an algorithm on your own.

Learning the IT stuff

Now that you’re hopefully convinced you should be getting a head start on CS stuff, how do you do it? Having a solid foundation of computer skills is essential, and something they won’t teach you in college. Go get yourself an old computer from a swapmeet or online and install Linux. Never seen hard drive jumpers or power supply connectors before? Crack that baby open and see what’s inside. Then, install Linux. The reason is not because Linux is better than Windows, it’s because Linux is more transparent and actually more difficult to use. It’ll fail to install correctly, and in the process you might figure out what a bootloader is. You’ll pick up some terms about make tools and shared libraries when you install solitaire. You’ll pick up new concepts like dependency hell and kernel panic. It will be a love hate relationship and the process may lead to shark attacks.

You could call Linux a sort of bootcamp for computer people. When I started, everything that could go wrong went wrong with it. I once by accident destroyed my windows partition. I found that my video drivers were buggy as hell, my wifi cards needed custom firmware loaded into them, and that trying to get sound to work in Linux is less fun than herding mutant cat mules. I once managed to destroy X because it was a package dependency for a Firefox upgrade and the AMD64 bit version was unstable (or incompatible with something, I never found out for sure since that was the day I gave up using Gentoo). I once spent 3 hours trying to track down and manually compile all of the dependencies for a side scrolling open source game that only provided 30 minutes of entertainment actually playing. My SSH log in was always being harassed by strange Chinese IP addresses and the server was screwed up once from SQL injections on something I wrote.

The point is: when things are working fine in Windows all day, you don’t learn anything. Linux is actually becoming easier to use than ever with distributions like Ubuntu: don’t be afraid to jump into the more “advanced” distributions. I suggest Slackware as a good middle ground between working out of the box and being a good learning experience, then pick up something like Gentoo. Remember: pain is weakness leaving the body. Sometimes the old people complaining you kids have it too easy are actually right. When I was a kid, I remember having to wander the filesystem in a MSDOS command prompt… now get off my lawn (I’m a college graduate, I have the right to say that now, right?)!

Back to the details. What do you do with Linux now that it’s working?

  • Learn to use the command line (+1 for learning shell scripting)
  • Learn Vim (+1 if you become a vim junky and install vim shortcut emulation plugins in your browser)
  • Get SSH working
  • Get a webserver working (http, ftp)
  • Did you pick up HTML yet? Go host your own webpage
  • Get Samba working
  • Play with cron jobs for doing backups and maintenance
  • Try different GUIs (KDE, Gnome, xfce, Fluxbox)
  • Try to recompile your kernel, explore start up scripts, and make it boot as fast as possible
  • Look up guides on how to keep it secure and maybe play with some offensive security (hacking) network tools on your own network

Learning the programming stuff

Alright, you’ve either gotten the basics of computer stuff down or skipped it since you want to get straight to programming that video game idea you’ve been dreaming about. Either way I would recommend that your first programming language is something high level with a decent GUI library built in. Personally I went with Python/Tk followed by TCL/Tk. Other options could be Ruby or PHP if you can find decent tutorials on the internet for them. The reason for not jumping straight into C/C++/Java is that a scripting language will probably let you get to fun stuff sooner. Using Tk you can throw together a little tick tack toe game in about 5 minutes once you know what you’re doing. The second advantage is that this will reduce you’re boredom in beginning CS classes where they will probably start with either Java or C++. The third advantage is that a scripting language is always useful as a go to language for a quick script to do something like text processing. Learn regular expressions, they’re useful even for everyday find/replace tasks on a decent text editor.

How to learn to program

Learning to program is a bit like learning a musical instrument. People can tell you how to do it 500 times and it won’t make you any better. You simply become a better programmer by programming. When you’re learning to use Linux, it generally screws up and you figure out how to fix it. When you move on to programming, you screw up and slowly learn from your mistakes until you see what to do and what not to. Some things might accelerate the learning path: seeing really good code, seeing really bad code, finding the built in library that does the exact thing you’ve spent 3 days programming from scratch, sitting through lectures on software design and algorithms. Overall the best way to learn to program IS to program. I absolutely can not stand reading tutorials on the syntax of languages. Learn the basics of a language and then move on to actually trying to program something. What you ask? What do people draw when they learn to paint? What do people play when they learn music? What kind of building does an architect practice design on? There’s no correct answer; be creative. If you can’t think of anything useful to make, you can always resort to classic games (tic tack toe, checkers, chess, blackack, rubik’s cube, tower defense game, etc etc). The point being it’s easier to study pages of dull documentation when you’re using that to work toward a working program rather than just trying to memorize stuff. Your program might have been programmed a hundred times before, but that won’t make it any less of an experience writing, and you could learn from other people’s implementations.

Keeping the passion and motivation

Everyone has highs and lows when it comes to interests in things. If you’ve been reading from the beginning of this you probably think that I sit around every night writing MMORPGs and hacking together Linux scripts that make my Linux box take input by voice command. The truth is that  my work ethic can best be summarized as spurts of obsessive genius followed by long stretches of laziness, and the vast majority of my nights are filled catching up on all the TV I somehow missed growing up (seen Buffy the Vampire Slayer? If not, go, watch, now). This occurs on both a daily and long term basis, and I don’t believe I’m alone in it. Something I saw in a post by Joel Spolsky resonated with me,

Sometimes I just can’t get anything done.

Sure, I come into the office, putter around, check my email every ten seconds, read the web, even do a few brainless tasks like paying the American Express bill. But getting back into the flow of writing code just doesn’t happen. These bouts of unproductiveness usually last for a day or two. But there have been times in my career as a developer when I went for weeks at a time without being able to get anything done. As they say, I’m not in flow. I’m not in the zone. I’m not anywhere.

For me, just getting started is the only hard thing. An object at rest tends to remain at rest. There’s something incredible heavy in my brain that is extremely hard to get up to speed, but once it’s rolling at full speed, it takes no effort to keep it going.

This is the same with me. Once you get in the middle of a programming project it isn’t a chore, it’s a rush, a feeling of zen that most people rarely find during their day job. However, finding the motivation to get started, especially without school or work deadlines pushing you can be the last thing you feel like doing. I don’t have any solution to this other than: think of something to program and force yourself to start. Sure, it might end up being a false start, I once tried to force myself into the mood by programming an Android time tracking application for people with type A personalities (or just plain OCD, doubt I would ever actually be able to consistently use such and application). It ended in nothing but a failed attempt to find the Ballmer Peak and a slightly more advanced version of the classic Hello World program to see if my SDK was set up.

You might find motivation in odd places too. Why did I take up exploring network security? Anger at someone who told me their page was secure after I said it was outdated and full of holes. I once programmed a puzzle solving game automation program to impress a girl (she was impressed, but in hindsight asking her out would have been a far more effective move). The IRC bot I wrote was partly inspired by another guy writing a bot and battling mine with massive kick/ban/flood wars until the IRC network banned both our IP addresses.  Good times. What you should take away from this is that programming game playing algorithms is not a good way to attract a girl.

If you manage to make it through college as a CS major without having at least one existential crisis, I’ll be amazed. At some point you’ll probably be sitting gloomily in front of your computer realizing that the last x years of your life have been focused around the perusal of degree which will enable you to spend 10x that number of years sitting in front of computers pressing buttons and making the patterns on the screen change. In fact, this was the hardest thing about college for me. It wasn’t the work, I could do the work easily if I applied myself. It was just trying to not fall into apathy, or worse, pure hate for school (keep reading and I’ll got to this).

How did I survive it without becoming a deranged alcoholic? It’s surprising how many software developers and IT people you can find in hole in the wall bars; you should be concerned I know this. There is no easy answer to this. My only advice is to find something to do, even if it isn’t programming. The worse period of my college experience was about 6-12 months ago when I simply had no motivation to touch any CS stuff outside of work and school. The classes I had were both boring and tedium filled, the capstone project I was looking forward to ended up being a huge wast of time, and I’d been both in school and at my internship long enough that absolutely nothing new or exciting was happening in my life. I managed to narrowly avoid failing Statistics for Engineers (pulled a C out of it from 2 days of cramming for the final). The only thing that pulled me out of it was the reality that this is my last semester and I really need to start doing things like planning for a career and not failing my last upper division humanity.

Thoughts on bad professors

An unfortunate fact of my college experience is that I would say the bad professors outnumbered the good professors by probably 2 to 1. For every lecture I went to, I had 2 others with a professor I found mediocre at best.  Is this conclusion because my expectations for professors are too high? Maybe: what makes a good professor?

  • Knowledge of the subject matter above and beyond the textbooks
  • Can be understood (doesn’t skip too many steps, can speak English, can explain things articulately, can write legibly)
  • Entertaining personality/ability to hold people’s attention (throwing in a joke or story instead of 75 minutes of straight monotone slide reading)
  • Fair grading scheme and tests
Once you get to upper division classes, you might be able to start picking classes from professors you know are good. I’d definitely recommend this, nothing is worse than being interested and excited by a topic and then losing all interest because you hate the class so much.

Graduating on time

College advisers always push graduating on time. I went the route of attempting a double major (in math), giving that up, and ended up being behind 1/2 to 1 year depending on how you count things. I don’t regret it at all, if you’re liking college, don’t be afraid to kill some time with interesting non-required classes. It’s easy to get internships when you’re in college that look good on a resume, and in the big scheme of things, who cares if you’re a year older when you enter the work force?

Exception: if you’re hating every moment of college, try and get out quick and not procrastinate. I put off a few classes I didn’t want to take and I would have done better on them earlier on before I lost a lot of my motivation. For those in pain, the quick bandage approach is far better than the slow procrastinating but loathing perma-college student that takes 8 years to finish their degree (I’ve seen it). But hey, 8 years if you’re having the time of your life might be okay, just depends on your situation.

Final thoughts and a link

Somewhere in the middle of college I stumbled across Joel Spolsky’s blog, Joel on Software; I highly recommend reading through his old posts. One of his posts offers some advice for computer science majors.  Go read it yourself, but here are his main points.

  • Learn how to write before graduating.
  • Learn C before graduating.
  • Learn microeconomics before graduating.
  • Don’t blow off non-CS classes just because they’re boring.
  • Take programming-intensive courses.
  • Stop worrying about all the jobs going to India.
  • No matter what you do, get a good summer internship.
A good point to end on: companies like having interns. Programming interviews are hard to properly conduct. Internships let companies have a nice trial period where they get to see what you’re capable of, and you get experience, resume fodder, and money. One problem I had was not knowing when I had enough skills to actually get an internship. Simple answer if you’re an ASU student, CSE310 (Data Structures and Algorithms) and CSE360 (Sofware Engineering/Design) will give you what you need to at least have a good change at surviving a technical interview. That doesn’t mean you couldn’t pick of the stuff you need before the classes or try to interview anyway, but in my experience interviewers focus on algorithmn (reverse a linked list, common algorithm time complexity, space complexity, hash tables, sorting) and OOP design (think through a first pass architecture of a program that does blah and scribble it on a white board) questions a lot, especially for college students that don’t really have much past experience to point to in order to show their skills.