My first Go project, a URL obfuscator
Recently, a family member’s friend asked me to send him content I had on a server of mine. This server has no hostname, and IPs are often censored by tools like Facebook Messenger. I then searched for a URL shortener, minifier, obfuscator or whatever… Many exist, but how safe is my data? This is always one of my primary concerns. Because every one of these tools had some data analysis behind it, I searched for open source software I could run myself. Here again, there is some good software around, but none that met all my criteria.
# Research
Giants apart (goo.gl: closed; bitly: too much tracking; ow.ly: same, it makes link shortening a business), the best I could find (in my opinion) is TightURL. It is open source and runs on MySQL and PHP. I think spammers got to it though, as it’s not possible to shorten links anymore. According to the code, it uses a traditional base62 conversion of identifiers, although most of the code is actually there to handle spam and blacklists! By the way, it made me discover uribl, subl and bad behavior, which all look great to help you counter spam if that’s your objective. Back to the topic: I looked further and found TinyURL (I really wonder how this is financed though), but it’s not open source. The last one I found was lilUrl, meant to be an open source clone of TinyURL. The algorithm here is a non-standard numeric-to-alphabet conversion. I didn’t look deep, but at first sight it looks like it wouldn’t take long to create a rainbow table and reverse engineer it. Another quick look at GitHub and you’ll find dozens of implementations, many using PHP and SQL-based infrastructure. One caught my eye and said “Smallest URL shortener in Go”. This repository has a really simple implementation, and the documentation said:
> How easy it is to get up and running in Go. It took me about 1 hour from start to finish. Writing this README file took longer time.
I told myself that, after all, building a URL shortener is a good exercise. On top of learning Go, I started investigating further how these shorteners are engineered so I could build one myself.
# Algorithms
I stumbled upon this great article: How to build a Tiny URL service that scales to billions? I liked the algorithm part about the hashing of identifiers and the scale estimations. What I noted from this and other readings is that these services usually:
- get the next database auto-incremented identifier, a number
- transform this number into base62 from base10, probably adding some salt for pseudo-randomness
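To make the two steps above concrete, here is a minimal sketch of the base10 to base62 conversion. The alphabet order and the function names `EncodeBase62` and `DecodeBase62` are my own choices for illustration, not taken from any of the services mentioned, and no salt is applied.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet holds the 62 symbols. Its order is an arbitrary choice for this
// sketch; shuffling it is one simple way to make identifiers look less sequential.
const alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// EncodeBase62 turns a numeric identifier into a short base62 string.
func EncodeBase62(n uint64) string {
	if n == 0 {
		return string(alphabet[0])
	}
	base := uint64(len(alphabet))
	var digits []byte
	for n > 0 {
		digits = append(digits, alphabet[n%base])
		n /= base
	}
	// Digits were produced least significant first, so reverse them.
	for i, j := 0, len(digits)-1; i < j; i, j = i+1, j-1 {
		digits[i], digits[j] = digits[j], digits[i]
	}
	return string(digits)
}

// DecodeBase62 is the inverse of EncodeBase62.
func DecodeBase62(s string) (uint64, error) {
	base := uint64(len(alphabet))
	var n uint64
	for _, c := range s {
		idx := strings.IndexRune(alphabet, c)
		if idx < 0 {
			return 0, fmt.Errorf("invalid base62 character %q", c)
		}
		n = n*base + uint64(idx)
	}
	return n, nil
}

func main() {
	short := EncodeBase62(1000000)
	back, _ := DecodeBase62(short)
	fmt.Println(short, back) // 4c92 1000000
}
```

Since the mapping is a plain change of base, anyone who knows the alphabet can walk through all identifiers in order, which is why these services usually add some salt on top.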
An open source library named hashids does this in multiple languages. If you look at the documentation, you’ll see that it is not an encryption algorithm, and you can read this really interesting cryptanalysis of hashids to find out more about this subject.
Other alternatives are used to build identifiers, for example:
- using Knuth’s integer hash method, see optimus-go
- with a Universally Unique Identifier (UUID), standardized by the IETF in RFC 4122. This one is commonly used in computer engineering for lots of things, but it is long.
- nano-id: a collision probability comparable to UUIDs, cryptographically safe and ported to lots of languages. They also have a collision calculator that’s pretty fun to use.
- or look at how Instagram generates identifiers
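As an illustration of the first alternative, here is a rough sketch of the idea behind Knuth’s multiplicative hash: multiply the identifier by an odd constant modulo 2^32, then undo it with the modular inverse of that constant. This is not the optimus-go API; the names `Obfuscate`, `Reveal` and `inverse32`, the 32-bit width and the `salt` value are all assumptions made for this example.

```go
package main

import "fmt"

// knuthConstant is the multiplier usually quoted for Knuth's multiplicative
// hashing on 32-bit words. Any odd number is invertible modulo 2^32.
const knuthConstant uint32 = 2654435761

// salt is a hypothetical secret value XORed in to vary the output.
const salt uint32 = 0x5bd1e995

// inverse32 computes the multiplicative inverse of an odd number modulo 2^32
// using Newton's iteration; each round doubles the number of correct bits.
func inverse32(a uint32) uint32 {
	x := a // a*a is 1 modulo 8 for any odd a, so 3 bits are already correct
	for i := 0; i < 5; i++ {
		x *= 2 - a*x
	}
	return x
}

// Obfuscate maps an identifier to a pseudo-random looking 32-bit number.
// uint32 arithmetic wraps around, which gives the modulo 2^32 for free.
func Obfuscate(id uint32) uint32 {
	return (id * knuthConstant) ^ salt
}

// Reveal is the exact inverse of Obfuscate.
func Reveal(hash uint32) uint32 {
	return (hash ^ salt) * inverse32(knuthConstant)
}

func main() {
	for _, id := range []uint32{1, 2, 3} {
		h := Obfuscate(id)
		fmt.Println(id, h, Reveal(h) == id)
	}
}
```

Like hashids, this is obfuscation rather than encryption: consecutive identifiers no longer look consecutive, but anyone who learns the constants can reverse the mapping.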
# Storage
After taking a step back on everything, I realized I didn’t care about hashing an integer into a string; I was more interested in an identifier built from a given alphabet, so I chose nano-id. My idea was to store my links in a robust and scalable key-value database, staying as minimalist as possible. I know some things about key-value databases, having worked with Redis and LevelDB. I wanted something even more scalable, remembering the scale estimations of the blog post I read earlier. After a quick search I found BoltDB, a key-value database written in Go. The etcd project, which is built for distributed cloud computing and is even used internally by Kubernetes, maintains a fork of it. I thought why not: even if it sounds like a bit too much, at least I’m sure that, in theory, this scales.
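To show what “an identifier built from a given alphabet” means in practice, here is a small sketch that draws each character from a cryptographically secure random source. It only mimics the idea of nano-id and is not how the library is implemented; `GenerateID` is a hypothetical name, and the alphabet and length in `main` are arbitrary.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
)

// GenerateID builds a random identifier of the given size, picking every
// character uniformly from alphabet with crypto/rand.
func GenerateID(alphabet string, size int) (string, error) {
	if alphabet == "" || size <= 0 {
		return "", errors.New("alphabet must not be empty and size must be positive")
	}
	letters := []rune(alphabet)
	limit := big.NewInt(int64(len(letters)))
	id := make([]rune, size)
	for i := range id {
		// rand.Int returns a uniform value in [0, limit), avoiding modulo bias.
		n, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		id[i] = letters[n.Int64()]
	}
	return string(id), nil
}

func main() {
	id, err := GenerateID("0123456789abcdefghijklmnopqrstuvwxyz", 12)
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // a different 12-character identifier on every run
}
```

Because nothing links an identifier to a counter, the key space cannot be enumerated in order; the price is a small collision probability that depends on the alphabet size and the length.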
# Let’s code
I had tried Rust but never Go. I had taken a peek at Go’s basics while reading other people’s code and the start of the Tour of Go. I’m no expert in this technology and I was under the impression that I wouldn’t like the syntax. I was wrong, and I actually loved it.
My URL obfuscator service will work like this (`service.com` is the hostname of the service, `example.com` the URL to be obfuscated):
- go to `service.com?example.com`
- store `hash => example.com`
- give back the `service.com/hash` URL to the user, which when hit will redirect with a 301 to `example.com`
The HTTP handler to create a link goes as follows:
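```go
// Needs fmt, net/http, net/url and unicode/utf8, plus the nano-id package
// imported as gonanoid.

// CreateLink handles GET ?http://link: it stores the link under a freshly
// generated id and redirects to the short URL.
func CreateLink(env *Env, w http.ResponseWriter, r *http.Request, inputUrl string) error {
	// Refuse overly long URLs before doing any work.
	if utf8.RuneCountInString(inputUrl) > 2000 {
		return makeStatusError(http.StatusRequestURITooLong)
	}

	parsedUrl, err := url.Parse(inputUrl)
	if err != nil {
		return StatusError{http.StatusBadRequest, err}
	}

	// Default to https when no scheme was given (e.g. ?example.com).
	if parsedUrl.Scheme == "" {
		parsedUrl.Scheme = "https"
	}

	// The id is random, built from the configured alphabet and length.
	id, err := gonanoid.Generate(env.Config.IdAlphabet, env.Config.IdLength)
	if err != nil {
		return StatusError{http.StatusInternalServerError, err}
	}

	// Persist id => URL through the storage abstraction.
	err = env.Transport.Put(id, parsedUrl.String())
	if err != nil {
		return StatusError{http.StatusInternalServerError, err}
	}

	// Send the browser to the short URL so it shows up in the address bar.
	http.Redirect(w, r, fmt.Sprintf("%s/%s", env.Config.ShortenerHostname, id), http.StatusFound)
	return nil
}
```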
For the last step, a friend had a good idea to keep things really minimalist: do the redirection without actually redirecting the first time, so the user can copy the URL directly from the address bar. For this to work, I set a cookie that is deleted on the next request:
- go to `service.com?example.com`
- create cookie `service`
- redirect to `service.com/hash`
- check cookie presence:
  - yes: do not redirect (301) but give the ability to copy the URL
  - no: do the redirection
Here is the address retrieval code:
```go
// GetIndex is the single endpoint (/).
// When there's a query, its value is saved as a link in the database.
// When there's a code (e.g. hostname.com/EnYQkRXzK30d), we redirect to the stored value.
func GetIndex(env *Env, w http.ResponseWriter, r *http.Request) error {
	// Everything after the "?" is the URL to obfuscate.
	rawURL := strings.Replace(r.URL.RawQuery, "?", "", 1)

	if rawURL != "" {
		// Mark this browser as the link's creator for the next request.
		cookie := &http.Cookie{Name: cookieName, SameSite: http.SameSiteStrictMode, Secure: true, HttpOnly: true}
		http.SetCookie(w, cookie)
		return CreateLink(env, w, r, rawURL)
	}

	key := strings.Replace(r.URL.Path, "/", "", 1)

	if key != "" {
		_, err := r.Cookie(cookieName)

		// Cookie present: this is the creator coming back from CreateLink.
		// Delete the cookie (MaxAge: -1) and stay on the page so the short
		// URL can be copied from the address bar.
		if err == nil {
			cookie := &http.Cookie{Name: cookieName, MaxAge: -1, SameSite: http.SameSiteStrictMode, Secure: true, HttpOnly: true}
			http.SetCookie(w, cookie)
			w.WriteHeader(http.StatusCreated)
			w.Write([]byte(http.StatusText(http.StatusCreated)))
			return nil
		}

		// No cookie: a regular visitor, follow the link.
		return Redirect(env, w, r, key)
	}

	return makeStatusError(http.StatusNotFound)
}
```
Note that there is a Transport abstraction: by default it uses BoltDB, and I soon added a Redis transport! Check out the code on GitHub: https://github.com/soyuka/caligo.
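To give an idea of what such an abstraction can look like, here is a sketch with an in-memory implementation. Only the `Put(id, url)` call appears in the handlers above; the `Get` method, the `ErrNotFound` error and the `MemoryTransport` type are assumptions made for this example and may differ from the actual repository.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Transport is the storage contract the handlers depend on.
// Get is assumed here; the post only shows Put being called.
type Transport interface {
	Put(key, value string) error
	Get(key string) (string, error)
}

// ErrNotFound is returned when no link is stored under a key.
var ErrNotFound = errors.New("key not found")

// MemoryTransport keeps links in a map guarded by a mutex. It stands in for
// the BoltDB and Redis transports in this sketch and is handy in tests.
type MemoryTransport struct {
	mu    sync.RWMutex
	links map[string]string
}

// NewMemoryTransport returns an empty in-memory store.
func NewMemoryTransport() *MemoryTransport {
	return &MemoryTransport{links: make(map[string]string)}
}

// Put stores value under key, overwriting any previous value.
func (m *MemoryTransport) Put(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.links[key] = value
	return nil
}

// Get returns the value stored under key, or ErrNotFound.
func (m *MemoryTransport) Get(key string) (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	value, ok := m.links[key]
	if !ok {
		return "", ErrNotFound
	}
	return value, nil
}

func main() {
	var store Transport = NewMemoryTransport()
	if err := store.Put("EnYQkRXzK30d", "https://example.com"); err != nil {
		panic(err)
	}
	link, err := store.Get("EnYQkRXzK30d")
	fmt.Println(link, err) // https://example.com <nil>
}
```

Since the handlers only see the interface, swapping the database is a matter of writing another type with the same two methods.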
An online version of this project is available at caligo.space. To get a hash, paste this into your address bar: caligo.space?example.com