You all know URL shortener services like goo.gl. Their use is simple: you have a long, nasty URL, and you want a shorter one. That's exactly what they do! As I was using them, I asked myself how I could build one. And here I am, sharing my experience with you.
This project makes use of the following principles:
- Hash functions
What the hash?!
Our goal is to generate a short URL made up of a few characters. In my case, I chose the 26 letters of the alphabet and the digits 0 to 9, so 36 symbols in total. This means that even with a code of only 6 characters, where characters may repeat, we could have 36^6 = 2 176 782 336 unique URLs, which is more than enough for this project!
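To make the arithmetic concrete, here is a minimal sketch of turning a number into a 6-character code over that 36-symbol alphabet. The names `ALPHABET`, `CODE_LENGTH`, `KEYSPACE` and `toShortCode` are hypothetical (they are not from the project's code), and the letters-then-digits ordering of the alphabet is an assumption; any fixed ordering works.

```javascript
// Hypothetical alphabet: the 26 lowercase letters followed by the digits 0-9.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789';
const CODE_LENGTH = 6;
// Number of distinct codes: 36^6 = 2 176 782 336.
const KEYSPACE = ALPHABET.length ** CODE_LENGTH;

// Write a non-negative integer in base 36 using ALPHABET, left-padded to CODE_LENGTH.
function toShortCode(n) {
  if (!Number.isInteger(n) || n < 0 || n >= KEYSPACE) {
    throw new RangeError(`n must be an integer in [0, ${KEYSPACE})`);
  }
  let code = '';
  for (let i = 0; i < CODE_LENGTH; i++) {
    code = ALPHABET[n % ALPHABET.length] + code; // least significant digit first
    n = Math.floor(n / ALPHABET.length);
  }
  return code;
}
```

With this ordering, `toShortCode(0)` is `"aaaaaa"` and `toShortCode(KEYSPACE - 1)` is `"999999"`, so every integer below 36^6 gets exactly one code.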
OK, so far so good! But how exactly do we do that? How do you shrink your awfully long URL into a small, cute one? Well... you hash it! Wikipedia tells us that: “A hash function is any function that can be used to map data of arbitrary size to data of a fixed size.” That's great, it seems like the right thing for us: however long the input URL is, the output always has the same small size.
There are tons of hash functions out there, but one caught my eye: MurmurHash. It gives pretty good performance, which matters because you don't want something too computationally intensive. Be careful, though: this is not a cryptographic hash function! It is not designed to be hard to reverse, nor to resist someone deliberately crafting inputs that produce the same hash.
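As an illustration, here is a minimal sketch of MurmurHash3 (the 32-bit x86 variant) in plain Node.js, hashing the UTF-8 bytes of a string. It is not the project's own code, and `shortCodeFor` is a hypothetical helper: because a 32-bit hash has 2^32 (about 4.3 billion) possible values, more than 36^6, it reduces the hash modulo 36^6 and writes it with `Number.prototype.toString(36)`, which uses the digits 0-9 followed by the letters a-z.

```javascript
// MurmurHash3, x86 32-bit variant, over the UTF-8 bytes of a string.
function murmur3_32(str, seed = 0) {
  const bytes = Buffer.from(str, 'utf8');
  const len = bytes.length;
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  let h1 = seed >>> 0;
  const nblocks = len >>> 2; // number of full 4-byte blocks

  // Body: mix each little-endian 32-bit block into the state.
  for (let i = 0; i < nblocks; i++) {
    let k1 = bytes.readUInt32LE(i * 4);
    k1 = Math.imul(k1, c1);
    k1 = (k1 << 15) | (k1 >>> 17); // rotate left by 15
    k1 = Math.imul(k1, c2);
    h1 ^= k1;
    h1 = (h1 << 13) | (h1 >>> 19); // rotate left by 13
    h1 = (Math.imul(h1, 5) + 0xe6546b64) | 0;
  }

  // Tail: the remaining 0 to 3 bytes.
  const tail = nblocks * 4;
  let k1 = 0;
  switch (len & 3) {
    case 3: k1 ^= bytes[tail + 2] << 16; // falls through
    case 2: k1 ^= bytes[tail + 1] << 8; // falls through
    case 1:
      k1 ^= bytes[tail];
      k1 = Math.imul(k1, c1);
      k1 = (k1 << 15) | (k1 >>> 17);
      k1 = Math.imul(k1, c2);
      h1 ^= k1;
  }

  // Finalization: fold in the length, then avalanche the bits.
  h1 ^= len;
  h1 ^= h1 >>> 16;
  h1 = Math.imul(h1, 0x85ebca6b);
  h1 ^= h1 >>> 13;
  h1 = Math.imul(h1, 0xc2b2ae35);
  h1 ^= h1 >>> 16;
  return h1 >>> 0; // unsigned 32-bit result
}

// Hypothetical helper: 6-character base-36 code derived from the hash.
function shortCodeFor(url, seed = 0) {
  return (murmur3_32(url, seed) % 36 ** 6).toString(36).padStart(6, '0');
}
```

The same URL always yields the same code, which is what makes the lookup possible later. Note that the modulo step keeps the codes within 6 characters at the cost of a slightly uneven spread over the 36^6 values.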
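All right, but if we can't reverse the hash, how are we going to retrieve the long URL, you may ask? Simple: we store it! Using JSON, we can map each hashed small URL to the long one entered by the client. When the client uses the small one, we look up the corresponding original URL and redirect the client to the desired page. Since many possible long URLs end up sharing the same small set of codes, two different URLs can occasionally get the same code (a collision), so the store should check whether a code is already taken before saving it. Sounds easy, right? Well, it is!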
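A minimal sketch of such a store, assuming an in-memory `Map` that is serialized to a JSON object of the form `{ code: url }`. `UrlStore` and `defaultCodeFor` are hypothetical names. So that the block runs on its own, the code generator here swaps in a SHA-256 digest from Node's built-in `crypto` module instead of MurmurHash; any `(url, seed) -> code` function can be passed in. On a collision it simply retries with the next seed, which is one possible strategy among several.

```javascript
const crypto = require('crypto');

// Stand-in code generator (SHA-256 rather than MurmurHash): first 4 bytes of the
// digest of "seed:url", reduced modulo 36^6 and written as 6 base-36 characters.
function defaultCodeFor(url, seed) {
  const digest = crypto.createHash('sha256').update(`${seed}:${url}`).digest();
  return (digest.readUInt32BE(0) % 36 ** 6).toString(36).padStart(6, '0');
}

class UrlStore {
  constructor(codeFor = defaultCodeFor) {
    this.codeFor = codeFor;  // (url, seed) -> short code
    this.byCode = new Map(); // short code -> long URL
  }

  // Return the short code for longUrl, storing the mapping if it is new.
  shorten(longUrl, maxAttempts = 100) {
    for (let seed = 0; seed < maxAttempts; seed++) {
      const code = this.codeFor(longUrl, seed);
      const existing = this.byCode.get(code);
      if (existing === undefined) {
        this.byCode.set(code, longUrl); // free code: claim it
        return code;
      }
      if (existing === longUrl) return code; // this URL was already shortened
      // A different URL owns this code (collision): retry with the next seed.
    }
    throw new Error('Could not find a free short code');
  }

  // Original URL for a code, or undefined if the code is unknown.
  resolve(code) {
    return this.byCode.get(code);
  }

  // Lets JSON.stringify(store) produce a plain { code: url } object.
  toJSON() {
    return Object.fromEntries(this.byCode);
  }

  // Rebuild a store from the parsed JSON object.
  static fromJSON(obj, codeFor) {
    const store = new UrlStore(codeFor);
    store.byCode = new Map(Object.entries(obj));
    return store;
  }
}
```

Persisting could then be as simple as `fs.writeFileSync('urls.json', JSON.stringify(store))` and reloading with `UrlStore.fromJSON(JSON.parse(fs.readFileSync('urls.json', 'utf8')))`, where the file name is only an example.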
OK, so we have the idea, the algorithm, and the “structure” of our service. Now let's talk implementation. For that, I'm using NodeJS. You could actually use anything that can act as a web server: PHP, C, C++, Java, Python, etc. I decided to use NodeJS because I wanted to learn it. What better way to learn something than by using it?
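For readers who want to see the redirect step in NodeJS, here is a minimal sketch using only the built-in `http` module. It is not the project's code: the in-memory `urls` table, the sample entry, and the `route` and `createServer` names are all hypothetical. Keeping the routing decision in a plain function makes it easy to test without opening a socket.

```javascript
const http = require('http');

// Hypothetical in-memory table; in the post, this mapping lives in a JSON file.
const urls = new Map([['abc123', 'https://example.com/some/very/long/path']]);

// Decide the response for a request path, e.g. "/abc123".
function route(pathname) {
  const code = pathname.slice(1); // drop the leading "/"
  const target = urls.get(code);
  if (target === undefined) {
    return { status: 404, headers: { 'Content-Type': 'text/plain' }, body: 'Unknown short URL\n' };
  }
  // 302 (temporary) redirect: browsers keep asking this server on each visit.
  return { status: 302, headers: { Location: target }, body: '' };
}

// Wire route() into an HTTP server (not started here).
function createServer() {
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url, 'http://localhost'); // ignore any query string
    const { status, headers, body } = route(pathname);
    res.writeHead(status, headers);
    res.end(body);
  });
}
```

To run it, call `createServer().listen(8080)` and open `http://localhost:8080/abc123`. A 301 (permanent) redirect would also work, but browsers may cache it and stop contacting the server for that code.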
I won't describe every step of the process. If you go to my GitHub, the code is pretty self-explanatory and well commented. If you have any questions, feel free to ask!
As of now, the service is pretty limited. There are no ads, no per-user history, and no special search algorithm to retrieve URLs faster. Here are some ideas to take it further:
- Make an API available for everyone to use.
- Make a web browser extension that automatically shortens long URLs (could be coupled with the API).
- Add user-specific history (like goo.gl)
- Implement a faster way of searching through URLs
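One common approach to the last idea, assuming the stored entries are kept as a JSON array that gets scanned entry by entry (the post does not say how they are laid out), is to build an index keyed by short code once, so each lookup becomes a hash-table probe instead of a full scan. The records and function names below are hypothetical.

```javascript
// Hypothetical records, as they might be kept in a JSON array.
const records = [
  { code: 'aaaaba', url: 'https://example.com/a' },
  { code: 'k3x9qz', url: 'https://example.com/b' },
];

// Linear scan: cost grows with the number of stored URLs (O(n) per lookup).
function findByScan(list, code) {
  const hit = list.find((r) => r.code === code);
  return hit ? hit.url : undefined;
}

// Build a Map once; afterwards each lookup takes O(1) time on average.
function buildIndex(list) {
  return new Map(list.map((r) => [r.code, r.url]));
}

const index = buildIndex(records);
```

After that, `index.get(code)` returns the same result as `findByScan(records, code)`; the index just has to be updated whenever a new URL is stored.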