I’d like to start off by saying I haven’t updated the blog in a while. I know that this blog doesn’t have the most readers in the world, but I a) like writing; and b) want to keep the few readers I have
. That being said, expect a couple articles in the near future on big numbers and maybe some cryptography stuff.
On the topic of big numbers, my most recent project is a big number library. It’s slow, incomplete, and uses a whole byte for every digit. However, I have been able to get some use out of it; I wrote a program that calculates a specified fibonacci number using it. The terms that I’ve been finding have been really really big, the smallest one clocking in at a whopping 20,900 digits, and the current biggest having over 160,000 digits! Keeping in mind that the maximum number that an unsigned 64 bit variable can hold is 18446744073709551615, which only has 20 digits, I don’t think it’s all that bad after all.
Some Big Fibonacci Numbers
Friday, 7 May 2010
On Prime Numbers And How To Find Them
Thursday, 8 April 2010
There’s a pretty neat website called http://projecteuler.net which I find pretty fun. It’s got several math problems that are meant to be solved programatically. What’s more is that a lot of them don’t require a profound education in math, just logical thinking.
One task that shows up very often in these problems is determining whether or not a number is prime (divisible only by 1 and itself). The obvious method of doing this would be to loop through 2 to every number below the given number and see if they divide evenly. If there are any numbers that do divide evenly, the number is not prime. Here is an implementation with a few optimizations (only checking odd numbers, and only checking from 3…sqrt(number)):
int isprime(int num) { int i, lim; if(num == 2) /* special case, only even prime number */ return 1; if(num < 2 || num % 2 == 0) /* Watch out for bad numbers */ return 0; lim = (int)sqrt(num); for(i = 3; i < lim; i += 2) /* Only check odd numbers from here on out */ if(num % i == 0) return 0; return 1; }
For testing purposes, our main function will tell us how many prime numbers there are between 0 and argv[1].
int main(int argc, char *argv[]) { unsigned int i, limit, howmany; if(argc < 2) return -1; limit = atoi(argv[1]); howmany = 0; for(i = 0; i < limit; i++) if(isprime(i)) howmany++; printf("%u\n", howmany ); return 0; }
Let’s compile it and see how it works:
massey@marigold:~/blag$ gcc slowprime.c -Wall -Wextra massey@marigold:~/blag$ ./a.out 10 4 massey@marigold:~/blag$ ./a.out 1000 168 massey@marigold:~/blag$ time ./a.out 100000 9592 real 0m1.987s user 0m1.956s sys 0m0.000s
Clearly it works. However, take a look at that last test run where we also look at it’s time. It took nearly 2 whole seconds to count all the primes below 100,000.
Seeing how this speed is unacceptable in most situations, many people have come up with different and faster ways of checking for primality. There was this one guy in ancient Greece, back when knowledge and math flourished, called Eratosthenes. Eratosthenes was a smart guy, and he figured out a much faster method of calculating primality than the one we went over before.
Imagine a long, long list of numbers. The first one is 2, because 1 is a wird number in terms of prime numbers, and the list increases by one from there. What you do, now, is cross out all the numbers that are divisible by the first number (2) on the list. The next number is 3, so you cross out all numbers divisible by 3 on the list. The number after that is 4, but it’s already been crossed out since it’s divisible by 2. If you follow this pattern correctly, what you’ll notice is that every number you come across that has not been crossed out is, in fact, prime.
Take a look at the following list of numbers from 2 to 20.
If you cross out all the numbers divisible by 2, you get:
Once you cross out all the numberse divisible by 3, you get:
And look at that! We’ve got all the numbers below 20 that are prime, and we only had to cross out the numbers divisible by 2 and 3. This method, called the Sieve of Eratosthenes, is much faster than the previous method we discussed.
Here’s a function that will initialize a global array to contain all the primes from 2 to a given number.
unsigned int *sieve; /* Global array */ size_t generate_sieve(int top) { unsigned int i, *temp, size, j, limit; if(top < 2) return 0; limit = top - 1; /* allocate size for temp */ temp = malloc(sizeof(int) * limit); if(temp == NULL) return 0; memset(temp, 0, sizeof(int) * limit); /* initialize temp to contain the numbers from 2..top */ for(i = 0; i < limit; i++) temp[i] = i + 2; /* do the real work */ for(i = 0; i < limit / 2; i++) /* if temp[i] has not already been set to zero, it's prime, and you want to remove all it's multiples */ if(temp[i]) for(j = i ? temp[i] * 2 - 2 : 2; j < limit; j+= temp[i]) temp[j] = 0; /* copy temp into sieve, ignoring all 0's */ size = 1; /* call malloc() here so we only have to deal with realloc() inside the loop */ sieve = malloc(sizeof(int) * size); sieve[0] = 2; for(i = 1, j = 1; i < limit; i++) if(temp[i]) { sieve = realloc(sieve, ++size * sizeof(int)); sieve[j] = temp[i]; j++; } free(temp); return size; }
It is by no means the best implementation of the sieve of eratosthenes, but it works just fine. To compare it to the previous code, let’s rewrite our main function to accomodate this one.
int main(int argc, char *argv[]) { unsigned int size; if(argc < 2) return -1; size = generate_sieve(atoi(argv[1])); printf("%u\n", size); if(sieve) free(sieve); return 0; }
Let’s fire it up and see what happens.
massey@marigold:~/blag$ gcc fastprime.c -Wall -Wextra massey@marigold:~/blag$ ./a.out 10 4 massey@marigold:~/blag$ ./a.out 1000 168 massey@marigold:~/blag$ time ./a.out 100000 9592 real 0m0.014s user 0m0.012s sys 0m0.000s
Not only does it work, but it took 0.014 seconds to tell me how many primes there are under 100,000. Compared to our original code’s near 2 second runtime, I’d say this guy Eratosthenes knew what he was talking about.
.
How Buffer Overflows Work – From a Metaphorical Point of View
Thursday, 1 April 2010
Imagine you are a retired lottery winner, rich beyond your wildest dreams. You’re relaxing on the beaches of Jamaica, sipping an ice-cold drink. Your very own servant comes up to you and asks you what to do. You’re about to tell him to refill his drink, but then you remember he’s not the brightest servant on the island. You remember you have to tell him every last thing to do or he’ll forget. So, you give him the drink, tell him where to get water, and most importantly, tell him where to return when he’s done – where you are right now. This is, in fact, so important that you had to write it down.
Your loyal servant wanders over to where you told him to get water and faithfully fills up your cup. Except… You forgot to told him to stop filling up your cup when it was full. Oh no! Passers by stare in confusion as your humble servant continues to fill your full cup with water that is overflowing. After what seems like an eternity, but was probably only five or ten minutes, the water runs out and he stops pouring it into the cup. He knows he must return now, but he has forgotten where. Thankfully, you wrote it down.
Your servant walks confidently over to west beach, just like you specified on that piece of paper. When he gets there, he finds the spot where you were sitting and asks for instructions. But wait! That’s not you! But how can this be? You have not moved since you told him to fill up your cup. You see, it turns out the water that your loyal and faithful servant would not stop pouring into your cup spilled onto the piece of paper that you used to write down your location. What should have been “best beach” turned into “west beach”, and now your servant is asking other people for instructions.
This is very much like what can happen to computers. When a function is called, the computer has to tell it where to go when it’s done. However, certain data can spill into the specified location, the function will return to a different address, most of the time it’s the address of malicious code that will grant an attacker whatever they may please. In your case, the people over at best beach could have told your loyal and humble servant to steal your money and give it to them. And that’s how a stack overflow works.
Moral of the story? Always tell your servants how much water you want.
A Beginner’s Guide To Debugging With GDB
Wednesday, 31 March 2010
Debugging is an essential skill for any programmer. It alows you to step through each line of code and examine variables, return values, registers and more. On linux, gdb is the debugger most commonly used. It’s simple, straightforward, and easy to use (for the most part). Anways, let’s get started…
For the purpose of this tutorial, we’re going to write an implementation of the Euclidean algorithm, which is used to find the greatest common divisor of two numbers.
http://en.wikipedia.org/wiki/Euclidean_algorithm. Here is the code:
//euclidean.c #include <stdio.h> #include <stdlib.h> /* *incorrectly* finds the gcd of numbers a and be using the euclidean algorithm */ unsigned int gcd(unsigned int a, unsigned int b) { unsigned int r; while(r = a % b) { a = r; b = a; } return b; } int main(int argc, char *argv[]) { if(argc < 3) return -1; printf("%u\n", gcd(atoi(argv[1]), atoi(argv[2]))); return 0; }
If you read the comments (which I hope you did), you’ll know that our gcd() function is incorrect. Is it really? Yes, try running it a couple times and look at your results:
massey@epsilon:~$ gcc euclidean.c -o euclidean massey@epsilon:~$ ./euclidean 5 5 5 // right massey@epsilon:~$ ./euclidean 1 2 1 // right massey@epsilon:~$ ./euclidean 5 6 5 // um.... massey@epsilon:~$ ./euclidean 5 7 5 // okay... massey@epsilon:~$ ./euclidean 11 3 2 // what?
As you can see, it starts off right by telling us that the gcd of five and five is five, and the gcd of one and two is one. However, it then tells us that the gcd of five and six is five and the gcd of five and seven is also five; Both of which aren’t true, as five doesn’t divide evenly into six or seven. Then, it goes completely off-track, and tells us that the gcd of eleven and three is two. That’s right, two, with eleven and three both being odd numbers. Weird… Let’s open it up in gdb and take a look at what’s happening.
massey@epsilon:~$ gcc euclidean.c -o euclidean -ggdb3 massey@epsilon:~$ gdb euclidean GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... (gdb) break gcd Breakpoint 1 at 0x80483dc: file euclidean.c, line 6. (gdb) run 5 6 Starting program: /home/massey/euclidean 5 6 Breakpoint 1, gcd (a=5, b=6) at euclidean.c:6 6 a = r; (gdb)
There’s a lot going on here – let me explain.
On the first line, you’ll see that we compile euclidean.c with the -ggdb3 switch. This tells gcc to associate the executable with the source file, so that when the program is being examined with gdb we can take a look at the line of code we’re executing, instead of pure machine code. The second line is pretty straightforward, we open up gdb passing it the name of our executable as the sole argument. After that there’s a lengthy copyright announcement (which you can disable with the -q switch).
On the line after the copyright info, we type break gcd. This command sets a breakpoint on the gcd function. A breakpoint is a specified part of the program where the debugger will stop execution and allow you to examine whatever you like. This will be useful for our situation, because it is important to know what variable holds what number during each iteration of the loop, and a breakpoint lets us see these things.
The next command, run 5 6, starts the program, passing it the parameters five and six. Soon after the program’s run, we hit our breakpoint.
(gdb) p a $1 = 5 (gdb) p b $2 = 6 (gdb) p r $3 = 5 (gdb) p a % b $4 = 5
The p command lets us see what value a variable contains. We see that a is equal to five, and b is equal to six. We also see that r is equal to five, which is because, as you can see on the next line, five mod six equals five. Continuing on:
(gdb) step
7 b = a;
(gdb) step
5 while(r = a % b) {
(gdb) p b
$7 = 5
(gdb) p a
$6 = 5
(gdb) step
9 return b; The step command tells the debugger to continue execution, but only until the next line, b = a;. We step one more time, and we’re back at the condition. Using the p command, we see that both b and a are five, and since five mod five is zero, the condition is false. We step one more time and we’re at the return statement, returning five.
But wait, that’s not how the Euclidean algorithm works. It’s supposed to calculate the remainder of the second number in the first equation (that’s a mouthful, but I think it makes sense) and the remainder of the first equation. What’s happening here is the first number of the first equation is being divided by the remainder of the first equation. D’oh!
All it takes to fix this mistake is changing two lines inside the while-loop! Can you guess which two?
while(r = a % b) { a = b; // was a = r; b = r; // was b = a; }
That’s it! Re-compile the program and run it again. It should work perfectly.
massey@epsilon:~$ nano euclidean.c massey@epsilon:~$ gcc euclidean.c -o euclidean massey@epsilon:~$ ./euclidean 5 6 1 massey@epsilon:~$ ./euclidean 11 3 1 massey@epsilon:~$ ./euclidean 5 10 5
Works perfectly.
.