2010 – In Review

2010 has been a good year for me. Busy, but good.

The most important event this year was the birth of my daughter. She is the joy of my life – with my wife coming in a very very close second (sshh, don’t tell her that!). That little girl just loves to smile!

I seem to have come into possession of a house as well. It required a bit of work.

With all of this happening in my life, I didn’t do nearly as much blogging as I probably ought to have. I did manage to do some blogging work, migrating from Blogger to WordPress, although that was forced on me. Most of my ‘blogging’ seems to have occurred on Twitter this last year, but I did manage to embed Twitter into my blog.

I did manage to get a few projects done. However, not the rough draft of the book on aquaponics I want to write – the website is coming along though.

I did manage to give some presentations this year. I even somehow found time to go camping once!

Finishing the Insulation

It’s nice having awesome friends. Between Jerry helping with the prep work and Jeb helping feed the blower, I got the job done!

I had gotten sick between the time I started this project and the time I could finish it, so it took me a bit longer than I had anticipated.

I finally rented a blower from Menards.

The Blower

Between it not being bolted down to the pallet they had it on and its missing parts, it was a frustrating experience getting it home and set up.

Duct tape, the universal adapter!

I got the privilege of hanging out in my attic for 5 hours!

the blower hose in my attic

The job wasn’t nearly as bad as I thought it was going to be. The prep work was harder than the actual blowing in of insulation. That isn’t to say that the job was a cakewalk.

A wee bit dusty
We got a bit dirty blowing in the insulation!

In the end, it was worth it! The gas bill for the month after we insulated was something like half that of the month before, and it was a much colder month. I will have to run the numbers later, but I am expecting that the insulation will have paid for itself in energy savings within the first winter!

All done!

I still have a bit of insulation to add. I came up a bit short, so I must have blown the insulation in a bit thick in places. I am going to need to get a few more bags and toss the insulation in by hand. I made sure that the space left to do was close to the access hatch, so it’s easy for me to do.

Not quite enough insulation - I tried to cover the small area left with some, but it got a bit thin here.

I also have to go around and insulate the rim joists and find any drafts and close those up as well.

Bella helping daddy fix stuff

Bella and I were fixing an electric heater tonight.

Disassembled electric space heater

It was being tossed, so it was tossed in my direction, because I guess I am known for collecting junk. Anyway, I’ve been wanting one for Bella’s room on those really cold nights. We’ve been kicking up the thermostat a degree or two and it seems to do the trick, but she’s about as far away from the furnace as you can get in my house.

She was so good watching me take the heater apart. I would show her a tool and tell her what I was doing. She seemed as genuinely interested as any 7 month old could about anything. I think she just liked all the ‘toys’ and movement. The fan just needed some oil on the fan motor bushings, so it took just a few minutes to fix.

Before putting it back together, I washed the only part big enough that she couldn’t swallow it. She was quite happy with the toy! She determined that the fan failed initially because it wasn’t properly taste-tested. She can now claim to have helped fix the space heater.

Bella doing a little Quality Control. The electric heater will now work correctly as it has been properly taste tested.

Last night the two of us tried to install a remote control for the fan in my room that was given to me. Mamma was cooking, so I was watching Bella. I was a bit concerned with how I had this rigged up for testing, but she was pretty interested in her bottle and the mirror I had set her beside by this point in the testing. She tolerated this pretty well. Never did get the fan control working.

Trying to hook up a remote unit to a ceiling fan.

Last week, the two of us were working on the treadmill I picked up from freecycle. She wasn’t so much loving this. She only lasted about 20 minutes. To her credit, it was kinda chilly in the basement and her bottle was empty. The speed control was burnt out. $60 for a new controller board. Eventually I will save up to get it. Hopefully the motor isn’t shorted or something that would cause the controller board to fail like this again. $60 for a treadmill in good shape is a great deal.

Blown transistor on bottom left and blown capacitor in center, 1/3rd from left.

I am hoping that I can get Bella interested in fixing stuff. The hobby is so important to me, and it would be so much fun to have somebody to wrench on the cars with. The boys will think it’s pretty cool too when she gets old enough for that to matter to her.

Tuning the training rate of a NeuralMesh ANN

I’ve been running a NeuralMesh Artificial Neural Network for about a week now. The point of it is spam detection. I am feeding it 7 different spam scores from the b8 Bayesian library.

My 7 ‘aspects’ of email spam are:

  • to
  • from
  • full from
  • subject
  • full subject
  • header
  • body

I am using the To field because sometimes spam is sent to funny names, and sometimes real email is sent to a particular person. The From score considers both the email address and the name it comes from; luckily, some spammers put their product names in the From. The Full From processes the From field as a whole, which should be good for catching duplicate messages: once I’ve seen it before, I have a good idea of what it is. The Subject and Full Subject operate on the same principle; however, sometimes a random word is inserted into a subject, so the full subject isn’t necessarily a solid repeat indicator. The Header score looks for SPF records and IP addresses in addition to including the above bits. The Body, well, that’s just the whole message.

I feed all these different scores into a 7-input, 5-neuron, 1-output ANN to let it sort out what’s meaningful and what’s not. Every time somebody deletes a message as spam or accepts an email as not spam, I use that as a training trigger for both the b8 Bayesian library and the NeuralMesh ANN library. Because my spam-to-not-spam ratio is so out of whack, the training loop doesn’t need many iterations for spam, but needs a lot for not spam.
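
The 7-5-1 wiring above can be sketched as a plain-PHP forward pass. To be clear, this is not NeuralMesh’s API (the library manages its own weights and training); the weights here are made-up placeholders, just to show how seven b8 scores collapse into one output score.

```php
<?php
// Plain-PHP sketch of a 7-input, 5-neuron, 1-output feedforward pass.
// NOT NeuralMesh's API -- weights are placeholders for illustration only.

function sigmoid($x) {
    return 1.0 / (1.0 + exp(-$x));
}

// $inputs: the seven b8 scores, each 0..1
// $hidden_weights: 5 rows of 8 weights each (7 inputs + 1 bias)
// $output_weights: 6 weights (5 hidden outputs + 1 bias)
function forward(array $inputs, array $hidden_weights, array $output_weights) {
    $hidden = array();
    foreach ($hidden_weights as $w) {
        $sum = $w[7];                      // bias term
        for ($i = 0; $i < 7; $i++) {
            $sum += $w[$i] * $inputs[$i];
        }
        $hidden[] = sigmoid($sum);
    }
    $sum = $output_weights[5];             // bias term
    for ($i = 0; $i < 5; $i++) {
        $sum += $output_weights[$i] * $hidden[$i];
    }
    return sigmoid($sum);                  // final spam score, 0..1
}
```

Training is just the library nudging those weights so the output lands near 1 for spam and near 0 for ham.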

To address this, I am using an adaptive training iteration system.  The closer to a pure spam score the system has, the fewer training iterations, the further from a pure spam score, the more iterations I do.

$learning_iterations = round((1 - $finalscore['AI Score']) * 100);

This seems to be working well, judging by the table below of the last however many training sessions the ANN has undergone. The fewer the iterations, the higher the spam score, which the system has seen a lot of. Comparing the Start MSE to the End MSE, any training under 4 iterations doesn’t really change anything. The higher-iteration events are not spam, so the system starts with a very high Start MSE, and the extra training iterations bring the End MSE right down.

You can see that after every non-spam training session, it takes a couple of data sets before it settles back down to its usual boring 3s and 4s for training. The data corpus is still pretty small; only a few non-spam data points have been trained into the system so far.

I think I am going to skip all training under 3 iterations, as it seems to not really make any difference. It probably just loads the system up with extra data points, and will eventually slow it down as I accumulate more data. I lose a lot of the little finesse trainings, but I think having fewer, more meaningful trainings as the corpus drifts will be more resource friendly.
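
That skip rule folds neatly into the adaptive iteration count. A minimal sketch, where `train_ann()` is a hypothetical stand-in for the actual NeuralMesh training call (not shown here):

```php
<?php
// Adaptive training sketch: the spammier the score, the fewer iterations,
// and skip training entirely when it would run under 3 iterations.
// train_ann() is a hypothetical stand-in for the NeuralMesh training call.

function learning_iterations($ai_score) {
    return (int) round((1 - $ai_score) * 100);
}

function maybe_train($ai_score /*, $inputs, $target */) {
    $iterations = learning_iterations($ai_score);
    if ($iterations < 3) {
        return 0;   // near-certain spam: skip, the ANN already knows this
    }
    // train_ann($inputs, $target, $iterations);  // hypothetical call
    return $iterations;
}
```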

Iterations Start MSE End MSE Date Exec Time Off-line
66 0.978118000 0.000000000 16/12/2010 9:33:15 am 0.01188200 n
3 0.000087000 0.000087000 16/12/2010 8:45:08 am 0.00063600 n
3 0.000108000 0.000108000 16/12/2010 8:44:43 am 0.00059300 n
3 0.000098000 0.000098000 16/12/2010 8:44:41 am 0.00056200 n
3 0.000083000 0.000083000 16/12/2010 8:44:40 am 0.00061200 n
3 0.000107000 0.000107000 16/12/2010 8:44:39 am 0.00062100 n
4 0.000100000 0.000100000 16/12/2010 8:44:37 am 0.00078800 n
3 0.000078000 0.000078000 16/12/2010 8:44:36 am 0.00061500 n
3 0.000077000 0.000077000 16/12/2010 8:44:35 am 0.00069400 n
4 0.000113000 0.000113000 16/12/2010 8:44:33 am 0.00076700 n
3 0.000114000 0.000114000 16/12/2010 8:44:31 am 0.00060800 n
3 0.000069000 0.000068000 16/12/2010 8:44:30 am 0.00061600 n
4 0.000105000 0.000104000 16/12/2010 8:44:09 am 0.00075900 n
4 0.000101000 0.000101000 16/12/2010 8:44:08 am 0.00075800 n
4 0.000118000 0.000118000 16/12/2010 8:44:06 am 0.00075200 n
3 0.000119000 0.000119000 16/12/2010 8:44:04 am 0.00062000 n
4 0.000066000 0.000065000 16/12/2010 8:44:04 am 0.00086800 n
3 0.000119000 0.000119000 16/12/2010 8:44:01 am 0.00061100 n
3 0.000120000 0.000120000 16/12/2010 8:44:01 am 0.00063000 n
4 0.000123000 0.000123000 16/12/2010 8:43:58 am 0.00086400 n
4 0.000112000 0.000112000 16/12/2010 8:43:30 am 0.00080700 n
3 0.000113000 0.000113000 16/12/2010 8:43:29 am 0.00058800 n
4 0.000114000 0.000114000 16/12/2010 8:43:28 am 0.00076300 n
3 0.000128000 0.000128000 16/12/2010 8:43:27 am 0.00060900 n
4 0.000129000 0.000129000 16/12/2010 8:43:26 am 0.00088600 n
3 0.000105000 0.000105000 16/12/2010 8:43:24 am 0.00058300 n
4 0.000131000 0.000130000 16/12/2010 8:43:23 am 0.00076200 n
4 0.000082000 0.000082000 16/12/2010 8:43:21 am 0.00075600 n
3 0.000134000 0.000134000 16/12/2010 8:43:20 am 0.00061800 n
4 0.000122000 0.000122000 16/12/2010 8:43:04 am 0.00075100 n
4 0.000137000 0.000136000 16/12/2010 8:43:02 am 0.00079800 n
4 0.000139000 0.000138000 16/12/2010 8:43:02 am 0.00086800 n
4 0.000140000 0.000139000 16/12/2010 8:43:00 am 0.00073600 n
4 0.000142000 0.000141000 16/12/2010 8:42:59 am 0.00075300 n
4 0.000128000 0.000128000 16/12/2010 8:42:59 am 0.00076200 n
4 0.000099000 0.000098000 16/12/2010 8:42:57 am 0.00075400 n
4 0.000114000 0.000113000 16/12/2010 8:42:17 am 0.00084600 n
4 0.000147000 0.000146000 16/12/2010 8:39:23 am 0.00076900 n
4 0.000149000 0.000148000 16/12/2010 8:39:20 am 0.00079300 n
4 0.000136000 0.000135000 16/12/2010 8:39:18 am 0.00081300 n
4 0.000138000 0.000137000 16/12/2010 8:39:15 am 0.00075000 n
4 0.000158000 0.000157000 16/12/2010 8:39:13 am 0.00079700 n
4 0.000155000 0.000154000 16/12/2010 8:39:13 am 0.00074500 n
8 0.000291000 0.000160000 16/12/2010 8:39:12 am 0.00150400 n
19 0.000667000 0.000524000 16/12/2010 8:37:45 am 0.00386300 n
21 0.002455000 0.000771000 16/12/2010 8:37:43 am 0.00371700 n
84 0.075070000 0.000000000 16/12/2010 8:35:36 am 0.01497400 n
5 0.000146000 0.000144000 16/12/2010 8:34:20 am 0.00097200 n
5 0.000148000 0.000147000 16/12/2010 8:34:18 am 0.00094400 n
5 0.000151000 0.000149000 16/12/2010 8:34:17 am 0.00095600 n
5 0.000105000 0.000104000 16/12/2010 8:34:14 am 0.00096000 n
9 0.000123000 0.000121000 16/12/2010 8:34:03 am 0.00164900 n
9 0.000126000 0.000124000 16/12/2010 8:30:34 am 0.00171600 n
6 0.000129000 0.000127000 16/12/2010 8:30:13 am 0.00114000 n
5 0.000159000 0.000157000 16/12/2010 8:30:12 am 0.00092800 n
6 0.000169000 0.000167000 16/12/2010 8:30:11 am 0.00115800 n
6 0.000172000 0.000169000 16/12/2010 8:29:32 am 0.00231000 n
6 0.000150000 0.000148000 16/12/2010 8:29:29 am 0.00116100 n
6 0.000192000 0.000189000 16/12/2010 8:29:27 am 0.00110500 n
6 0.000105000 0.000104000 16/12/2010 8:29:17 am 0.00114200 n
8 0.000183000 0.000178000 16/12/2010 8:29:02 am 0.00164100 n
6 0.000195000 0.000191000 16/12/2010 8:28:14 am 0.00116900 n
5 0.000218000 0.000215000 16/12/2010 8:26:38 am 0.00098300 n
6 0.000200000 0.000197000 16/12/2010 8:26:13 am 0.00114300 n
10 0.000167000 0.000161000 16/12/2010 8:25:59 am 0.00187000 n
6 0.000229000 0.000216000 16/12/2010 8:22:49 am 0.00110500 n
2 0.004528000 0.004528000 16/12/2010 8:22:44 am 0.00034600 n
7 0.000176000 0.000172000 16/12/2010 8:22:41 am 0.00128300 n
8 0.000269000 0.000259000 16/12/2010 8:19:45 am 0.00146800 n
11 0.000334000 0.000311000 15/12/2010 5:19:48 pm 0.00199000 n
10 0.000340000 0.000319000 15/12/2010 4:10:16 pm 0.00186500 n
12 0.000869000 0.000527000 15/12/2010 2:47:10 pm 0.00221200 n
8 0.000471000 0.000441000 15/12/2010 2:46:57 pm 0.00165100 n
12 0.000552000 0.000488000 15/12/2010 1:14:04 pm 0.00221000 n
68 0.008667000 0.000560000 15/12/2010 1:12:59 pm 0.01240600 n
30 0.976014000 0.000002000 15/12/2010 1:12:19 pm 0.00550100 n
4 0.000014000 0.000014000 15/12/2010 1:11:02 pm 0.00086600 n
4 0.000371000 0.000366000 15/12/2010 11:13:43 am 0.00089800 n
4 0.000013000 0.000013000 15/12/2010 9:32:22 am 0.00078800 n
4 0.000022000 0.000022000 15/12/2010 9:32:10 am 0.00073700 n
4 0.000247000 0.000243000 15/12/2010 9:32:09 am 0.00079800 n
3 0.000052000 0.000052000 15/12/2010 9:32:08 am 0.00057900 n
4 0.000034000 0.000034000 15/12/2010 9:32:07 am 0.00079400 n
3 0.000154000 0.000153000 15/12/2010 9:32:06 am 0.00068800 n
4 0.000019000 0.000019000 15/12/2010 9:32:04 am 0.00081700 n
4 0.000017000 0.000017000 15/12/2010 9:32:03 am 0.00090600 n
4 0.000014000 0.000014000 15/12/2010 9:32:01 am 0.00079600 n
4 0.000045000 0.000045000 15/12/2010 9:31:59 am 0.00079000 n
4 0.000032000 0.000032000 15/12/2010 9:31:58 am 0.00074200 n
4 0.000019000 0.000019000 15/12/2010 9:31:57 am 0.00076000 n
4 0.000262000 0.000258000 15/12/2010 9:31:33 am 0.00082000 n
4 0.000094000 0.000093000 15/12/2010 9:30:55 am 0.00073200 n
3 0.000051000 0.000051000 15/12/2010 9:30:54 am 0.00061300 n
4 0.000014000 0.000014000 15/12/2010 9:30:53 am 0.00078600 n
4 0.000014000 0.000014000 15/12/2010 9:30:51 am 0.00077100 n
4 0.000282000 0.000278000 15/12/2010 9:30:49 am 0.00081600 n
4 0.000014000 0.000014000 15/12/2010 9:30:48 am 0.00074700 n
4 0.000014000 0.000014000 15/12/2010 9:30:47 am 0.00074000 n
4 0.000014000 0.000014000 15/12/2010 9:30:46 am 0.00080700 n
4 0.000020000 0.000020000 15/12/2010 9:30:45 am 0.00084900 n

I am probably going about this all wrong, but so far it seems to be working. Let me know if there is a better way.

Artificial Neural Networks in PHP using NeuralMesh with Bayesian Inferencing using b8

I’ve been interested in Fuzzy Logic and Artificial Neural Networks (ANN) for some time now. I guess it’s the whole ‘binary doesn’t describe our world full of shades of gray very well’ thing that caught my attention.

I’ve owned a book on the topic for a while now: C++ Neural Networks and Fuzzy Logic by Valluru B. Rao. Well, it’s been a real long time; the examples in the book are provided on a floppy disk.

I’ve not had much chance to apply the concepts, for a few reasons. Most websites don’t need AI capabilities. There haven’t been any ‘easy’ PHP libraries to use. I briefly looked into porting over the C++ code provided in the book, but I am too lazy and my math skills aren’t good enough to make sure I did it right. Also, I couldn’t figure out how to take the ‘stuff’ I dealt with every day and boil it down to numerical values that an ANN can process.

I recently stumbled across a new 100% PHP based Neural Network library called NeuralMesh. http://neuralmesh.com/ http://sourceforge.net/projects/neuralmesh/

My ‘Ah-ha’ moment was when I realized that I could use the Bayesian Statistical Inferencing stuff I have been using for SPAM filtering email as the numerical input into a Neural Net.

Now that I have a tool that can take a block of text and turn it into a number, I can play with Neural nets. After all, I’ve been creating a ‘SPAM score’ for email messages for about a year now, so I got the numericalization of text down pretty well.

I have been having trouble with my spam filter lately. I’ve relied on it too much. I’ve categorized over 13,000 email messages. When a spammer changes tactics and tries something new, it can take marking dozens of the new messages as spam before the Bayesian library can overcome all the inertia and poisoning and recognize them as spam.

I need to change tactics. The Bayesian library I am using, b8, is optimized for very short messages, like blog comment spam, not potentially large email messages. My new tack will be to feed b8 what it’s designed for: smaller chunks of text. I will give it the email subject line both in pieces and in whole. The whole-subject scoring will allow it to quickly categorize messages it has seen before, for fast squashing of duplicate messages. The subject-pieces scoring will allow it to quickly pick up on the old ‘random bit of text in an otherwise rehashed subject’ trick. A few bits and pieces of the email header, like the IP address, user agent, SpamCop score, etc., will all be picked out and fed into b8 for individual scoring as well.
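
A rough sketch of that chunk-and-score plan might look like the following. Here `score_chunk()` is a crude word-counting stub standing in for a per-category b8 `classify()` call, just so the sketch runs on its own; in the real system, each chunk would go through its own trained b8 instance.

```php
<?php
// Sketch of the "score the email in pieces" plan. score_chunk() is a
// stub standing in for a per-category b8 classifier -- it just counts a
// few spammy words so the sketch is self-contained and runnable.

function score_chunk($text) {
    $spammy = array('viagra', 'winner', 'free');
    $hits = 0;
    foreach ($spammy as $word) {
        if (stripos($text, $word) !== false) {
            $hits++;
        }
    }
    return $hits / count($spammy);   // crude 0..1 score
}

function spam_scores($subject, array $headers, $body) {
    $chunks = array(
        'to'           => isset($headers['To'])   ? $headers['To']   : '',
        'full from'    => isset($headers['From']) ? $headers['From'] : '',
        'full subject' => $subject,
        'body'         => $body,
    );
    $scores = array();
    foreach ($chunks as $name => $text) {
        $scores[$name] = score_chunk($text);
    }
    return $scores;   // one 0..1 score per chunk: the ANN's inputs
}
```

The point of the structure is that each chunk gets its own independent score, so the ANN downstream can learn which chunks matter.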

This is where NeuralMesh comes in. I can feed all these different spam scores into the Neural Network as separate inputs and let the ANN figure out what scores are meaningful and what combinations are not. There is always a human that decides if any given email is truly a good email or if it is spam, so I have a built in feedback loop. This should allow the ANN to quickly learn all it needs to know about spam.

Unfortunately, I can’t just jump right in and expect good results. The first thing I have to do is create the multiple scores and let the Bayesian inferencing get ‘learned up’ on what’s spam and what’s not. I will continue to rely on the current, overworked filter while the new set watches over its shoulder and learns.

If I don’t do this, I will be feeding the ANN uneducated guesses from the new naive Bayesian filters, which is basically gibberish for the first few ANN learning sessions. At my email usage rate, b8 will have things pretty much figured out after a week or two, and then I can start training the ANN with meaningful numbers.

One of the things I will be doing with the ANN is adaptive training. Each time a human makes or confirms a spam/not-spam decision, I feed that back into the ANN for further training. When you present training data to an ANN, you tell it to train using X iterations until it has figured things out. Because my spam-to-not-spam ratio is so out of proportion, the system is very heavily biased towards spam. So, the less spammy a message is, the more iterations I allow the spam filter to train on. Conversely, the more spammy it is, the fewer training iterations a message gets; to the point that if it’s 99.5% or more likely to be spam, I don’t train the ANN at all. It would just be telling it what it already knows.
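
A minimal sketch of that adaptive rule: the spammier the score, the fewer iterations, down to none at all past the 99.5% cutoff. The linear score-to-iterations mapping here is just one way to do it.

```php
<?php
// Adaptive-iteration rule: more training for surprising (non-spam-looking)
// messages, none at all for near-certain spam (score >= 99.5%).

function training_iterations($spam_score) {
    if ($spam_score >= 0.995) {
        return 0;                           // already certain: don't train
    }
    return (int) round((1 - $spam_score) * 100);
}
```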

Now, this process doesn’t apply just to spam. By training b8 on different categories for a given text string, it can do a lot of different categorizations for me.

In practice, it appears to be working as I have planned. Time will tell if this is a good approach or not.