2010 – In Review

2010 has been a good year for me. Busy, but good.

The most important event this year was the birth of my daughter. She is the joy of my life – with my wife coming in a very very close second (sshh, don’t tell her that!). That little girl just loves to smile!

I seem to have come into possession of a house as well. It required a bit of work.

With all of this happening in my life, I didn’t do nearly as much blogging as I probably ought to have. I did manage to do some blogging work, migrating from Blogger to WordPress, although it was forced on me. Most of my ‘blogging’ seems to have occurred on Twitter this last year, but I did manage to embed Twitter into my blog.

I did manage to get a few projects done, though not the rough draft of the book on aquaponics I want to write. The website is coming along, though.

I did manage to give some presentations this year. I even somehow found time to go camping once!

Finishing the Insulation

It’s nice having awesome friends. Between Jerry helping with the prep work and Jeb helping feed the blower, I got the job done!

I had gotten sick between the time I started this project and the time I could finish it, so it took me a bit longer than I had anticipated.

I finally rented a blower from Menards.

The Blower

Between it not being bolted down to the pallet they had it on and its missing parts, it was a frustrating experience getting it home and set up.

Duct tape, the universal adapter!

I got the privilege of hanging out in my attic for 5 hours!

the blower hose in my attic

The job wasn’t nearly as bad as I thought it was going to be. The prep work was harder than the actual blowing in of insulation. That isn’t to say that the job was a cakewalk.

A wee bit dusty
We got a bit dirty blowing in the insulation!

In the end, it was worth it! The gas bill for the month after we insulated was something like half the bill for the month before, even though it was much colder. I will have to run the numbers later, but I am expecting that the insulation will have paid for itself in energy savings the first winter!

All done!

I still have a bit of insulation to add. I came up a bit short, so I must have blown the insulation in a bit thick in places. I am going to need to get a few more bags and toss the insulation in by hand. I made sure that the space left to do was close to the access hatch so it’s easy for me to do.

Not quite enough insulation - I tried to cover the small area left with some, but it got a bit thin here.

I also have to go around and insulate the rim joists, then find any drafts and close those up as well.

Bella helping daddy fix stuff

Bella and I were fixing an electric heater tonight.

Disassembled electric space heater

It was being tossed, so it was tossed in my direction because I guess I am known for collecting junk. Anyway, I’ve been wanting one for Bella’s room on those really cold nights. We’ve been kicking up the thermostat a degree or two and it seems to do the trick, but she’s about as far away from the furnace as you can get in my house.

She was so good watching me take the heater apart. I would show her a tool and tell her what I was doing. She seemed as genuinely interested as any 7-month-old could be about anything. I think she just liked all the ‘toys’ and movement. The fan motor bushings just needed some oil, so it took only a few minutes to fix.

Before putting it back together, I washed the only part big enough that she couldn’t swallow it. She was quite happy with the toy! She determined that the fan failed initially because it wasn’t properly taste-tested. She can now claim to have helped fix the space heater.

Bella doing a little Quality Control. The electric heater will now work correctly as it has been properly taste tested.

Last night the two of us tried to install a remote control that was given to me for the fan in my room. Mamma was cooking, so I was watching Bella. I was a bit concerned with how I had this rigged up for testing, but by this point in the testing she was pretty interested in her bottle and the mirror I had set her beside. She tolerated this pretty well. I never did get the fan control working.

Trying to hook up a remote unit to a ceiling fan.

Last week, the two of us were working on the treadmill I picked up from Freecycle. She wasn’t so much loving this. She only lasted about 20 minutes. To her credit, it was kinda chilly in the basement and her bottle was empty. The speed control was burnt out, and a new controller board is $60. Eventually I will save up to get it. Hopefully the motor isn’t shorted, or something else isn’t wrong that would cause the controller board to fail like this. $60 for a treadmill in good shape is a great deal.

Blown transistor on bottom left and blown capacitor in center, 1/3rd from left.

I am hoping that I can get Bella interested in fixing stuff. The hobby is so important to me, and it would be so much fun to have somebody to wrench on the cars with. The boys will think it’s pretty cool too when she gets old enough for that to matter to her.

Tuning the training rate of a NeuralMesh ANN

I’ve been running a NeuralMesh Artificial Neural Network for about a week now. Its job is spam detection. I am feeding it 7 different spam scores from the b8 Bayesian library.

My 7 ‘aspects’ of email spam are:

  • to
  • from
  • full from
  • subject
  • full subject
  • header
  • body

I am using the To field because sometimes spam is sent to funny names, and sometimes real email is sent to a particular person. The From score looks at both the email address and the name a message comes from; luckily, some spammers put their product names in the From field. The Full From score processes the From field as a whole, which should be good for duplicate messages: when I’ve seen one before, I have a good idea of what it is. The Subject and Full Subject scores operate on the same principle, but sometimes a random word is put into a subject, so the Full Subject isn’t necessarily a solid repeat indicator. The Header score is designed to look for SPF records and IP addresses in addition to including the above bits. The Body, well, that’s just the whole message.
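Here is a rough sketch of how one message could be split into those seven chunks of text before scoring. It is written in Python with the standard `email` module rather than the PHP the real system uses, and the names `pieces`, `whole`, and `extract_aspects` are made up for illustration. How each aspect is actually tokenized is an assumption on my part.

```python
import re
from email import message_from_string


def pieces(text):
    """Split text into lowercase word-like tokens (assumed tokenization)."""
    return re.findall(r"[\w@.-]+", text.lower())


def whole(text):
    """Treat the entire field as one token, so exact repeats match."""
    return [text.strip().lower()]


def extract_aspects(raw_email):
    """Return the seven text aspects of one raw email as token lists."""
    msg = message_from_string(raw_email)
    to_field = msg.get("To", "")
    from_field = msg.get("From", "")
    subject = msg.get("Subject", "")
    # All header lines joined together; Received and SPF lines end up here.
    header_text = "\n".join(f"{name}: {value}" for name, value in msg.items())
    return {
        "to": pieces(to_field),
        "from": pieces(from_field),
        "full from": whole(from_field),
        "subject": pieces(subject),
        "full subject": whole(subject),
        "header": pieces(header_text),
        "body": pieces(raw_email),  # the post scores the whole message here
    }
```

Each of the seven token lists would then go to its own Bayesian scorer, and the seven resulting scores become the inputs to the network.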

I feed all these different scores into a 7 input, 5 neuron, 1 output ANN to let it sort out what’s meaningful and what’s not. Every time somebody deletes a message as spam or accepts an email as not spam, I use that as a training trigger for both the b8 Bayesian library and the NeuralMesh ANN library. Because my spam to not spam ratio is so out of whack, the training loop doesn’t need many iterations for spam, but needs a lot for not spam.

To address this, I am using an adaptive training iteration system. The closer the system’s score is to pure spam, the fewer training iterations I do; the further from pure spam, the more iterations I do.

$learning_iterations = round((1 - $finalscore['AI Score']) * 100);

This seems to be working well, judging by the table below of the most recent training sessions the ANN has undergone. The fewer the iterations, the higher the spam score, and the system has seen a lot of spam. Comparing the Start MSE to the End MSE, any training under 4 iterations doesn’t really change anything. The events with higher iteration counts are not spam, so the system starts with a very high Start MSE, and the extra training iterations bring the End MSE right down.

You can see that after every non-spam training session, it takes a couple of data sets before the system settles back down to its usual boring 3s and 4s for training. The data corpus is still pretty small, with only a few non-spam data points trained into the system so far.

I think I am going to skip all training under 3 iterations, as it doesn’t seem to make any real difference. It probably just loads the system up with extra data points and will eventually slow it down as I accumulate more data. I lose a lot of the little finesse trainings, but I think having fewer, more meaningful trainings as the corpus drifts will be more resource friendly.
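To make the arithmetic concrete, here is a small sketch of the iteration formula together with the skip rule. It is in Python rather than the PHP the real system uses, and the function names and the `min_iterations` parameter are hypothetical. Python’s built-in `round` rounds halves to even, so the sketch rounds half up instead, which matches PHP’s `round` for the non-negative values that occur here.

```python
import math


def learning_iterations(ai_score, scale=100):
    """Iterations for one training event: more for less spammy scores.

    ai_score is assumed to lie between 0.0 (not spam) and 1.0 (pure spam).
    """
    # Round half up, as PHP's round() does for non-negative values.
    return int(math.floor((1 - ai_score) * scale + 0.5))


def plan_training(ai_score, min_iterations=3):
    """Return the iteration count, or None when the event should be skipped."""
    iterations = learning_iterations(ai_score)
    if iterations < min_iterations:
        return None  # too few iterations to change the network noticeably
    return iterations


if __name__ == "__main__":
    for score in (0.0, 0.34, 0.97, 0.99):
        print(score, learning_iterations(score), plan_training(score))
```

With these definitions a score of 0.34 gets 66 iterations and a score of 0.97 gets 3, while a score of 0.99 works out to 1 iteration and is skipped.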

Iterations  Start MSE    End MSE      Date                    Exec Time   Off-line
66          0.978118000  0.000000000  16/12/2010 9:33:15 am   0.01188200  n
3           0.000087000  0.000087000  16/12/2010 8:45:08 am   0.00063600  n
3           0.000108000  0.000108000  16/12/2010 8:44:43 am   0.00059300  n
3           0.000098000  0.000098000  16/12/2010 8:44:41 am   0.00056200  n
3           0.000083000  0.000083000  16/12/2010 8:44:40 am   0.00061200  n
3           0.000107000  0.000107000  16/12/2010 8:44:39 am   0.00062100  n
4           0.000100000  0.000100000  16/12/2010 8:44:37 am   0.00078800  n
3           0.000078000  0.000078000  16/12/2010 8:44:36 am   0.00061500  n
3           0.000077000  0.000077000  16/12/2010 8:44:35 am   0.00069400  n
4           0.000113000  0.000113000  16/12/2010 8:44:33 am   0.00076700  n
3           0.000114000  0.000114000  16/12/2010 8:44:31 am   0.00060800  n
3           0.000069000  0.000068000  16/12/2010 8:44:30 am   0.00061600  n
4           0.000105000  0.000104000  16/12/2010 8:44:09 am   0.00075900  n
4           0.000101000  0.000101000  16/12/2010 8:44:08 am   0.00075800  n
4           0.000118000  0.000118000  16/12/2010 8:44:06 am   0.00075200  n
3           0.000119000  0.000119000  16/12/2010 8:44:04 am   0.00062000  n
4           0.000066000  0.000065000  16/12/2010 8:44:04 am   0.00086800  n
3           0.000119000  0.000119000  16/12/2010 8:44:01 am   0.00061100  n
3           0.000120000  0.000120000  16/12/2010 8:44:01 am   0.00063000  n
4           0.000123000  0.000123000  16/12/2010 8:43:58 am   0.00086400  n
4           0.000112000  0.000112000  16/12/2010 8:43:30 am   0.00080700  n
3           0.000113000  0.000113000  16/12/2010 8:43:29 am   0.00058800  n
4           0.000114000  0.000114000  16/12/2010 8:43:28 am   0.00076300  n
3           0.000128000  0.000128000  16/12/2010 8:43:27 am   0.00060900  n
4           0.000129000  0.000129000  16/12/2010 8:43:26 am   0.00088600  n
3           0.000105000  0.000105000  16/12/2010 8:43:24 am   0.00058300  n
4           0.000131000  0.000130000  16/12/2010 8:43:23 am   0.00076200  n
4           0.000082000  0.000082000  16/12/2010 8:43:21 am   0.00075600  n
3           0.000134000  0.000134000  16/12/2010 8:43:20 am   0.00061800  n
4           0.000122000  0.000122000  16/12/2010 8:43:04 am   0.00075100  n
4           0.000137000  0.000136000  16/12/2010 8:43:02 am   0.00079800  n
4           0.000139000  0.000138000  16/12/2010 8:43:02 am   0.00086800  n
4           0.000140000  0.000139000  16/12/2010 8:43:00 am   0.00073600  n
4           0.000142000  0.000141000  16/12/2010 8:42:59 am   0.00075300  n
4           0.000128000  0.000128000  16/12/2010 8:42:59 am   0.00076200  n
4           0.000099000  0.000098000  16/12/2010 8:42:57 am   0.00075400  n
4           0.000114000  0.000113000  16/12/2010 8:42:17 am   0.00084600  n
4           0.000147000  0.000146000  16/12/2010 8:39:23 am   0.00076900  n
4           0.000149000  0.000148000  16/12/2010 8:39:20 am   0.00079300  n
4           0.000136000  0.000135000  16/12/2010 8:39:18 am   0.00081300  n
4           0.000138000  0.000137000  16/12/2010 8:39:15 am   0.00075000  n
4           0.000158000  0.000157000  16/12/2010 8:39:13 am   0.00079700  n
4           0.000155000  0.000154000  16/12/2010 8:39:13 am   0.00074500  n
8           0.000291000  0.000160000  16/12/2010 8:39:12 am   0.00150400  n
19          0.000667000  0.000524000  16/12/2010 8:37:45 am   0.00386300  n
21          0.002455000  0.000771000  16/12/2010 8:37:43 am   0.00371700  n
84          0.075070000  0.000000000  16/12/2010 8:35:36 am   0.01497400  n
5           0.000146000  0.000144000  16/12/2010 8:34:20 am   0.00097200  n
5           0.000148000  0.000147000  16/12/2010 8:34:18 am   0.00094400  n
5           0.000151000  0.000149000  16/12/2010 8:34:17 am   0.00095600  n
5           0.000105000  0.000104000  16/12/2010 8:34:14 am   0.00096000  n
9           0.000123000  0.000121000  16/12/2010 8:34:03 am   0.00164900  n
9           0.000126000  0.000124000  16/12/2010 8:30:34 am   0.00171600  n
6           0.000129000  0.000127000  16/12/2010 8:30:13 am   0.00114000  n
5           0.000159000  0.000157000  16/12/2010 8:30:12 am   0.00092800  n
6           0.000169000  0.000167000  16/12/2010 8:30:11 am   0.00115800  n
6           0.000172000  0.000169000  16/12/2010 8:29:32 am   0.00231000  n
6           0.000150000  0.000148000  16/12/2010 8:29:29 am   0.00116100  n
6           0.000192000  0.000189000  16/12/2010 8:29:27 am   0.00110500  n
6           0.000105000  0.000104000  16/12/2010 8:29:17 am   0.00114200  n
8           0.000183000  0.000178000  16/12/2010 8:29:02 am   0.00164100  n
6           0.000195000  0.000191000  16/12/2010 8:28:14 am   0.00116900  n
5           0.000218000  0.000215000  16/12/2010 8:26:38 am   0.00098300  n
6           0.000200000  0.000197000  16/12/2010 8:26:13 am   0.00114300  n
10          0.000167000  0.000161000  16/12/2010 8:25:59 am   0.00187000  n
6           0.000229000  0.000216000  16/12/2010 8:22:49 am   0.00110500  n
2           0.004528000  0.004528000  16/12/2010 8:22:44 am   0.00034600  n
7           0.000176000  0.000172000  16/12/2010 8:22:41 am   0.00128300  n
8           0.000269000  0.000259000  16/12/2010 8:19:45 am   0.00146800  n
11          0.000334000  0.000311000  15/12/2010 5:19:48 pm   0.00199000  n
10          0.000340000  0.000319000  15/12/2010 4:10:16 pm   0.00186500  n
12          0.000869000  0.000527000  15/12/2010 2:47:10 pm   0.00221200  n
8           0.000471000  0.000441000  15/12/2010 2:46:57 pm   0.00165100  n
12          0.000552000  0.000488000  15/12/2010 1:14:04 pm   0.00221000  n
68          0.008667000  0.000560000  15/12/2010 1:12:59 pm   0.01240600  n
30          0.976014000  0.000002000  15/12/2010 1:12:19 pm   0.00550100  n
4           0.000014000  0.000014000  15/12/2010 1:11:02 pm   0.00086600  n
4           0.000371000  0.000366000  15/12/2010 11:13:43 am  0.00089800  n
4           0.000013000  0.000013000  15/12/2010 9:32:22 am   0.00078800  n
4           0.000022000  0.000022000  15/12/2010 9:32:10 am   0.00073700  n
4           0.000247000  0.000243000  15/12/2010 9:32:09 am   0.00079800  n
3           0.000052000  0.000052000  15/12/2010 9:32:08 am   0.00057900  n
4           0.000034000  0.000034000  15/12/2010 9:32:07 am   0.00079400  n
3           0.000154000  0.000153000  15/12/2010 9:32:06 am   0.00068800  n
4           0.000019000  0.000019000  15/12/2010 9:32:04 am   0.00081700  n
4           0.000017000  0.000017000  15/12/2010 9:32:03 am   0.00090600  n
4           0.000014000  0.000014000  15/12/2010 9:32:01 am   0.00079600  n
4           0.000045000  0.000045000  15/12/2010 9:31:59 am   0.00079000  n
4           0.000032000  0.000032000  15/12/2010 9:31:58 am   0.00074200  n
4           0.000019000  0.000019000  15/12/2010 9:31:57 am   0.00076000  n
4           0.000262000  0.000258000  15/12/2010 9:31:33 am   0.00082000  n
4           0.000094000  0.000093000  15/12/2010 9:30:55 am   0.00073200  n
3           0.000051000  0.000051000  15/12/2010 9:30:54 am   0.00061300  n
4           0.000014000  0.000014000  15/12/2010 9:30:53 am   0.00078600  n
4           0.000014000  0.000014000  15/12/2010 9:30:51 am   0.00077100  n
4           0.000282000  0.000278000  15/12/2010 9:30:49 am   0.00081600  n
4           0.000014000  0.000014000  15/12/2010 9:30:48 am   0.00074700  n
4           0.000014000  0.000014000  15/12/2010 9:30:47 am   0.00074000  n
4           0.000014000  0.000014000  15/12/2010 9:30:46 am   0.00080700  n
4           0.000020000  0.000020000  15/12/2010 9:30:45 am   0.00084900  n

I am probably going about this all wrong, but so far it seems to be working. Let me know if there is a better way.

Artificial Neural Networks in PHP using NeuralMesh with Bayesian Inferencing using b8

I’ve been interested in Fuzzy Logic and Artificial Neural Networks (ANN) for some time now. I guess it’s the whole ‘binary doesn’t describe our world full of shades of gray very well’ thing that caught my attention.

I’ve owned a book on the topic for a while now: C++ Neural Networks and Fuzzy Logic by Valluru B. Rao. Well, it’s been a really long time, as the examples in the book are provided on a floppy disk.

I’ve not had much chance to apply the concepts for a few reasons. Most websites don’t need AI capabilities. There haven’t been any ‘easy’ PHP libraries to use. I briefly looked into porting the C++ code provided in the book over, but I am too lazy and my math skills aren’t good enough to make sure I did it right. Also, I couldn’t figure out how to take the ‘stuff’ I dealt with every day and boil it down to numerical values that an ANN can process.

I recently stumbled across a new 100% PHP based Neural Network library called NeuralMesh. http://neuralmesh.com/ http://sourceforge.net/projects/neuralmesh/

My ‘Ah-ha’ moment was when I realized that I could use the Bayesian Statistical Inferencing stuff I have been using for SPAM filtering email as the numerical input into a Neural Net.

Now that I have a tool that can take a block of text and turn it into a number, I can play with Neural nets. After all, I’ve been creating a ‘SPAM score’ for email messages for about a year now, so I got the numericalization of text down pretty well.

I have been having trouble with my spam filter lately. I’ve relied on it too much. I’ve categorized over 13,000 email messages. When a spammer changes tactics and tries something new, it can take dozens of rounds of marking the new messages as spam before the Bayesian library overcomes all the inertia and poisoning and recognizes them as spam.

I need to change tactics. The Bayesian library I am using, b8, is optimized for very short messages, like blog post comment spam, and not potentially large email messages. My new tack will be to feed b8 what it’s designed for: smaller chunks of text. I will give it the email subject line both in pieces and as a whole. The whole subject scoring will allow it to quickly categorize messages it has seen before, for fast squashing of duplicate messages. The subject pieces scoring will allow it to quickly pick up on the old ‘random bit of text’ in an otherwise rehashed subject trick. A few bits and pieces of the email header like the IP address, user agent, SpamCop score, etc. will all be picked out and fed into b8 for individual scoring as well.
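To illustrate why scoring the subject both ways helps, here is a toy Bayesian token scorer in Python. This is not b8’s algorithm or API. It is a simplified stand-in with add-one smoothing and a Graham-style combination, and the class name `TokenScorer` is made up for this sketch.

```python
from collections import Counter


class TokenScorer:
    """Toy Bayesian scorer: counts how often each token appears in spam and ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn(self, tokens, is_spam):
        """Record one human-confirmed message."""
        (self.spam if is_spam else self.ham).update(tokens)

    def token_probability(self, token):
        """Smoothed chance that a token indicates spam; unseen tokens give 0.5."""
        s, h = self.spam[token], self.ham[token]
        return (s + 1) / (s + h + 2)

    def score(self, tokens):
        """Combine the token probabilities into one spam score from 0 to 1."""
        spam_product, ham_product = 1.0, 1.0
        for token in tokens:
            p = self.token_probability(token)
            spam_product *= p
            ham_product *= 1 - p
        return spam_product / (spam_product + ham_product)


if __name__ == "__main__":
    by_pieces, by_whole = TokenScorer(), TokenScorer()
    by_pieces.learn(["buy", "now", "xk42"], is_spam=True)
    by_whole.learn(["buy now xk42"], is_spam=True)
    # Same subject with a different random word tacked on.
    print(by_pieces.score(["buy", "now", "zz99"]))
    print(by_whole.score(["buy now zz99"]))
```

After one spam subject has been learned, the whole-subject scorer has never seen the variant with a different random word and stays neutral at 0.5, while the pieces scorer still leans towards spam because of the words the two subjects share. An exact repeat of the subject pushes the whole-subject scorer towards spam right away.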

This is where NeuralMesh comes in. I can feed all these different spam scores into the Neural Network as separate inputs and let the ANN figure out what scores are meaningful and what combinations are not. There is always a human that decides if any given email is truly a good email or if it is spam, so I have a built in feedback loop. This should allow the ANN to quickly learn all it needs to know about spam.

Unfortunately, I can’t just jump right in and expect good results. The first thing I have to do is create the multiple scores and let the Bayesian Inferencing get ‘learned up’ on what’s spam and what’s not. I will continue to rely on the current, overworked filter while the new set is watching over its shoulder and learning.

If I don’t do this, I will be feeding the ANN uneducated guesses from the new naive Bayesian filters, which would be basically gibberish for the first few ANN learning sessions. At my email usage rate, after a week or two b8 will have things pretty much figured out and I can start training the ANN with meaningful numbers.

One of the things I will be doing with the ANN is adaptive training. Each time a human makes or confirms a spam/not spam decision, I feed that back into the ANN for further training. When you present training data to an ANN, you tell it to train for X number of iterations until it has figured things out. Because my spam to not spam ratio is so out of proportion, the training data is very heavily biased towards spam. So, the less spammy a message is, the more iterations I allow the spam filter to train on it. Conversely, the more spammy it is, the fewer training iterations a message gets, to the point that if it’s 99.5% or more likely to be spam, I don’t train the ANN at all. That would just be telling it what it already knows.
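Here is a rough sketch of what one adaptive training event could look like, again in Python rather than PHP. It does not use NeuralMesh. The `TinyNet` class is a hand-rolled stand-in with the same 7 input, 5 neuron, 1 output shape, and the learning rate, the weight initialization, and the names `adaptive_train` and `ai_score` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Minimal 7-5-1 feed-forward network trained by plain backpropagation."""

    def __init__(self, n_in=7, n_hidden=5, seed=1):
        rng = random.Random(seed)
        # The last weight in each row is the bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, inputs):
        x = list(inputs) + [1.0]  # constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in self.w_hidden]
        h.append(1.0)  # constant bias for the output neuron
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, h)))
        return x, h, out

    def train_once(self, inputs, target, rate=0.5):
        """One gradient step on the squared error for a single example."""
        x, h, out = self.forward(inputs)
        delta_out = (out - target) * out * (1 - out)
        # Update hidden weights first so they use the old output weights.
        for j, ws in enumerate(self.w_hidden):
            delta_h = delta_out * self.w_out[j] * h[j] * (1 - h[j])
            for i in range(len(ws)):
                ws[i] -= rate * delta_h * x[i]
        for j in range(len(self.w_out)):
            self.w_out[j] -= rate * delta_out * h[j]


def adaptive_train(net, inputs, is_spam, ai_score, cutoff=0.995):
    """Train more on less spammy messages; skip near-certain spam entirely."""
    if ai_score >= cutoff:
        return None  # nothing new to learn from this message
    iterations = int(math.floor((1 - ai_score) * 100 + 0.5))
    target = 1.0 if is_spam else 0.0
    start_mse = (net.forward(inputs)[2] - target) ** 2
    for _ in range(iterations):
        net.train_once(inputs, target)
    end_mse = (net.forward(inputs)[2] - target) ** 2
    return iterations, start_mse, end_mse
```

The returned triple gives the iteration count plus the error before and after training, so a call such as `adaptive_train(TinyNet(), [0.1] * 7, is_spam=False, ai_score=0.2)` makes it easy to see how much a given training event actually moved the network.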

Now, this process doesn’t apply to just spam. By training b8 on different categories for a given text string, I can have it do a lot of different categorizations for me.

In practice, it appears to be working as I have planned. Time will tell if this is a good approach or not.