Tuning the training rate of a NeuralMesh ANN

I’ve been running a NeuralMesh artificial neural network (ANN) for about a week now. Its job is spam detection. I am feeding it 7 different spam scores from the b8 Bayesian library.

My 7 ‘aspects’ of email spam are:

  • to
  • from
  • full from
  • subject
  • full subject
  • header
  • body

I am using the To field because spam is sometimes sent to funny names, and real email is sometimes sent to a particular person. The From score looks at both the email address and the name the message comes from; luckily, some spammers put their product names in the From field. The Full From score treats the From field as a whole, which should be good for catching duplicate messages: when I’ve seen a sender once before, I have a good idea of what it is. The Subject and Full Subject scores operate on the same principle. However, a random word is sometimes put into a subject, so the Full Subject isn’t necessarily a solid repeat indicator. The Header score is designed to look for SPF records and IP addresses, in addition to including the above bits. The Body, well, that’s just the whole message.

I feed all these different scores into a 7-input, 5-neuron, 1-output ANN to let it sort out what’s meaningful and what’s not. Every time somebody deletes a message as spam or accepts an email as not spam, I use that as a training trigger for both the b8 Bayesian library and the NeuralMesh ANN library. Because my spam to not-spam ratio is so out of whack, the training loop doesn’t need many iterations for spam, but needs a lot for not spam.
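To make the 7-5-1 topology concrete, here is a plain-PHP sketch of a forward pass through a network of that shape. It does not use NeuralMesh’s API: the function names, the sigmoid activation, the input values and the all-zero placeholder weights are assumptions made purely for illustration (PHP 7 or later).

```php
<?php
// Logistic activation. Assumed here; the real library may use another.
function sigmoid(float $x): float
{
    return 1.0 / (1.0 + exp(-$x));
}

// One fully connected layer: each row of $weights holds one neuron's input weights.
function layer(array $inputs, array $weights, array $biases): array
{
    $outputs = [];
    foreach ($weights as $n => $row) {
        $sum = $biases[$n];
        foreach ($row as $i => $w) {
            $sum += $w * $inputs[$i];
        }
        $outputs[] = sigmoid($sum);
    }
    return $outputs;
}

// Seven b8 scores in the order: to, from, full from, subject, full subject, header, body.
$scores = [0.9, 0.8, 0.95, 0.7, 0.6, 0.85, 0.99]; // made-up values

// Placeholder weights (all zero) standing in for whatever training has produced.
$hiddenWeights = array_fill(0, 5, array_fill(0, 7, 0.0)); // 5 neurons x 7 inputs
$hiddenBiases  = array_fill(0, 5, 0.0);
$outputWeights = [array_fill(0, 5, 0.0)];                 // 1 output x 5 hidden
$outputBiases  = [0.0];

$hidden = layer($scores, $hiddenWeights, $hiddenBiases);
$output = layer($hidden, $outputWeights, $outputBiases)[0];

echo count($hidden), " hidden activations, network output = ", $output, "\n";
```

With untrained zero weights the output sits at the midpoint of the sigmoid; training is what pushes it towards one end or the other.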

To address this, I am using an adaptive training iteration system. The closer the system’s score is to a pure spam score, the fewer training iterations I do; the further it is from a pure spam score, the more iterations I do.

$learning_iterations = round((1 - $finalscore['AI Score']) * 100);
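As a rough sketch of how that one-liner behaves, the helper below wraps the same arithmetic in a function. The function name `learningIterations` and the clamping of the score to the 0..1 range are my own assumptions for illustration; neither comes from b8 or NeuralMesh (PHP 7 or later).

```php
<?php
// Hypothetical helper: maps a score (0 = not spam, 1 = pure spam)
// to a number of training iterations. Name and clamping are assumptions.
function learningIterations(float $aiScore): int
{
    // Guard against scores that drift slightly outside 0..1.
    $aiScore = max(0.0, min(1.0, $aiScore));

    // Same arithmetic as the line above: the further from a pure
    // spam score, the more iterations.
    return (int) round((1 - $aiScore) * 100);
}

echo learningIterations(0.97), "\n"; // near-pure spam score: 3 iterations
echo learningIterations(0.16), "\n"; // far from spam: 84 iterations
```

A score of exactly 1 gives zero iterations, so a message the system is already certain about triggers no training at all.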

This seems to be working well, judging by the table below of the most recent training sessions the ANN has undergone. The fewer the iterations, the higher the spam score, which the system has seen a lot of. Comparing the Start MSE to the End MSE, any training under 4 iterations doesn’t really change anything. The events with a higher number of iterations are not spam, so the system starts with a very high Start MSE, and the extra training iterations bring the End MSE right down.
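For a network with a single output trained on one example, the mean squared error reduces to the squared difference between the target and the output. The sketch below assumes spam is trained towards 1 and not spam towards 0; the function name is hypothetical, and NeuralMesh may well compute the MSE it reports differently.

```php
<?php
// Hypothetical helper: mean squared error over the output neurons.
function meanSquaredError(array $targets, array $outputs): float
{
    $sum = 0.0;
    foreach ($targets as $i => $target) {
        $diff = $target - $outputs[$i];
        $sum += $diff * $diff;
    }
    return $sum / count($targets);
}

// A legitimate email (target 0) that the net scores as near-certain spam
// starts with an error close to 1.
echo meanSquaredError([0.0], [0.99]), "\n";

// Spam (target 1) that already scores 0.99 has almost nothing left to learn.
echo meanSquaredError([1.0], [0.99]), "\n";
```

This is one way to read why the rare not-spam sessions start with a large error while the routine spam sessions start with a tiny one.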

You can see that after every non-spam training session, it takes a couple of data sets before it settles back down to its usual boring 3s and 4s for training. The data corpus is still pretty small; only a few non-spam data points have been trained into the system so far.

I think I am going to skip all training under 3 iterations, as it doesn’t seem to make any real difference. It probably just loads the system up with extra data points, and will eventually slow it down as I accumulate more data. I lose a lot of the little finesse trainings, but I think having fewer, more meaningful trainings as the corpus drifts will be more resource friendly.
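A minimal sketch of that cut-off, reusing the score-to-iterations arithmetic from earlier in this post. The constant and function names are hypothetical, and returning 0 to mean "skip" is just one possible convention (PHP 7 or later).

```php
<?php
// Assumed cut-off: training runs shorter than this are skipped.
const MIN_TRAINING_ITERATIONS = 3;

// Hypothetical helper: returns the iterations to run, or 0 to skip training.
function plannedIterations(float $aiScore, int $minimum = MIN_TRAINING_ITERATIONS): int
{
    $iterations = (int) round((1 - $aiScore) * 100);

    // "Under 3 iterations" means strictly fewer than the minimum.
    return $iterations < $minimum ? 0 : $iterations;
}

foreach ([0.99, 0.97, 0.16] as $score) {
    $n = plannedIterations($score);
    echo $score, ': ', $n === 0 ? 'skip' : "train for $n iterations", "\n";
}
```

Keeping the threshold in one named constant makes it easy to raise later if 3- and 4-iteration sessions turn out to be just as uneventful.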

Iterations | Start MSE | End MSE | Date | Exec Time | Off-line
--- | --- | --- | --- | --- | ---
66 | 0.978118000 | 0.000000000 | 16/12/2010 9:33:15 am | 0.01188200 | n
3 | 0.000087000 | 0.000087000 | 16/12/2010 8:45:08 am | 0.00063600 | n
3 | 0.000108000 | 0.000108000 | 16/12/2010 8:44:43 am | 0.00059300 | n
3 | 0.000098000 | 0.000098000 | 16/12/2010 8:44:41 am | 0.00056200 | n
3 | 0.000083000 | 0.000083000 | 16/12/2010 8:44:40 am | 0.00061200 | n
3 | 0.000107000 | 0.000107000 | 16/12/2010 8:44:39 am | 0.00062100 | n
4 | 0.000100000 | 0.000100000 | 16/12/2010 8:44:37 am | 0.00078800 | n
3 | 0.000078000 | 0.000078000 | 16/12/2010 8:44:36 am | 0.00061500 | n
3 | 0.000077000 | 0.000077000 | 16/12/2010 8:44:35 am | 0.00069400 | n
4 | 0.000113000 | 0.000113000 | 16/12/2010 8:44:33 am | 0.00076700 | n
3 | 0.000114000 | 0.000114000 | 16/12/2010 8:44:31 am | 0.00060800 | n
3 | 0.000069000 | 0.000068000 | 16/12/2010 8:44:30 am | 0.00061600 | n
4 | 0.000105000 | 0.000104000 | 16/12/2010 8:44:09 am | 0.00075900 | n
4 | 0.000101000 | 0.000101000 | 16/12/2010 8:44:08 am | 0.00075800 | n
4 | 0.000118000 | 0.000118000 | 16/12/2010 8:44:06 am | 0.00075200 | n
3 | 0.000119000 | 0.000119000 | 16/12/2010 8:44:04 am | 0.00062000 | n
4 | 0.000066000 | 0.000065000 | 16/12/2010 8:44:04 am | 0.00086800 | n
3 | 0.000119000 | 0.000119000 | 16/12/2010 8:44:01 am | 0.00061100 | n
3 | 0.000120000 | 0.000120000 | 16/12/2010 8:44:01 am | 0.00063000 | n
4 | 0.000123000 | 0.000123000 | 16/12/2010 8:43:58 am | 0.00086400 | n
4 | 0.000112000 | 0.000112000 | 16/12/2010 8:43:30 am | 0.00080700 | n
3 | 0.000113000 | 0.000113000 | 16/12/2010 8:43:29 am | 0.00058800 | n
4 | 0.000114000 | 0.000114000 | 16/12/2010 8:43:28 am | 0.00076300 | n
3 | 0.000128000 | 0.000128000 | 16/12/2010 8:43:27 am | 0.00060900 | n
4 | 0.000129000 | 0.000129000 | 16/12/2010 8:43:26 am | 0.00088600 | n
3 | 0.000105000 | 0.000105000 | 16/12/2010 8:43:24 am | 0.00058300 | n
4 | 0.000131000 | 0.000130000 | 16/12/2010 8:43:23 am | 0.00076200 | n
4 | 0.000082000 | 0.000082000 | 16/12/2010 8:43:21 am | 0.00075600 | n
3 | 0.000134000 | 0.000134000 | 16/12/2010 8:43:20 am | 0.00061800 | n
4 | 0.000122000 | 0.000122000 | 16/12/2010 8:43:04 am | 0.00075100 | n
4 | 0.000137000 | 0.000136000 | 16/12/2010 8:43:02 am | 0.00079800 | n
4 | 0.000139000 | 0.000138000 | 16/12/2010 8:43:02 am | 0.00086800 | n
4 | 0.000140000 | 0.000139000 | 16/12/2010 8:43:00 am | 0.00073600 | n
4 | 0.000142000 | 0.000141000 | 16/12/2010 8:42:59 am | 0.00075300 | n
4 | 0.000128000 | 0.000128000 | 16/12/2010 8:42:59 am | 0.00076200 | n
4 | 0.000099000 | 0.000098000 | 16/12/2010 8:42:57 am | 0.00075400 | n
4 | 0.000114000 | 0.000113000 | 16/12/2010 8:42:17 am | 0.00084600 | n
4 | 0.000147000 | 0.000146000 | 16/12/2010 8:39:23 am | 0.00076900 | n
4 | 0.000149000 | 0.000148000 | 16/12/2010 8:39:20 am | 0.00079300 | n
4 | 0.000136000 | 0.000135000 | 16/12/2010 8:39:18 am | 0.00081300 | n
4 | 0.000138000 | 0.000137000 | 16/12/2010 8:39:15 am | 0.00075000 | n
4 | 0.000158000 | 0.000157000 | 16/12/2010 8:39:13 am | 0.00079700 | n
4 | 0.000155000 | 0.000154000 | 16/12/2010 8:39:13 am | 0.00074500 | n
8 | 0.000291000 | 0.000160000 | 16/12/2010 8:39:12 am | 0.00150400 | n
19 | 0.000667000 | 0.000524000 | 16/12/2010 8:37:45 am | 0.00386300 | n
21 | 0.002455000 | 0.000771000 | 16/12/2010 8:37:43 am | 0.00371700 | n
84 | 0.075070000 | 0.000000000 | 16/12/2010 8:35:36 am | 0.01497400 | n
5 | 0.000146000 | 0.000144000 | 16/12/2010 8:34:20 am | 0.00097200 | n
5 | 0.000148000 | 0.000147000 | 16/12/2010 8:34:18 am | 0.00094400 | n
5 | 0.000151000 | 0.000149000 | 16/12/2010 8:34:17 am | 0.00095600 | n
5 | 0.000105000 | 0.000104000 | 16/12/2010 8:34:14 am | 0.00096000 | n
9 | 0.000123000 | 0.000121000 | 16/12/2010 8:34:03 am | 0.00164900 | n
9 | 0.000126000 | 0.000124000 | 16/12/2010 8:30:34 am | 0.00171600 | n
6 | 0.000129000 | 0.000127000 | 16/12/2010 8:30:13 am | 0.00114000 | n
5 | 0.000159000 | 0.000157000 | 16/12/2010 8:30:12 am | 0.00092800 | n
6 | 0.000169000 | 0.000167000 | 16/12/2010 8:30:11 am | 0.00115800 | n
6 | 0.000172000 | 0.000169000 | 16/12/2010 8:29:32 am | 0.00231000 | n
6 | 0.000150000 | 0.000148000 | 16/12/2010 8:29:29 am | 0.00116100 | n
6 | 0.000192000 | 0.000189000 | 16/12/2010 8:29:27 am | 0.00110500 | n
6 | 0.000105000 | 0.000104000 | 16/12/2010 8:29:17 am | 0.00114200 | n
8 | 0.000183000 | 0.000178000 | 16/12/2010 8:29:02 am | 0.00164100 | n
6 | 0.000195000 | 0.000191000 | 16/12/2010 8:28:14 am | 0.00116900 | n
5 | 0.000218000 | 0.000215000 | 16/12/2010 8:26:38 am | 0.00098300 | n
6 | 0.000200000 | 0.000197000 | 16/12/2010 8:26:13 am | 0.00114300 | n
10 | 0.000167000 | 0.000161000 | 16/12/2010 8:25:59 am | 0.00187000 | n
6 | 0.000229000 | 0.000216000 | 16/12/2010 8:22:49 am | 0.00110500 | n
2 | 0.004528000 | 0.004528000 | 16/12/2010 8:22:44 am | 0.00034600 | n
7 | 0.000176000 | 0.000172000 | 16/12/2010 8:22:41 am | 0.00128300 | n
8 | 0.000269000 | 0.000259000 | 16/12/2010 8:19:45 am | 0.00146800 | n
11 | 0.000334000 | 0.000311000 | 15/12/2010 5:19:48 pm | 0.00199000 | n
10 | 0.000340000 | 0.000319000 | 15/12/2010 4:10:16 pm | 0.00186500 | n
12 | 0.000869000 | 0.000527000 | 15/12/2010 2:47:10 pm | 0.00221200 | n
8 | 0.000471000 | 0.000441000 | 15/12/2010 2:46:57 pm | 0.00165100 | n
12 | 0.000552000 | 0.000488000 | 15/12/2010 1:14:04 pm | 0.00221000 | n
68 | 0.008667000 | 0.000560000 | 15/12/2010 1:12:59 pm | 0.01240600 | n
30 | 0.976014000 | 0.000002000 | 15/12/2010 1:12:19 pm | 0.00550100 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 1:11:02 pm | 0.00086600 | n
4 | 0.000371000 | 0.000366000 | 15/12/2010 11:13:43 am | 0.00089800 | n
4 | 0.000013000 | 0.000013000 | 15/12/2010 9:32:22 am | 0.00078800 | n
4 | 0.000022000 | 0.000022000 | 15/12/2010 9:32:10 am | 0.00073700 | n
4 | 0.000247000 | 0.000243000 | 15/12/2010 9:32:09 am | 0.00079800 | n
3 | 0.000052000 | 0.000052000 | 15/12/2010 9:32:08 am | 0.00057900 | n
4 | 0.000034000 | 0.000034000 | 15/12/2010 9:32:07 am | 0.00079400 | n
3 | 0.000154000 | 0.000153000 | 15/12/2010 9:32:06 am | 0.00068800 | n
4 | 0.000019000 | 0.000019000 | 15/12/2010 9:32:04 am | 0.00081700 | n
4 | 0.000017000 | 0.000017000 | 15/12/2010 9:32:03 am | 0.00090600 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 9:32:01 am | 0.00079600 | n
4 | 0.000045000 | 0.000045000 | 15/12/2010 9:31:59 am | 0.00079000 | n
4 | 0.000032000 | 0.000032000 | 15/12/2010 9:31:58 am | 0.00074200 | n
4 | 0.000019000 | 0.000019000 | 15/12/2010 9:31:57 am | 0.00076000 | n
4 | 0.000262000 | 0.000258000 | 15/12/2010 9:31:33 am | 0.00082000 | n
4 | 0.000094000 | 0.000093000 | 15/12/2010 9:30:55 am | 0.00073200 | n
3 | 0.000051000 | 0.000051000 | 15/12/2010 9:30:54 am | 0.00061300 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 9:30:53 am | 0.00078600 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 9:30:51 am | 0.00077100 | n
4 | 0.000282000 | 0.000278000 | 15/12/2010 9:30:49 am | 0.00081600 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 9:30:48 am | 0.00074700 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 9:30:47 am | 0.00074000 | n
4 | 0.000014000 | 0.000014000 | 15/12/2010 9:30:46 am | 0.00080700 | n
4 | 0.000020000 | 0.000020000 | 15/12/2010 9:30:45 am | 0.00084900 | n

I am probably going about this all wrong, but so far it seems to be working. Let me know if there is a better way.
