First, let's create a table and populate it with some basic data to simulate the NAND function. In most write-ups the data is identified as binary (0,1 or -1,1); since I am working in SQL I will use data labels (true, false), which are easier to interpret and permit more input attribute values. Note, however, that converting the class value to (0,1) will simplify the program.
create table #rawdata (id int identity(1,1), x1 varchar(5), x2 varchar(5), z varchar(5))
insert into #rawdata select 'false', 'false', 'true'
insert into #rawdata select 'false', 'true', 'true'
insert into #rawdata select 'true', 'false', 'true'
insert into #rawdata select 'true', 'true', 'false'
To leverage SQL functionality, convert the data table to a data vector, adding a bias field which works like the constant in a regression.
select id, 'bias' attribute,'bias' value into #data from #rawdata
union all
select id , 'x1' , x1 from #rawdata
union all
select id , 'x2' , x2 from #rawdata
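To see what the resulting data vector looks like, a quick check (the output below is what the statements above should produce for the first observation):
select * from #data where id = 1 order by attribute
id | attribute | value |
1 | bias | bias |
1 | x1 | false |
1 | x2 | false |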
Similarly, create a class data vector; as noted earlier, convert the value of the class attribute to (0,1).
select id, 'z' attribute, case when z='true' then 1.0 else 0 end value into #class from #rawdata
Recall that the algorithm compares the sum of the weights for each observation to a threshold value to determine membership. The next step is to initialize the table of weights, which can all be set to 0.
select attribute, value,cast(0 as float) wght into #weight from #data group by attribute, value
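To make the membership test concrete, the score for an observation is simply the sum of the weights attached to its attribute values; a sketch, using the same join that appears later in the learning program:
select d.id, sum(w.wght) score
from #data d
join #weight w on d.attribute = w.attribute and d.value = w.value
where d.id = 1
group by d.id
If the score exceeds the threshold the observation is classified 1 (true), otherwise 0 (false). With the freshly initialized weights every score is 0, so every observation starts out classified as 0.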
One of the reasons I like the Naive Bayes algorithm is that it doesn't require iteration to get to the weights; unfortunately the Perceptron (like most data mining algorithms) learns through iteration. The process is: pick an observation, calculate membership and, if there is a misclassification (error), adjust the weights in the direction of correct classification. Repeat until convergence (or patience is exhausted). If the class has a perfectly separable solution, the algorithm will converge and the weights will stop changing. The real world is seldom so cooperative (this example is; the Titanic data is not) and the results will vacillate. A solution to this problem is the pocket algorithm: calculate the overall fit after each iteration and, if the fit is the best so far, put the weights in your pocket. When done, use these weights, which are kept in the #bestweight table:
select * into #bestweight from #weight
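To illustrate the update rule before running the program, consider the very first pass with all weights at 0. Observation 1 (bias, x1 = false, x2 = false) sums to 0, which does not exceed the 0.5 threshold, so it is classified 0; the actual class is 1, so the error is 1 - 0 = 1 and each of the three matching weights (bias/bias, x1/false, x2/false) is increased by error * alpha = 1 * 0.1 = 0.1.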
Begin the learning program by declaring the necessary variables:
- threshold is the level necessary to classify, generally 0.5.
- alpha is the amount the weights will be adjusted should there be an error; experience shows 0.1 is a good value.
- count keeps track of how many passes through the whole data set are made (not many are required) and id is the individual record being evaluated.
- fit is the sum of the weights and error is the difference versus the actual class.
- best and newbest track the overall fit to determine if the new weights should replace those in your pocket.
declare @threshold float, @alpha float, @count int, @id int, @fit float, @err float ,@best float,@newbest float
select @threshold = .5, @alpha = .1, @count = 5, @best = 0
while @count>=0
begin
    set @id = 1
    while @id <= (select max(id) from #class)
    begin
        -- score the current observation: sum the weights attached to its attribute values
        select @fit = sum(wght) from #data d join #weight w on d.attribute = w.attribute and d.value = w.value where d.id = @id
        -- error is the actual class less the predicted class
        select @err = c.value - case when @fit > @threshold then 1.0 else 0 end from #class c where c.id = @id
        -- nudge the weights of the matching attribute values in the direction of the error
        update w set w.wght = w.wght + @err*@alpha from #weight w join #data d on w.attribute = d.attribute and w.value = d.value and d.id = @id
        -- pocket algorithm: score the whole data set with the current weights
        select @newbest = sum(case when fit = value then 1.0 else 0 end)/count(1) from
        (select c.id, case when sumw > @threshold then 1 else 0 end fit, c.value from #class c join
        (select d.id, sum(wght) sumw
        from #data d
        join #weight w on d.attribute = w.attribute and d.value = w.value
        group by d.id) n
        on c.id = n.id) x
        -- if this is the best fit so far, put these weights in your pocket
        if @newbest > @best
        begin
            set @best = @newbest
            drop table #bestweight
            select * into #bestweight from #weight
        end
        set @id = @id + 1
    end
    set @count = @count - 1
end
Finally, evaluate the results with the following query:
select * from
(select c.id, case when sumw > @threshold then 1 else 0 end fit, c.value from #class c join
(select d.id, sum(wght) sumw
from #data d
join #weight w on d.attribute = w.attribute and d.value = w.value
group by d.id) n
on c.id = n.id) x
The fit with the initial (zero value) weights:
id | fit | value |
1 | 0 | 1 |
2 | 0 | 1 |
3 | 0 | 1 |
4 | 0 | 0 |
After running the program it is 100%:
id | fit | value |
1 | 1 | 1 |
2 | 1 | 1 |
3 | 1 | 1 |
4 | 0 | 0 |
With the following weights:
select * from #weight
attribute | value | weight |
bias | bias | 0.3 |
x1 | false | 0.2 |
x1 | true | 0.1 |
x2 | false | 0.2 |
x2 | true | 0.1 |
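A quick check of these weights against the truth table: observation 4 (true, true) sums to 0.3 + 0.1 + 0.1 = 0.5, which does not exceed the 0.5 threshold, so it is classified false; the other three observations sum to 0.6 or 0.7 and are classified true, matching NAND exactly.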
This last step does not use the #bestweight table, but per the logic of the program the weights would be the same.
How does the Perceptron fare on the Titanic data? First, bring the Titanic train data into the data vector and then the survival field into the class vector:
select passengerid id, 'bias' attribute,'bias' value into #data from train
union all
select passengerid , 'sex' , sex from train
union all
select passengerid, 'pclass' , cast(pclass as varchar) from train
union all
select passengerid, 'embarked' , cast(embarked as varchar) from train
union all
select passengerid, 'cabin' , isnull(left(cabin,1),'na') from train
select passengerid id, 'survived' attribute, cast(survived as float) value into #class from train
Initialize the #weight and #bestweight tables and run the learning program. No further modifications to the program should be required.
The basic fit can be evaluated with the following query, which shows 71.7171% correct classification; using the #bestweight table improves the correct classification rate to 81.8181%.
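A sketch of that re-initialization, the same statements used for the NAND example:
-- assumes the #data, #class, #weight and #bestweight tables from the NAND example have been dropped first
select attribute, value, cast(0 as float) wght into #weight from #data group by attribute, value
select * into #bestweight from #weight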
select sum(case when fit = value then 1.0 else 0 end)/count(1) from
(select c.id,case when sumw>.5 then 1 else 0 end fit, c.value from #class c join
(select d.id, sum(wght) sumw
from #data d
join #weight w on d.attribute = w.attribute and d.value = w.value
group by d.id) n
on c.id = n.id) x
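The 81.8181% figure comes from the same query pointed at the pocketed weights; a sketch, simply swapping #bestweight in for #weight:
select sum(case when fit = value then 1.0 else 0 end)/count(1) from
(select c.id, case when sumw>.5 then 1 else 0 end fit, c.value from #class c join
(select d.id, sum(wght) sumw
from #data d
join #bestweight w on d.attribute = w.attribute and d.value = w.value
group by d.id) n
on c.id = n.id) x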
The weights in the #bestweight table:
attribute | value | weight |
bias | bias | 0.2 |
cabin | A | 0 |
cabin | B | 0.2 |
cabin | C | 0 |
cabin | D | 0.2 |
cabin | E | 0.3 |
cabin | F | 0.1 |
cabin | G | -0.4 |
cabin | n | 0 |
cabin | T | -0.2 |
embarked | NULL | 0 |
embarked | C | 0.1 |
embarked | Q | 0.1 |
embarked | S | 0 |
pclass | 1 | 0.1 |
pclass | 2 | 0.2 |
pclass | 3 | -0.1 |
sex | female | 0.3 |
sex | male | -0.1 |
I submitted the test data classification using the #bestweight table to Kaggle, which improved my score to 76.077% and my ranking by nearly 300 places to about 4800th. Not where I'd like to be, but not bad for a half-century-old, simple algorithm.
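For completeness, a sketch of how that submission might be produced, assuming the Kaggle test data sits in a table named test with the same columns as train (the table name and query shape are my assumptions):
-- build the test data vector the same way as the training vector
select passengerid id, 'bias' attribute, 'bias' value into #testdata from test
union all
select passengerid, 'sex', sex from test
union all
select passengerid, 'pclass', cast(pclass as varchar) from test
union all
select passengerid, 'embarked', cast(embarked as varchar) from test
union all
select passengerid, 'cabin', isnull(left(cabin,1),'na') from test
-- score each passenger against the pocketed weights and classify at the 0.5 threshold
select d.id passengerid, case when sum(w.wght) > .5 then 1 else 0 end survived
from #testdata d
join #bestweight w on d.attribute = w.attribute and d.value = w.value
group by d.id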