## An attempt to reconstruct my old genetic algorithm market forecaster Predicto.

Maybe easiest to read if you start from the bottom. Screen output in blue, program code in red. Comments can be directed to bobmoore "@" pobox.com.

11/14/2014
Progress slowed down because of difficulty porting the stats module I use in the Python 2 version to Python 3. Very frustrating. One of the measures I use to evaluate the accuracy of algorithms is the Pearson correlation. In the 2.7 version, I used the code from the stats module, but it turned out that didn't work in the Python 3 version. Instead, I had to use a bit of code I found on the Internet for the Pearson coefficient.

It was quite a lot of work to get this integrated into the Python 3 version. On top of that, my floundering about broke the link to the stats module in the Python 2 version. I have to run some tests to see whether this function works as expected, so I compare it with the PEARSON function in Excel. Here is the code:

```
#
import math

list1=[1, 2, 3, 4, 5]
list2=[5, 4, 3, 2, 1]

def average(x):
    assert len(x) > 0
    return float(sum(x)) / len(x)

def pearsonr2(x, y):
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    return diffprod / math.sqrt(xdiff2 * ydiff2)

out=pearsonr2(list1, list2)
print("Correlation of {0} with {1} is {2}".format(list1, list2, out))
```

Here are some tests:

```
Correlation of [1, 2, 3, 4, 5] with [5, 4, 3, 2, 1] is -1.0
Correlation of [1, 2, 3, 4, 5] with [1, 2, 3, 4, 5] is 1.0
Correlation of [1, 2, 3, 4, 5] with [5, 4, 5, 2, 1] is -0.8703882797784892
Correlation of [1, 2, 3, 4, 5] with [5, 4, 5, 2, 3] is -0.7276068751089989
Correlation of [1, 2, 3, 4, 5] with [5000, 4, 5, 2, 3] is -0.7073189732785681
```

Happily, the test results agree with the Excel PEARSON function.
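
One case the tests above do not cover is a list whose values are all identical. One of the sums of squares is then zero, and the final division raises ZeroDivisionError. What follows is a minimal sketch of a guarded variant. The name `pearson_or_none` and the choice to return `None` are assumptions made for illustration; they are not part of the code found on the Internet.

```python
import math

def pearson_or_none(x, y):
    """Pearson correlation of two equal-length lists, or None if it is undefined."""
    assert len(x) == len(y) and len(x) > 0
    avg_x = sum(x) / len(x)
    avg_y = sum(y) / len(y)
    diffprod = sum((a - avg_x) * (b - avg_y) for a, b in zip(x, y))
    xdiff2 = sum((a - avg_x) ** 2 for a in x)
    ydiff2 = sum((b - avg_y) ** 2 for b in y)
    denom = math.sqrt(xdiff2 * ydiff2)
    if denom == 0.0:
        # At least one list is constant, so the correlation is undefined
        return None
    return diffprod / denom

print(pearson_or_none([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
print(pearson_or_none([1, 2, 3, 4, 5], [3, 3, 3, 3, 3]))  # None
```

Returning `None` leaves it to the caller to decide whether a constant series should count as "no correlation" or be skipped.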

Another thing I did was to make an HTML readout of Predicto forecasts on my www.philly-bob.net Free-for-All web page. What I did was insert a tiny bit of code into each IMG tag: USEMAP="test". The USEMAP attribute connects the image to a user-defined map, which is defined here:

```
<MAP NAME="test">
<area shape="circle" coords="0,0,20" href="pythonfiles/RR7X.HTM">
</MAP>
```

This creates a 20-pixel round hotspot at the top left corner of the image, which connects to the daily forecast file RR7.HTM. Red guesses indicate downward market, green guesses indicate upward market.

Predicto Forecasts 2014-11-13
Overall Weighted Guess= 0.637
Overall Average Guess= 0.508
+hicor= 0.502 hicorn= 45 Ges= 0.478
+hiacc= 62.26 pct hiaccn= 29 hiaccn Ges= 0.521
+hiallcor= 0.346 hiallcorn= 139 Ges= 0.421
-locor= -0.491 locorn= 113 Ges= 0.460
-loacc= 28.30 pct loaccn= 79 loaccn Ges= 0.485
-loallcor= -0.370 loallcorn= 33 Ges= 0.460
Short Termers
+hishcor= 0.518 hishcorn= 92 Ges= 0.458
+hishacc= 75.00 pct hishaccn= 70 hishaccn Ges= 0.599
+hiallshcor= 0.485 hiallshcorn= 92 Ges= 0.458
-loshcor= -0.497 loshcorn= 211 Ges= 0.531
-loshacc= 25.00 pct loshaccn= 28 loshaccn Ges= 0.447
-loallshcor= -0.519 loallshcorn= 83 Ges= 0.550

This is the document I check out each day to decide whether to make minor adjustments in protecting my tiny nest egg, which is mainly invested in various stocks.

If I get curious about one particular algorithm I can use a utility program called onealgrec3 to get details. For instance, here I look up details of Algorithm #29, which just moved into first place on the "HiAcc" measure:

```
Which alg?29
Alg 29 ['mydiv', 'F', 'High', 'mytimes', 'AAPL', 'Open', 'IBM', 'High']
('Next day ges=', '0.521')
AvgCor= 0.19817
gesS= ['0.468', '0.513', '0.538', '0.559', '0.471', '0.456', '0.503', '0.496', '0.457', '0.498', '0.468', '0.527', '0.492', '0.478', '0.498', '0.499', '0.505', '0.484', '0.541', '0.575', '0.514', '0.382', '0.484', '0.514', '0.492', '0.490', '0.505', '0.483', '0.477', '0.494', '0.494', '0.543', '0.525', '0.568', '0.484', '0.553', '0.483', '0.512', '0.513', '0.489', '0.469', '0.510', '0.484', '0.478', '0.496', '0.509', '0.470', '0.515', '0.546', '0.496', '0.465', '0.547', '0.519'] len= 53
futs= ['-1.56', '-3.07', '10.06', '-3.07', '-6.17', '-13.10', '7.25', '1.76', '-11.91', '-1.41', '14.85', '2.59', '9.79', '-0.96', '-16.11', '-11.52', '15.53', '-32.31', '16.86', '-5.05', '-5.51', '-26.13', '0.01', '21.73', '-3.08', '-29.72', '33.79', '-40.68', '-22.08', '-31.39', '2.96', '-15.21', '0.27', '24.00', '17.25', '37.27', '-14.17', '23.71', '13.76', '-2.95', '23.42', '-2.75', '12.35', '23.40', '-0.24', '-5.71', '11.47', '7.64', '0.71', '6.34', '1.42', '-1.43', '1.08'] len= 53
AllCor= 0.30942
- - C - C C C - C - X - X - C C C C C X X C - C - C C C C C - X - C X C C C C - X - X X - X X C - X - - -
('Signifigant Accuracy=', 23, '/', 35) SigAcc= 65.71
C X C X C C C X C C X C X C C C C C C X X C X C C C C C C C X X C C X C C C C C X X X X C X X C C X X X C
('All Accuracy=', 33, '/', 53) AllAcc= 62.26
----
('Accuracy over last', 20, 'days')
AvgCor= 0.13086
AllCor= 0.17722
C X C C C C - X - X X - X X C - X - - -
('Signifigant Accuracy=', 6, '/', 13) SigAcc= 46.15
C X C C C C C X X X X C X X C C X X X C
('All Accuracy=', 10, '/', 20) AllAcc= 50.00
```

This shows that Algorithm #29 has been right on 23 of the last 35 significant market days, and wrong on 12, for an accuracy of 65.7%.
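
The accuracy figures can be reproduced from the marker rows themselves. The sketch below assumes that `C` marks a correct guess, `X` a wrong one, and `-` a day that is left out of the count; that reading is inferred from the totals in the output, and the function name `accuracy` is made up for this example. The two rows are the "last 20 days" rows from the run above.

```python
def accuracy(markers):
    """Return (correct, counted, percent) for a row of C / X / - markers."""
    tokens = markers.split()
    correct = tokens.count("C")
    wrong = tokens.count("X")
    counted = correct + wrong  # "-" days are not counted
    percent = 100.0 * correct / counted if counted else 0.0
    return correct, counted, percent

sig = "C X C C C C - X - X X - X X C - X - - -"
all_days = "C X C C C C C X X X X C X X C C X X X C"

for label, row in (("SigAcc", sig), ("AllAcc", all_days)):
    c, n, pct = accuracy(row)
    print("{0}= {1} / {2} = {3:.2f}".format(label, c, n, pct))
# SigAcc= 6 / 13 = 46.15
# AllAcc= 10 / 20 = 50.00
```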

11/10/2014
It's time to show the Python3 code that evaluates an algorithm. Here 'tis, a program called testeval.py:

```
#
import random

datasuf=".csv"
rawvalues=['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
values=rawvalues[1:5]
lenvalues=len(values)-1
tickers=["SP500","AAPL","CVX","F","GE","IBM","T","WMT","XOM"]
lentickers=len(tickers)-1
ops=["myplus","myminus","mydiv","mytimes"]
lenops=len(ops)-1

testalg=['mytimes', 'SP500', 'Close', 'mydiv', 'GE', 'Close', 'GE', 'Close']

def stackpush(item):
    mystack.append(item)
    return
def stackpop():
    get=mystack.pop()
    return get

# Each operator pops two operands off the stack and returns the result
def myplus():
    a=float(stackpop())
    b=float(stackpop())
    return a+b
def myminus():
    a=float(stackpop())
    b=float(stackpop())
    return a-b
def mydiv():
    a=float(stackpop())
    b=float(stackpop())
    if b==0.0:
        return 0.0
    else:
        return a/b
def mytimes():
    a=float(stackpop())
    b=float(stackpop())
    return a*b

def valtickday(value,ticker,day):
    global algdate
    numval=rawvalues.index(value)
    # fn must hold the path to this ticker's CSV file; its assignment is not shown in this listing
    f=open(fn,'r')
    daylines=f.readlines()  # one line per market day, newest first
    f.close()
    l=daylines[day].split(',')
    ans=l[numval]
    stackpush(ans)
    algdate=l[0]
    return ans

def evalalgdays(startday, stopday):
    evals=[]
    global mystack
    global algdays
    global algdates
    algdays=[]
    algdates=[]
    for thisday in range(startday, stopday):
        algdays.append(thisday)
        mystack=[]
        # Work from the end of the alg toward the front
        valtickday(alg[-1], alg[-2], thisday)
        valtickday(alg[-3], alg[-4], thisday)
        z=eval(alg[-5]+'()')
        stackpush(z)
        valtickday(alg[-6], alg[-7], thisday)
        final=eval(alg[-8]+'()')
        evals.append(final)
        algdates.append(algdate)
    return evals

def randop():
    return(ops[random.randint(0,lenops)])
def randticker():
    return(tickers[random.randint(0,lentickers)])
def randval():
    return(values[random.randint(0,lenvalues)])

mystack=[]
startday=1
stopday=15

alg=testalg
evals=evalalgdays(startday, stopday)
print("Alg= {0}".format(alg))
print("eval= {0}".format(evals))
print("algdays= {0}".format(algdays))
print("algdates= {0}".format(algdates))
print()
alg=[randop(), randticker(), randval(), \
     randop(), randticker(), randval(), \
     randticker(), randval()]
evals=evalalgdays(startday, stopday)
print("Alg= {0}".format(alg))
print("eval= {0}".format(evals))
print("algdays= {0}".format(algdays))
print("algdates= {0}".format(algdates))
```

Here's the outcome of running this program, first on my standard testalg, which always evaluates to the S&P500 Close, and next on a random alg.

```
Alg= ['mytimes', 'SP500', 'Close', 'mydiv', 'GE', 'Close', 'GE', 'Close']
eval= [2017.81, 2018.05, 1994.65, 1982.3, 1985.05, 1961.63, 1964.58, 1950.82, 1927.11, 1941.28, 1904.01, 1886.76, 1862.76, 1862.49]
algdays= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
algdates= ['2014-11-03', '2014-10-31', '2014-10-30', '2014-10-29', '2014-10-28', '2014-10-27', '2014-10-24', '2014-10-23', '2014-10-22', '2014-10-21', '2014-10-20', '2014-10-17', '2014-10-16', '2014-10-15']

Alg= ['mydiv', 'AAPL', 'Close', 'mytimes', 'CVX', 'Close', 'IBM', 'Low']
eval= [0.005733897951417679, 0.005502842906216424, 0.005599304593376804, 0.005630003827185758, 0.00563223202811958, 0.005647618015685473, 0.005622627887064053, 0.005585174616171489, 0.0056078277630762195, 0.005506844910953022, 0.005367981104405183, 0.004847484162171896, 0.004850515976282439, 0.0049938530399773196]
algdays= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
algdates= ['2014-11-03', '2014-10-31', '2014-10-30', '2014-10-29', '2014-10-28', '2014-10-27', '2014-10-24', '2014-10-23', '2014-10-22', '2014-10-21', '2014-10-20', '2014-10-17', '2014-10-16', '2014-10-15']
```

Two things to note about this code. First, it assumes algs are eight items long. This could be extended; easy as pie to make an algorithm of length eleven items. Second, it only allows a subset of values, ignoring "Volume" and "Adjusted Close." These can be changed.
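
To illustrate the point about length, here is a minimal sketch of an evaluator that does not care whether an alg has eight items or eleven. It reads the alg from the end toward the front, as testeval.py does, and keeps the same operand order. The names `eval_alg`, `OPS` and `prices` are assumptions made for this example, and the small `prices` table stands in for the CSV lookup; its three values are the 2014-10-29 closes that appear elsewhere on this page.

```python
import operator

def safe_div(a, b):
    # Same convention as mydiv: dividing by zero yields 0.0
    return 0.0 if b == 0.0 else a / b

OPS = {"myplus": operator.add, "myminus": operator.sub,
       "mydiv": safe_div, "mytimes": operator.mul}

# Closing prices for 2014-10-29, used here in place of the CSV files
prices = {("SP500", "Close"): 1982.3,
          ("GE", "Close"): 25.66,
          ("AAPL", "Close"): 107.34}

def eval_alg(alg, prices):
    """Evaluate an alg of any valid length by scanning it right to left."""
    stack = []
    i = len(alg) - 1
    while i >= 0:
        token = alg[i]
        if token in OPS:
            a = stack.pop()  # operand that sits nearer the operator
            b = stack.pop()
            stack.append(OPS[token](a, b))
            i -= 1
        else:
            # A value name is always preceded by its ticker
            stack.append(prices[(alg[i - 1], token)])
            i -= 2
    assert len(stack) == 1, "malformed alg"
    return stack[0]

testalg = ['mytimes', 'SP500', 'Close', 'mydiv', 'GE', 'Close', 'GE', 'Close']
longalg = ['myminus', 'AAPL', 'Close'] + testalg  # eleven items

print(eval_alg(testalg, prices))  # 1982.3, the S&P500 close
print(eval_alg(longalg, prices))  # AAPL close minus the S&P500 close
```

Looking the operators up in a dictionary also avoids calling `eval` on strings, though that is a design preference rather than a requirement.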

Next, we'll examine how to evaluate the accuracy of a Predicto algorithm at forecasting the stock market. What this means is making a list of evaluations of the algorithm such as eval= [0.005733897951417679, 0.005502842906216424, 0.005599304593376804, 0.005630003827185758, 0.00563223202811958, 0.005647618015685473, 0.005622627887064053, 0.005585174616171489, 0.0056078277630762195, 0.005506844910953022, 0.005367981104405183, 0.004847484162171896, 0.004850515976282439, 0.0049938530399773196] and then comparing that with a list of next-day changes in the S&P500.
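
As a rough preview, the comparison might look like the sketch below. It is one plausible reading of the idea, not the actual Predicto scoring code. It takes the first eight S&P500 closes and the first eight evaluations from the output above, builds the next-day changes, and correlates the two lists. Because the data run newest first, the next-day change for row i is close[i-1] minus close[i], and row 0 has no next day yet. The names `pearson`, `next_day_change` and `known_evals` are assumptions made for this example.

```python
import math

# Newest-first values copied from the output above (first eight market days)
sp500_close = [2017.81, 2018.05, 1994.65, 1982.3,
               1985.05, 1961.63, 1964.58, 1950.82]
evals = [0.005733897951417679, 0.005502842906216424,
         0.005599304593376804, 0.005630003827185758,
         0.00563223202811958, 0.005647618015685473,
         0.005622627887064053, 0.005585174616171489]

def pearson(x, y):
    """Pearson correlation of two equal-length lists with nonzero variance."""
    avg_x = sum(x) / len(x)
    avg_y = sum(y) / len(y)
    diffprod = sum((a - avg_x) * (b - avg_y) for a, b in zip(x, y))
    xdiff2 = sum((a - avg_x) ** 2 for a in x)
    ydiff2 = sum((b - avg_y) ** 2 for b in y)
    return diffprod / math.sqrt(xdiff2 * ydiff2)

# Row i's next-day change is the newer close minus row i's close
next_day_change = [sp500_close[i - 1] - sp500_close[i]
                   for i in range(1, len(sp500_close))]
known_evals = evals[1:]  # drop row 0, whose next day is not known yet

r = pearson(known_evals, next_day_change)
print("next-day changes= {0}".format([round(c, 2) for c in next_day_change]))
print("correlation= {0:.5f}".format(r))
```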

11/08/2014
With a free weekend, set out to do the hard work of explaining how Predicto algorithms are solved. But first, I face a problem I faced when I did the program in Python 2:

```
#
import msvcrt

while not msvcrt.kbhit():
    print('.', end='')
print("kbhit!")
```

This program runs in some Python setups but not in others. Here's a question I asked on the genius site stackoverflow.com:

```
This simple program works when entered from a command line, but not in IDLE.

#
import msvcrt

while not msvcrt.kbhit():
    print('.', end='')
print("kbhit!")

I had hoped this problem would disappear in Python 3.

Some questions:

Is there a way to get around this in IDLE?
Is there some other Python code editor that does not have this limitation?
I am writing a blog on my transition to Python3 and I would like to explain WHY I have to transfer my work from IDLE to a command line. What is the simple, boxtop explanation of this apparent inconsistency, suitable for passing on to readers who don't know Python?
```

Generally, the response was that there was no way around the mismatch between IDLE and Python: "You may want to know, that IDLE is a layered application, that uses Tcl/Tk based Tkinter Controller part, that also scans and evaluates keyboard-related events in the .mainloop() and that simply gets into conflict with your intention to detect .kbhit()." Tell your readers that it is due to an "un-avoidable collision of two concurrent Controllers ( a .mainloop() + .kbdhit() ) [and] IDLE's .mainloop() won in your demonstrated example :o)" Will look into an alternative to IDLE, called

11/06/2014
Made some improvements in utility program ONEALGREC1.PY, which shows some details of individual algorithms. Here's a run on three high and three low algorithms.

```
>>> ================================ RESTART ================================
>>>
Which alg?179
Alg 179 ['myplus', 'F', 'Low', 'myminus', 'CVX', 'High', 'SP500', 'Close']
('Next day ges=', '0.485')
AvgCor= 0.51204
AllCor= 0.14882
- - C - C X C - C - C - X - X X C C C C X X - X - X C C X X - C - C X X C C X - C - C X - X C X
('Signifigant Accuracy=', 18, '/', 34) SigAcc= 52.94
C X C X C X C X C X C X X C X X C C C C X X C X C X C C X X C C C C X X C C X C C C C X C X C X
('All Accuracy=', 27, '/', 48) AllAcc= 56.25
>>> ================================ RESTART ================================
>>>
Which alg?59
Alg 59 ['myminus', 'XOM', 'Close', 'mytimes', 'SP500', 'Close', 'GE', 'High']
('Next day ges=', '0.448')
AvgCor= 0.37327
AllCor= 0.19036
- - C - C C C - C - C - X - C X C C C C X C - C - C C C X X - C - X X X C C X - X - C C - X C X
('Signifigant Accuracy=', 22, '/', 34) SigAcc= 64.71
C X C C C C C X C X C X X C C X C C C C X C C C C C C C X X C C C X X X C C X C X C C C C X C X
('All Accuracy=', 32, '/', 48) AllAcc= 66.67
>>> ================================ RESTART ================================
>>>
Which alg?51
Alg 51 ['mydiv', 'CVX', 'Low', 'myminus', 'IBM', 'Low', 'T', 'Low']
('Next day ges=', '0.562')
AvgCor= 0.00827
AllCor= 0.41746
- - X - C X X - C - X - X - X C C C C C C C - X - X C C C C - C - C C C X C X - X - C X - X X C
('Signifigant Accuracy=', 20, '/', 34) SigAcc= 58.82
C C X X C X X X C C X C X C X C C C C C C C C X C X C C C C X C C C C C X C X C X X C X X X X C
('All Accuracy=', 29, '/', 48) AllAcc= 60.42
>>> ================================ RESTART ================================
>>>
Which alg?113
Alg 113 ['myplus', 'F', 'Close', 'myplus', 'IBM', 'Close', 'SP500', 'Close']
('Next day ges=', '0.515')
AvgCor= -0.49108
AllCor= -0.16979
- - X - X C X - X - X - C - C C X X X X C C - X - C X X C C - X - X C C X X C - X - X C - C X C
('Signifigant Accuracy=', 15, '/', 34) SigAcc= 44.12
X C X C X C X C X C X C C X C C X X X X C C X X X C X X C C X X X X C C X X C X X X X C X C X C
('All Accuracy=', 20, '/', 48) AllAcc= 41.67
>>> ================================ RESTART ================================
>>>
Which alg?79
Alg 79 ['mydiv', 'AAPL', 'Open', 'myminus', 'SP500', 'Close', 'SP500', 'Low']
('Next day ges=', '0.484')
AvgCor= -0.06165
AllCor= -0.01654
- - C - C X C - X - X - C - X X X X X X X X - X - X C C X X - C - X X X X C X - C - C X - X X C
('Signifigant Accuracy=', 11, '/', 34) SigAcc= 32.35
C X C C C X C X X X X X C X X X X X X X X X X X X X C C X X X C X X X X X C X X C C C X X X X C
('All Accuracy=', 14, '/', 48) AllAcc= 29.17
>>> ================================ RESTART ================================
>>>
Which alg?111
Alg 111 ['mytimes', 'IBM', 'High', 'mydiv', 'WMT', 'Low', 'XOM', 'Low']
('Next day ges=', '0.476')
AvgCor= -0.12480
AllCor= -0.41939
- - X - C X X - C - C - X - X X X X C C C X - X - C X X X X - X - X X X C X X - C - C C - C C X
('Signifigant Accuracy=', 13, '/', 34) SigAcc= 38.24
X X X X C X X C C C C X X X X X X X C C C X C X X C X X X X X X X X X X C X X X C C C C C C C X
('All Accuracy=', 18, '/', 48) AllAcc= 37.50
>>>
```

11/05/2014
Now, to skip ahead a bit: I described how update.py created a data matrix of securities prices, downloaded from YAHOO Finance. What is interesting about this situation is that in later stages, I create "algorithms" or "formulae" based on that data.

The Python 2.7.4 version of Predicto has a pool of 277 algorithms. Each algorithm comes up with a daily forecast of whether the stock market (S&P500) is going to go up or down. We keep track of these algorithms in a file called RR6.REC. Measures called hicor, hiacc, and hiallcor keep track of the algorithms that have been doing the best job of forecasting the market. The symmetrical measures beginning with lo- keep track of the algorithms that have been doing the worst job of forecasting the market.

```
+hicor= 0.517 hicorn= 179 Ges= 0.504
+hiacc= 67.39 pct hiaccn= 59 hiaccn Ges= 0.515
+hiallcor= 0.423 hiallcorn= 51  Ges= 0.481
-locor= -0.493 locorn= 113 Ges= 0.485
-loacc= 28.26 pct loaccn= 79 loaccn Ges= 0.444
-loallcor= -0.447 loallcorn= 111  Ges= 0.537
```
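
The hi- and lo- measures amount to picking the maximum and minimum of each score across the pool. Below is a minimal sketch of that selection. It uses the AvgCor, AllCor and AllAcc figures printed for six algorithms in the 11/06 entry. The `records` layout and the function name `best_and_worst` are assumptions made for this example, since the format of RR6.REC is not shown here, and the pairing of hicor with AvgCor is inferred from the numbers.

```python
# Scores for six algorithms, copied from the 11/06 onealgrec runs
records = {
    179: {"AvgCor": 0.51204, "AllCor": 0.14882, "AllAcc": 56.25},
    59: {"AvgCor": 0.37327, "AllCor": 0.19036, "AllAcc": 66.67},
    51: {"AvgCor": 0.00827, "AllCor": 0.41746, "AllAcc": 60.42},
    113: {"AvgCor": -0.49108, "AllCor": -0.16979, "AllAcc": 41.67},
    79: {"AvgCor": -0.06165, "AllCor": -0.01654, "AllAcc": 29.17},
    111: {"AvgCor": -0.12480, "AllCor": -0.41939, "AllAcc": 37.50},
}

def best_and_worst(records, measure):
    """Return the algorithm numbers with the highest and lowest score."""
    hi = max(records, key=lambda n: records[n][measure])
    lo = min(records, key=lambda n: records[n][measure])
    return hi, lo

for measure in ("AvgCor", "AllAcc", "AllCor"):
    hi, lo = best_and_worst(records, measure)
    print("{0}: hi= {1} lo= {2}".format(measure, hi, lo))
# AvgCor: hi= 179 lo= 113
# AllAcc: hi= 59 lo= 79
# AllCor: hi= 51 lo= 111
```

The six winners line up with the hicorn, hiaccn, hiallcorn, locorn, loaccn and loallcorn numbers in the readout above.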

For instance, I am struck by the behavior of Algorithm 79, (AAPL Open) / (SP500 Close - SP500 Low), which out of 46 tries has been wrong 33 times, or 72% of the time. This kind of inaccuracy can be useful: just reverse the forecast for the day.

I have a Python2 utility called onealgrec, which gives information about one algorithm:

```
AvgCor= -0.04932
AllCor= 0.00291
13 / 46 Acc= 28.26
('Next day ges=', '0.444')
Alg 79 ['mydiv', 'AAPL', 'Open', 'myminus', 'SP500', 'Close', 'SP500', 'Low']
```

Did a second part of the Python 2 to Python 3 transfer: a utility program called getdatum.py, which looks up the data for one security on one date. It is less polished than the earlier update.py. Its output is as follows:

```
0 SP500
1 AAPL
2 CVX
3 F
4 GE
5 IBM
6 T
7 WMT
8 XOM
Enter input number of security you want to study: 4
You chose GE
13303 available dates 2014-11-03 to 1962-01-02
Enter date in form YYYY-MM-DD: 2014-10-29
wantdatestr= 2014-10-29
Found Wednesday 2014-10-29 in line 4
2014-10-29,25.88,25.90,25.39,25.66,28776900,25.66

GE
Date      = 2014-10-29
Open      = 25.88
High      = 25.90
Low       = 25.39
Close     = 25.66
Volume    = 28776900
```

Here is the program:

```
#
import datetime

datasuf=".csv"
tickers=["SP500","AAPL","CVX","F","GE","IBM","T","WMT","XOM"]
week=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

for i,ticker in enumerate(tickers):
    print("{0:2d} {1:s}".format(i, ticker))
tickernum=input("Enter input number of security you want to study: ")
ticker=tickers[int(tickernum)]
print("You chose {0}".format(ticker))

# fn must hold the path to the chosen ticker's CSV file; its assignment is not shown in this listing
f=open(fn, 'rt')
l=f.readlines()  # l[0] is the header line, l[1] the most recent market day
f.close()

dl=l[0]
dsl=dl.split(',')
topdatestr=l[1].rstrip().split(',')[0]
botdatestr=l[-1].rstrip().split(',')[0]
print("{0:d} available dates {1:s} to {2:s}".format(len(l), topdatestr, botdatestr))
wantdatestr=input("Enter date in form YYYY-MM-DD: ")
print("wantdatestr=", wantdatestr)
wantdt=datetime.datetime.strptime(wantdatestr, "%Y-%m-%d")
wdw=wantdt.weekday()

# Scan every line for the wanted date and remember the matching fields
found=False
fsl=()
for index, line in enumerate(l):
    if wantdatestr in line:
        print("Found {0} {1} in line {2}\n {3}".format(week[wdw], wantdatestr, index, line))
        fsl=line.split(',')
        found=True
if found:
    print(ticker)
    for i in range(len(fsl)):
        print(" {0:9s} = {1}".format(dsl[i].rstrip(), fsl[i]))

```
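
The linear scan works, but the same lookup can also be done with a dictionary keyed by date. Here is a minimal sketch, assuming the files begin with a header line matching the field names used elsewhere on this page (Date, Open, High, Low, Close, Volume, Adj Close). The name `index_by_date` is made up for this example, and the four data rows are the AAPL lines shown in the 11/03 entry.

```python
import datetime

# Header line is an assumption; the data rows are the AAPL sample lines
sample = """Date,Open,High,Low,Close,Volume,Adj Close
2014-11-03,108.22,110.30,108.01,109.40,52198000,109.40
2014-10-31,108.01,108.04,107.21,108.00,44571200,108.00
2014-10-30,106.96,107.35,105.90,106.98,40589700,106.98
2014-10-29,106.65,107.37,106.36,107.34,52586100,107.34
"""

week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']

def index_by_date(text):
    """Map each date string to a dict of field name -> field text."""
    lines = text.strip().splitlines()
    header = lines[0].split(',')
    return {line.split(',')[0]: dict(zip(header, line.split(',')))
            for line in lines[1:]}

rows = index_by_date(sample)
want = "2014-10-29"
day = week[datetime.datetime.strptime(want, "%Y-%m-%d").weekday()]
row = rows.get(want)
if row is None:
    print("{0} {1} not found".format(day, want))
else:
    print("Found {0} {1}".format(day, want))
    for name, text in row.items():
        print(" {0:9s} = {1}".format(name, text))
```

A weekend date such as 2014-11-01 simply comes back as not found, because the market was closed and the file has no line for it.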

11/03/2014
As a first move, transferred the old UPDATE.PY program from Python27 to Python34. Seems to work, producing CSV (comma-separated-value) text files. Had to change the import list and all the print statements, which are function calls in Python 3. Especially noteworthy is the use of the ".format" string method, as in:

print("{0:5s} MktDay= {1:s} Len= {2:5d} Ch={3:5.2f}".format(ticker, sl1[0], len(l), ch))
The items in curly brackets are numbered placeholders and format specifiers for the values listed between the parentheses of ".format()". This was an awkward syntax to me -- especially since the distinction between curly brackets and parentheses is slight on my screen. But maybe I have got it through my thick head now....
```
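import urllib.request

tickers=["SP500","AAPL","CVX","F","GE","IBM","T","WMT","XOM"]
datasuf=".csv"
base_url = "http://ichart.finance.yahoo.com/table.csv?s="

print("1. Update")
for ticker in tickers:
    print("{0:s}".format(ticker))
    # Yahoo lists the S&P500 index under the symbol ^GSPC
    if ticker=="SP500":
        nuticker="^GSPC"
    else:
        nuticker=ticker
    ytarg=base_url+nuticker
    # outfile must hold the destination path for this ticker's CSV file; its assignment is not shown in this listing
    out=urllib.request.urlretrieve(ytarg,outfile)

print("2. Checkupdate")
for ticker in tickers:
    # fn must hold the path to this ticker's CSV file; its assignment is not shown in this listing
    f=open(fn, 'rt')
    l=f.readlines()  # l[0] is the header line, l[1] the most recent market day
    sl1=l[1].rstrip().split(',')
    sl0=l[2].rstrip().split(',')
    ch=float(sl1[4])-float(sl0[4])  # change in Close from the previous market day
    print("{0:5s} MktDay= {1:s} Len= {2:5d} Ch={3:5.2f}".format(ticker, sl1[0], len(l), ch))
    f.close()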

```

The first part of the program creates nine files in the predata directory, one for each of S&P500, Apple, Chevron, Ford, GE, IBM, AT&T, Walmart, and Exxon. These files are in the following form (using the example of AAPL):

```
2014-11-03,108.22,110.30,108.01,109.40,52198000,109.40
2014-10-31,108.01,108.04,107.21,108.00,44571200,108.00
2014-10-30,106.96,107.35,105.90,106.98,40589700,106.98
2014-10-29,106.65,107.37,106.36,107.34,52586100,107.34
2014-10-28,105.40,106.74,105.35,106.74,47939900,106.74
2014-10-27,104.85,105.48,104.70,105.11,34132600,105.11
2014-10-24,105.18,105.49,104.53,105.22,46981700,105.22
2014-10-23,104.08,105.05,103.63,104.83,71002900,104.83
2014-10-22,102.84,104.11,102.60,102.99,68159000,102.99
...
1980-12-16,25.37,25.37,25.25,25.25,26432000,0.39
1980-12-15,27.38,27.38,27.25,27.25,43971200,0.42
1980-12-12,28.75,28.87,28.75,28.75,117258400,0.45
```
The second part of the program checks the update, as follows:
```
2. Checkupdate
SP500 MktDay= 2014-11-03 Len= 16316 Ch=-0.24
AAPL  MktDay= 2014-11-03 Len=  8549 Ch= 1.40
CVX   MktDay= 2014-11-03 Len= 11316 Ch=-3.17
F     MktDay= 2014-11-03 Len= 10704 Ch=-0.10
GE    MktDay= 2014-11-03 Len= 13303 Ch=-0.11
IBM   MktDay= 2014-11-03 Len= 13303 Ch=-0.04
T     MktDay= 2014-11-03 Len=  7640 Ch= 0.00
WMT   MktDay= 2014-11-03 Len= 10644 Ch= 0.01
XOM   MktDay= 2014-11-03 Len= 11316 Ch=-1.45
```

11/01/2014
Have actually written a partial Predicto implementation, in Python. Finished it 9/2/2014, but not sure how to proceed. In a lazy Saturday devoted to binge-TV-watching, came up with a plan to cover Predicto by transferring it from its current Python 2.7.4 version to the latest Python 3.4.0. See links below.

Python 3.0 Transfer

Some sources on new Python 3.4.0: