You may want to look at this documentation:
You can also use the FANN library for your experiments. If your results are poor (I got around 30% accuracy with the default values), try adding more hidden layers (for example, 10,10,10 in the hiddenLayers field), lowering the learning rate, using more samples, etc.
You should use more than one protein from PDB to make sure your network is not specific to a single protein. You can look proteins up at PDB or NCBI Structure. At PDB, click on the Sequence tab; down in the Chain Display section you will find a link to [Sequence & DSSP]. Copy and paste the results (minus the title at the top) into a pdb file. You can find secondary structure annotations for all PDB structures at http://www.rcsb.org/pdb/files/ss.txt. A copy of this file is in /usr/local/data/ss.txt on psoda4. If you don't want to go through the annotations by hand, you can use the stride application to retrieve the annotations from pdb files.
Another method for making your arff files is to use the ss.txt file referred to above. Copy and paste a few of the sequence and structure pairs into a file. An example of such a file is ss1.txt. Don't mess with the formatting of the pairs. Then run the command "perl ssparser.pl ss1.txt ss1.arff" to create an arff file. You are free to edit the ssparser file to change the number of inputs. The ss.txt file is huge, so you can use it to get a lot of training data.
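If you would rather see what this kind of conversion involves, here is a minimal Python sketch. It is not ssparser.pl and may not match its output. It assumes each chain in the ss.txt excerpt appears as a ">ID:CHAIN:sequence" block followed by a ">ID:CHAIN:secstr" block, and that a blank in the structure line means "no structure" (written as N here). The window size of 5, the padding symbol X, the attribute names, and the chain ID 1ABC in the usage note are all made up for illustration.

```python
AMINO = "ARDNCEQGHILKMFPSTWYV"   # the 20 amino acid codes
STATES = "HBEGITSN"              # structure symbols, N = nothing


def read_records(lines):
    """Collect {(id, chain): {'sequence': str, 'secstr': str}} from ss.txt-style lines."""
    records = {}
    key = kind = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # keep spaces: a blank is assumed to mean "no structure"
        if line.startswith(">"):
            pdb_id, chain, kind = line[1:].split(":")
            key = (pdb_id, chain)
            records.setdefault(key, {})
        elif key is not None:
            records[key][kind] = records[key].get(kind, "") + line
    return records


def windows(sequence, structure, size=5, pad="X"):
    """Yield (window, label) per residue; the label is the structure at the window centre."""
    half = size // 2
    structure = structure.ljust(len(sequence))  # in case trailing blanks were trimmed
    padded = pad * half + sequence + pad * half
    for i, label in enumerate(structure[:len(sequence)]):
        yield padded[i:i + size], ("N" if label == " " else label)


def to_arff(records, size=5):
    """Build the text of an arff file with one nominal attribute per window position."""
    residues = ",".join(AMINO + "X")
    out = ["@relation secondary_structure"]
    for k in range(size):
        out.append(f"@attribute res{k} {{{residues}}}")
    out.append(f"@attribute structure {{{','.join(STATES)}}}")
    out.append("@data")
    for rec in records.values():
        for win, label in windows(rec.get("sequence", ""), rec.get("secstr", ""), size):
            # Residue codes outside the 20 listed are folded into X (an assumption).
            clean = [c if c in AMINO else "X" for c in win]
            out.append(",".join(clean) + "," + label)
    return "\n".join(out) + "\n"
```

For example, `to_arff(read_records(open("ss1.txt")))` returns the arff text as a string, which you can then write to ss1.arff. Changing `size` changes the number of inputs, in the same spirit as editing the ssparser file.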
The pdb files will have the following symbols: H=helix; B=residue in isolated beta bridge; E=extended beta strand; G=3-10 helix; I=pi helix; T=hydrogen bonded turn; S=bend; N=Nothing (my perl code inserts this into the arff file).
The amino acids are A,R,D,N,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V.
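If you try FANN instead of arff files, the symbols have to become numbers, since FANN training data is numeric. One common way to do that is one-hot encoding, sketched below in Python. This is only an illustration under that assumption; the function names are made up and nothing here comes from ssparser.pl.

```python
AMINO_ACIDS = "ARDNCEQGHILKMFPSTWYV"  # same order as the list above
STRUCTURES = "HBEGITSN"               # same order as the symbol list above


def one_hot(symbol, alphabet):
    """Return a list with 1.0 at the symbol's position; all zeros if it is not in the alphabet."""
    return [1.0 if symbol == s else 0.0 for s in alphabet]


def encode_window(window):
    """Concatenate one-hot vectors for every residue in a window (20 values per residue)."""
    vector = []
    for residue in window:
        vector.extend(one_hot(residue, AMINO_ACIDS))
    return vector
```

A window of 5 residues gives 100 inputs, and `one_hot("E", STRUCTURES)` gives the 8 target outputs for an extended beta strand.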