                         SEQUENCE LISTING

<110>  California Institute of Technology
       BASF SE
 
<120>  Systems and methods for generating a signal peptide amino acid 
       sequence using deep learning

<130>  039621.00819

<140>  PCT/US2021/035990
<141>  2021-06-04

<150>  US 63/034,802
<151>  2020-06-04

<160>  7     

<170>  PatentIn version 3.5

<210>  1
<211>  442
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  1

Ala Glu Arg Gln Pro Leu Lys Ile Pro Pro Ile Ile Asp Val Gly Arg 
1               5                   10                  15      


Gly Arg Pro Val Arg Leu Asp Leu Arg Pro Ala Gln Thr Gln Phe Asp 
            20                  25                  30          


Lys Gly Lys Leu Val Asp Val Trp Gly Val Asn Gly Gln Tyr Leu Ala 
        35                  40                  45              


Pro Thr Val Arg Val Lys Ser Asp Asp Phe Val Lys Leu Thr Tyr Val 
    50                  55                  60                  


Asn Asn Leu Pro Gln Thr Val Thr Met Asn Ile Gln Gly Leu Leu Ala 
65                  70                  75                  80  


Pro Thr Asp Met Ile Gly Ser Ile His Arg Lys Leu Glu Ala Lys Ser 
                85                  90                  95      


Ser Trp Ser Pro Ile Ile Ser Ile His Gln Pro Ala Cys Thr Cys Trp 
            100                 105                 110         


Tyr His Ala Asp Thr Met Leu Asn Ser Ala Phe Gln Ile Tyr Arg Gly 
        115                 120                 125             


Leu Ala Gly Met Trp Ile Ile Glu Asp Glu Gln Ser Lys Lys Ala Asn 
    130                 135                 140                 


Leu Pro Asn Lys Tyr Gly Val Asn Asp Ile Pro Leu Ile Leu Gln Asp 
145                 150                 155                 160 


Gln Gln Leu Asn Lys Gln Gly Val Gln Val Leu Asp Ala Asn Gln Lys 
                165                 170                 175     


Gln Phe Phe Gly Lys Arg Leu Phe Val Asn Gly Gln Glu Ser Ala Tyr 
            180                 185                 190         


His Gln Val Ala Arg Gly Trp Val Arg Leu Arg Ile Val Asn Ala Ser 
        195                 200                 205             


Leu Ser Arg Pro Tyr Gln Leu Arg Leu Asp Asn Asp Gln Pro Leu His 
    210                 215                 220                 


Leu Ile Ala Thr Gly Val Gly Met Leu Ala Glu Pro Val Pro Leu Glu 
225                 230                 235                 240 


Ser Ile Thr Leu Ala Pro Ser Glu Arg Val Glu Val Leu Val Glu Leu 
                245                 250                 255     


Asn Glu Gly Lys Thr Val Ser Leu Ile Ser Gly Gln Lys Arg Asp Ile 
            260                 265                 270         


Phe Tyr Gln Ala Lys Asn Leu Phe Ser Asp Asp Asn Glu Leu Thr Asp 
        275                 280                 285             


Asn Val Ile Leu Glu Leu Arg Pro Glu Gly Met Ala Ala Val Phe Ser 
    290                 295                 300                 


Asn Lys Pro Ser Leu Pro Pro Phe Ala Thr Glu Asp Phe Gln Leu Lys 
305                 310                 315                 320 


Ile Ala Glu Glu Arg Arg Leu Ile Ile Arg Pro Phe Asp Arg Leu Ile 
                325                 330                 335     


Asn Gln Lys Arg Phe Asp Pro Lys Arg Ile Asp Phe Asn Val Lys Gln 
            340                 345                 350         


Gly Asn Val Glu Arg Trp Tyr Ile Thr Ser Asp Glu Ala Val Gly Phe 
        355                 360                 365             


Thr Leu Gln Gly Ala Lys Phe Leu Ile Glu Thr Arg Asn Arg Gln Arg 
    370                 375                 380                 


Leu Pro His Lys Gln Pro Ala Trp His Asp Thr Val Trp Leu Glu Lys 
385                 390                 395                 400 


Asn Gln Glu Val Thr Leu Leu Val Arg Phe Asp His Gln Ala Ser Ala 
                405                 410                 415     


Gln Leu Pro Phe Thr Phe Gly Val Ser Asp Phe Met Leu Arg Asp Arg 
            420                 425                 430         


Gly Ala Met Gly Gln Phe Ile Val Thr Glu 
        435                 440         


<210>  2
<211>  28
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  2

Met Met Asn Leu Thr Arg Arg Gln Leu Leu Thr Arg Ser Ala Val Ala 
1               5                   10                  15      


Ala Thr Met Phe Ser Ala Pro Lys Thr Leu Trp Ala 
            20                  25              


<210>  3
<211>  344
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  3

Glu Arg Ile Lys Asp Leu Thr Thr Ile Gln Gly Val Arg Ser Asn Gln 
1               5                   10                  15      


Leu Ile Gly Tyr Gly Leu Val Val Gly Leu Asp Gly Thr Gly Asp Gln 
            20                  25                  30          


Thr Thr Gln Thr Pro Phe Thr Val Gln Ser Ile Val Ser Met Met Gln 
        35                  40                  45              


Gln Met Gly Ile Asn Leu Pro Ser Gly Thr Asn Leu Gln Leu Arg Asn 
    50                  55                  60                  


Val Ala Ala Val Met Val Thr Gly Asn Leu Pro Pro Phe Ala Gln Pro 
65                  70                  75                  80  


Gly Gln Pro Met Asp Val Thr Val Ser Ser Met Gly Asn Ala Arg Ser 
                85                  90                  95      


Leu Arg Gly Gly Thr Leu Leu Met Thr Pro Leu Lys Gly Ala Asp Asn 
            100                 105                 110         


Gln Val Tyr Ala Met Ala Gln Gly Asn Leu Val Ile Gly Gly Ala Gly 
        115                 120                 125             


Ala Gly Ala Ser Gly Thr Ser Thr Gln Ile Asn His Leu Gly Ala Gly 
    130                 135                 140                 


Arg Ile Ser Ala Gly Ala Ile Val Glu Arg Ala Val Pro Ser Gln Leu 
145                 150                 155                 160 


Thr Glu Thr Ser Thr Ile Arg Leu Glu Leu Lys Glu Ala Asp Phe Ser 
                165                 170                 175     


Thr Ala Ser Met Val Val Asp Ala Ile Asn Lys Arg Phe Gly Asn Gly 
            180                 185                 190         


Thr Ala Thr Pro Leu Asp Gly Arg Val Ile Gln Val Gln Pro Pro Met 
        195                 200                 205             


Asp Ile Asn Arg Ile Ala Phe Ile Gly Asn Leu Glu Asn Leu Asp Val 
    210                 215                 220                 


Lys Pro Ser Gln Gly Pro Ala Lys Val Ile Leu Asn Ala Arg Thr Gly 
225                 230                 235                 240 


Ser Val Val Met Asn Gln Ala Val Thr Leu Asp Asp Cys Ala Ile Ser 
                245                 250                 255     


His Gly Asn Leu Ser Val Val Ile Asn Thr Ala Pro Ala Ile Ser Gln 
            260                 265                 270         


Pro Gly Pro Phe Ser Gly Gly Gln Thr Val Ala Thr Gln Val Ser Gln 
        275                 280                 285             


Val Glu Ile Asn Lys Glu Pro Gly Gln Val Ile Lys Leu Asp Lys Gly 
    290                 295                 300                 


Thr Ser Leu Ala Asp Val Val Lys Ala Leu Asn Ala Ile Gly Ala Thr 
305                 310                 315                 320 


Pro Gln Asp Leu Val Ala Ile Leu Gln Ala Met Lys Ala Ala Gly Ser 
                325                 330                 335     


Leu Arg Ala Asp Leu Glu Ile Ile 
            340                 


<210>  4
<211>  24
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  4

Met Thr Leu Thr Arg Pro Leu Ala Leu Ile Ser Ala Leu Ala Ala Leu 
1               5                   10                  15      


Ile Leu Ala Leu Pro Ala Asp Ala 
            20                  


<210>  5
<211>  100
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  5

Asp Gly Leu Asn Gly Thr Met Met Gln Tyr Tyr Glu Trp His Leu Glu 
1               5                   10                  15      


Asn Asp Gly Gln His Trp Asn Arg Leu His Asp Asp Ala Ala Ala Leu 
            20                  25                  30          


Ser Asp Ala Gly Ile Thr Ala Ile Trp Ile Pro Pro Ala Tyr Lys Gly 
        35                  40                  45              


Asn Ser Gln Ala Asp Val Gly Tyr Gly Ala Tyr Asp Leu Tyr Asp Leu 
    50                  55                  60                  


Gly Glu Phe Asn Gln Lys Gly Thr Val Arg Thr Lys Tyr Gly Thr Lys 
65                  70                  75                  80  


Ala Gln Leu Glu Arg Ala Ile Gly Ser Leu Lys Ser Asn Asp Ile Asn 
                85                  90                  95      


Val Tyr Gly Asp 
            100 


<210>  6
<211>  16
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  6

Met Lys Leu Leu Thr Ser Phe Val Leu Ile Gly Ala Leu Ala Phe Ala 
1               5                   10                  15      


<210>  7
<211>  116
<212>  PRT
<213>  Artificial Sequence

<220>
<223>  N-terminal Signal Peptide
<400>  7

Met Lys Leu Leu Thr Ser Phe Val Leu Ile Gly Ala Leu Ala Phe Ala 
1               5                   10                  15      


Asp Gly Leu Asn Gly Thr Met Met Gln Tyr Tyr Glu Trp His Leu Glu 
            20                  25                  30          


Asn Asp Gly Gln His Trp Asn Arg Leu His Asp Asp Ala Ala Ala Leu 
        35                  40                  45              


Ser Asp Ala Gly Ile Thr Ala Ile Trp Ile Pro Pro Ala Tyr Lys Gly 
    50                  55                  60                  


Asn Ser Gln Ala Asp Val Gly Tyr Gly Ala Tyr Asp Leu Tyr Asp Leu 
65                  70                  75                  80  


Gly Glu Phe Asn Gln Lys Gly Thr Val Arg Thr Lys Tyr Gly Thr Lys 
                85                  90                  95      


Ala Gln Leu Glu Arg Ala Ile Gly Ser Leu Lys Ser Asn Asp Ile Asn 
            100                 105                 110         


Val Tyr Gly Asp 
        115     


