G2Cdb::Gene report

Gene id
Gene symbol
Homo sapiens
SH3-domain GRB2-like (endophilin) interacting protein 1
G00000827 (Mus musculus)

Databases (7)

Curated Gene
OTTHUMG00000009161 (Vega human gene)
ENSG00000118473 (Ensembl human gene)
84251 (Entrez Gene)
1258 (G2Cdb plasticity & disease)
SGIP1 (GeneCards)
Marker Symbol
HGNC:25412 (HGNC)
Protein Sequence
Q9BQI5 (UniProt)

Synonyms (1)

  • DKFZp761D221

Literature (15)

Pubmed - other

  • Syp1 is a conserved endocytic adaptor that contains domains involved in cargo selection and membrane tubulation.

    Reider A, Barker SL, Mishra SK, Im YJ, Maldonado-Báez L, Hurley JH, Traub LM and Wendland B

    Department of Biology, The Johns Hopkins University, Baltimore, MD 21218-2685, USA.

    Internalization of diverse transmembrane cargos from the plasma membrane requires a similarly diverse array of specialized adaptors, yet only a few adaptors have been characterized. We report the identification of the muniscin family of endocytic adaptors that is conserved from yeast to human beings. Solving the structures of yeast muniscin domains confirmed the unique combination of an N-terminal domain homologous to the crescent-shaped membrane-tubulating EFC/F-BAR domains and a C-terminal domain homologous to cargo-binding mu homology domains (muHDs). In vitro and in vivo assays confirmed membrane-tubulation activity for muniscin EFC/F-BAR domains. The muHD domain has conserved interactions with the endocytic adaptor/scaffold Ede1/eps15, which influences muniscin localization. The transmembrane protein Mid2, earlier implicated in polarized Rho1 signalling, was identified as a cargo of the yeast adaptor protein. These and other data suggest a model in which the muniscins provide a combined adaptor/membrane-tubulation activity that is important for regulating endocytosis.

    Funded by: Intramural NIH HHS; NIDDK NIH HHS: R01 DK053249, R01 DK53249; NIGMS NIH HHS: GM07231, R01 GM060979, R01 GM60979, T32 GM007231

    The EMBO journal 2009;28;20;3103-16

  • Gene variants associated with ischemic stroke: the cardiovascular health study.

    Luke MM, O'Meara ES, Rowland CM, Shiffman D, Bare LA, Arellano AR, Longstreth WT, Lumley T, Rice K, Tracy RP, Devlin JJ and Psaty BM

    Celera, Alameda, California 94502, USA. may.luke@celera.com

    The purpose of this study was to determine whether 74 single nucleotide polymorphisms (SNPs), which had been associated with coronary heart disease, are associated with incident ischemic stroke.

    Methods: Based on antecedent studies of coronary heart disease, we prespecified the risk allele for each of the 74 SNPs. We used Cox proportional hazards models that adjusted for traditional risk factors to estimate the associations of these SNPs with incident ischemic stroke during 14 years of follow-up in a population-based study of older adults: the Cardiovascular Health Study (CHS).

    Results: In white CHS participants, the prespecified risk alleles of 7 of the 74 SNPs (in HPS1, ITGAE, ABCG2, MYH15, FSTL4, CALM1, and BAT2) were nominally associated with increased risk of stroke (one-sided P<0.05, false discovery rate=0.42). In black participants, the prespecified risk alleles of 5 SNPs (in KRT4, LY6G5B, EDG1, DMXL2, and ABCG2) were nominally associated with stroke (one-sided P<0.05, false discovery rate=0.55). The Val12Met SNP in ABCG2 was associated with stroke in both white (hazard ratio, 1.46; 90% CI, 1.05 to 2.03) and black (hazard ratio, 3.59; 90% CI, 1.11 to 11.6) participants of CHS. Kaplan-Meier estimates of the 10-year cumulative incidence of stroke were greater among Val allele homozygotes than among Met allele carriers in both white (10% versus 6%) and black (12% versus 3%) participants of CHS.

    Conclusions: The Val12Met SNP in ABCG2 (encoding a transporter of sterols and xenobiotics) was associated with incident ischemic stroke in white and black participants of CHS.

    Funded by: NHLBI NIH HHS: N01 HC-55222, N01 HC015103, N01 HC035129, N01 HC045133, N01-HC-75150, N01-HC-85079, N01-HC-85086, N01HC55222, N01HC75150, N01HC85079, N01HC85086, U01 HL080295, U01 HL080295-01

    Stroke 2009;40;2;363-8

  • Association of gene variants with incident myocardial infarction in the Cardiovascular Health Study.

    Shiffman D, O'Meara ES, Bare LA, Rowland CM, Louie JZ, Arellano AR, Lumley T, Rice K, Iakoubova O, Luke MM, Young BA, Malloy MJ, Kane JP, Ellis SG, Tracy RP, Devlin JJ and Psaty BM

    Celera, 1401 Harbor Bay Parkway, Alameda, CA 94502, USA. dov.shiffman@celera.com

    Objective: We asked whether single nucleotide polymorphisms (SNPs) that had been nominally associated with cardiovascular disease in antecedent studies were also associated with cardiovascular disease in a population-based prospective study of 4522 individuals aged 65 or older.

    Based on antecedent studies, we prespecified a risk allele and an inheritance model for each of 74 SNPs. We then tested the association of these SNPs with myocardial infarction (MI) in the Cardiovascular Health Study (CHS). The prespecified risk alleles of 8 SNPs were nominally associated (1-sided P<0.05) with increased risk of MI in White CHS participants. The false discovery rate for these 8 was 0.43, suggesting that about 4 of these 8 are likely to be true positives. The 4 of these 8 SNPs that had the strongest evidence for association with cardiovascular disease before testing in CHS (association in 3 antecedent studies) were in KIF6 (CHS HR=1.29; 90%CI 1.1 to 1.52), VAMP8 (HR=1.2; 90%CI 1.02 to 1.41), TAS2R50 (HR=1.13; 90%CI 1 to 1.27), and LPA (HR=1.62; 90%CI 1.09 to 2.42).

    Conclusions: Although most of the SNPs investigated were not associated with MI in CHS, evidence from this investigation combined with previous studies suggests that 4 of these SNPs are likely associated with MI.

    Funded by: NHLBI NIH HHS: K08 HL077499, N01 HC015103, N01 HC035129, N01 HC045133, N01-HC-55222, N01-HC-75150, N01-HC-85079, N01-HC-85080, N01-HC-85081, N01-HC-85082, N01-HC-85083, N01-HC-85084, N01-HC-85085, N01-HC-85086, N01HC55222, N01HC75150, N01HC85079, N01HC85086, R01 HL077499, U01 HL080295, U01 HL080295-01, U01 HL080295-02, U01 HL080295-03, U01 HL080295-04

    Arteriosclerosis, thrombosis, and vascular biology 2008;28;1;173-9

  • SGIP1alpha is an endocytic protein that directly interacts with phospholipids and Eps15.

    Uezu A, Horiuchi A, Kanda K, Kikuchi N, Umeda K, Tsujita K, Suetsugu S, Araki N, Yamamoto H, Takenawa T and Nakanishi H

    Department of Molecular Pharmacology, Graduate School of Medical Sciences, Kumamoto University, 1-1-1 Honjo, Kumamoto 860-8556, Japan.

    SGIP1 has been shown to be an endophilin-interacting protein that regulates energy balance, but its function is not fully understood. Here, we identified a splicing variant of SGIP1 and named it SGIP1alpha. SGIP1alpha bound to phosphatidylserine and phosphoinositides and deformed the plasma membrane and liposomes into narrow tubules, suggesting its involvement in vesicle formation during endocytosis. SGIP1alpha furthermore bound to Eps15, an important adaptor protein of clathrin-mediated endocytic machinery. SGIP1alpha was colocalized with Eps15 and the AP-2 complex. Upon epidermal growth factor (EGF) stimulation, SGIP1alpha was colocalized with EGF at the plasma membrane, indicating the localization of SGIP1alpha at clathrin-coated pits/vesicles. SGIP1alpha overexpression reduced transferrin and EGF endocytosis. SGIP1alpha knockdown reduced transferrin endocytosis but not EGF endocytosis; this difference may be due to the presence of redundant pathways in EGF endocytosis. These results suggest that SGIP1alpha plays an essential role in clathrin-mediated endocytosis by interacting with phospholipids and Eps15.

    The Journal of biological chemistry 2007;282;36;26481-9

  • The DNA sequence and biological annotation of human chromosome 1.

    Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CC, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RI, Aubin K, Babbage AK, Bagguley CL, Bailey J, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Buckley D, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dunn M, Earthrowl M, Ellington AG, Errington H, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MR, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ES, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NS, McLaren S, Milne S, Mistry S, Moore MJ, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR, Banerjee R, Bryant SP, Burford DC, Burrill WD, Clegg SM, Dhami P, Dovey O, Faulkner LM, Gribble SM, Langford CF, Pandian RD, Porter KM and Prigmore E

    The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. sgregory@chg.duhs.duke.edu

    The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.

    Funded by: Medical Research Council: G0000107; Wellcome Trust

    Nature 2006;441;7091;315-21

  • The LIFEdb database in 2006.

    Mehrle A, Rosenfelder H, Schupp I, del Val C, Arlt D, Hahne F, Bechtel S, Simpson J, Hofmann O, Hide W, Glatting KH, Huber W, Pepperkok R, Poustka A and Wiemann S

    Division Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany. a.mehrle@dkfz.de

    LIFEdb (http://www.LIFEdb.de) integrates data from large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. New features of LIFEdb include (i) an updated user interface with enhanced query capabilities, (ii) a configurable output table and the option to download search results in XML, (iii) the integration of data from cell-based screening assays addressing the influence of protein-overexpression on cell proliferation and (iv) the display of the relative expression ('Electronic Northern') of the genes under investigation using curated gene expression ontology information. LIFEdb enables researchers to systematically select and characterize genes and proteins of interest, and presents data and information via its user-friendly web-based interface.

    Nucleic acids research 2006;34;Database issue;D415-8

  • Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

    Kimura K, Wakamatsu A, Suzuki Y, Ota T, Nishikawa T, Yamashita R, Yamamoto J, Sekine M, Tsuritani K, Wakaguri H, Ishii S, Sugiyama T, Saito K, Isono Y, Irie R, Kushida N, Yoneyama T, Otsuka R, Kanda K, Yokoi T, Kondo H, Wagatsuma M, Murakawa K, Ishida S, Ishibashi T, Takahashi-Fujii A, Tanase T, Nagai K, Kikuchi H, Nakai K, Isogai T and Sugano S

    Life Science Research Laboratory, Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo, 185-8601, Japan.

    By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.

    Genome research 2006;16;1;55-65

  • Src homology 3-domain growth factor receptor-bound 2-like (endophilin) interacting protein 1, a novel neuronal protein that regulates energy balance.

    Trevaskis J, Walder K, Foletta V, Kerr-Bayles L, McMillan J, Cooper A, Lee S, Bolton K, Prior M, Fahey R, Whitecross K, Morton GJ, Schwartz MW and Collier GR

    Metabolic Research Unit, School of Exercise and Nutrition Sciences, Deakin University, Geelong 3217, Victoria, Australia.

    To identify genes involved in the central regulation of energy balance, we compared hypothalamic mRNA from lean and obese Psammomys obesus, a polygenic model of obesity, using differential display PCR. One mRNA transcript was observed to be elevated in obese, and obese diabetic, P. obesus compared with lean animals and was subsequently found to be increased 4-fold in the hypothalamus of lethal yellow agouti (A(y)/a) mice, a murine model of obesity and diabetes. Intracerebroventricular infusion of antisense oligonucleotide targeted to this transcript selectively suppressed its hypothalamic mRNA levels and resulted in loss of body weight in both P. obesus and Sprague Dawley rats. Reductions in body weight were mediated by profoundly reduced food intake without a concomitant reduction in metabolic rate. Yeast two-hybrid screening, and confirmation in mammalian cells by bioluminescence resonance energy transfer analysis, demonstrated that the protein it encodes interacts with endophilins, mediators of synaptic vesicle recycling and receptor endocytosis in the brain. We therefore named this transcript Src homology 3-domain growth factor receptor-bound 2-like (endophilin) interacting protein 1 (SGIP1). SGIP1 encodes a large proline-rich protein that is expressed predominantly in the brain and is highly conserved between species. Together these data suggest that SGIP1 is an important and novel member of the group of neuronal molecules required for the regulation of energy homeostasis.

    Funded by: NIDDK NIH HHS: DK12829, DK52989, DK68304; NINDS NIH HHS: NS32273

    Endocrinology 2005;146;9;3757-64

  • The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

    Gerhard DS, Wagner L, Feingold EA, Shenmen CM, Grouse LH, Schuler G, Klein SL, Old S, Rasooly R, Good P, Guyer M, Peck AM, Derge JG, Lipman D, Collins FS, Jang W, Sherry S, Feolo M, Misquitta L, Lee E, Rotmistrovsky K, Greenhut SF, Schaefer CF, Buetow K, Bonner TI, Haussler D, Kent J, Kiekhaus M, Furey T, Brent M, Prange C, Schreiber K, Shapiro N, Bhat NK, Hopkins RF, Hsie F, Driscoll T, Soares MB, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Piao Y, Dudekula DB, Ko MS, Kawakami K, Suzuki Y, Sugano S, Gruber CE, Smith MR, Simmons B, Moore T, Waterman R, Johnson SL, Ruan Y, Wei CL, Mathavan S, Gunaratne PH, Wu J, Garcia AM, Hulyk SW, Fuh E, Yuan Y, Sneed A, Kowis C, Hodgson A, Muzny DM, McPherson J, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madari A, Young AC, Wetherby KD, Granite SJ, Kwong PN, Brinkley CP, Pearson RL, Bouffard GG, Blakesly RW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YS, Griffith M, Griffith OL, Krzywinski MI, Liao N, Morin R, Morrin R, Palmquist D, Petrescu AS, Skalska U, Smailus DE, Stott JM, Schnerch A, Schein JE, Jones SJ, Holt RA, Baross A, Marra MA, Clifton S, Makowski KA, Bosak S, Malek J and MGC Project Team

    The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.

    Funded by: PHS HHS: N01-C0-12400

    Genome research 2004;14;10B;2121-7

  • From ORFeome to biology: a functional genomics pipeline.

    Wiemann S, Arlt D, Huber W, Wellenreuther R, Schleeger S, Mehrle A, Bechtel S, Sauermann M, Korf U, Pepperkok R, Sültmann H and Poustka A

    Molecular Genome Analysis, German Cancer Research Center, 69120 Heidelberg, Germany. s.wiemann@dkfz.de

    As several model genomes have been sequenced, the elucidation of protein function is the next challenge toward the understanding of biological processes in health and disease. We have generated a human ORFeome resource and established a functional genomics and proteomics analysis pipeline to address the major topics in the post-genome-sequencing era: the identification of human genes and splice forms, and the determination of protein localization, activity, and interaction. Combined with the understanding of when and where gene products are expressed in normal and diseased conditions, we create information that is essential for understanding the interplay of genes and proteins in the complex biological network. We have implemented bioinformatics tools and databases that are suitable to store, analyze, and integrate the different types of data from high-throughput experiments and to include further annotation that is based on external information. All information is presented in a Web database (http://www.dkfz.de/LIFEdb). It is exploited for the identification of disease-relevant genes and proteins for diagnosis and therapy.

    Genome research 2004;14;10B;2136-44

  • Complete sequencing and characterization of 21,243 full-length human cDNAs.

    Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A, Hayashi K, Sato H, Nagai K, Kimura K, Makita H, Sekine M, Obayashi M, Nishi T, Shibahara T, Tanaka T, Ishii S, Yamamoto J, Saito K, Kawai Y, Isono Y, Nakamura Y, Nagahari K, Murakami K, Yasuda T, Iwayanagi T, Wagatsuma M, Shiratori A, Sudo H, Hosoiri T, Kaku Y, Kodaira H, Kondo H, Sugawara M, Takahashi M, Kanda K, Yokoi T, Furuya T, Kikkawa E, Omura Y, Abe K, Kamihara K, Katsuta N, Sato K, Tanikawa M, Yamazaki M, Ninomiya K, Ishibashi T, Yamashita H, Murakawa K, Fujimori K, Tanai H, Kimata M, Watanabe M, Hiraoka S, Chiba Y, Ishida S, Ono Y, Takiguchi S, Watanabe S, Yosida M, Hotuta T, Kusano J, Kanehori K, Takahashi-Fujii A, Hara H, Tanase TO, Nomura Y, Togiya S, Komai F, Hara R, Takeuchi K, Arita M, Imose N, Musashino K, Yuuki H, Oshima A, Sasaki N, Aotsuka S, Yoshikawa Y, Matsunawa H, Ichihara T, Shiohata N, Sano S, Moriya S, Momiyama H, Satoh N, Takami S, Terashima Y, Suzuki O, Nakagawa S, Senoh A, Mizoguchi H, Goto Y, Shimizu F, Wakebe H, Hishigaki H, Watanabe T, Sugiyama A, Takemoto M, Kawakami B, Yamazaki M, Watanabe K, Kumagai A, Itakura S, Fukuzumi Y, Fujimori Y, Komiyama M, Tashiro H, Tanigami A, Fujiwara T, Ono T, Yamada K, Fujii Y, Ozaki K, Hirao M, Ohmori Y, Kawabata A, Hikiji T, Kobatake N, Inagaki H, Ikema Y, Okamoto S, Okitani R, Kawakami T, Noguchi S, Itoh T, Shigeta K, Senba T, Matsumura K, Nakajima Y, Mizuno T, Morinaga M, Sasaki M, Togashi T, Oyama M, Hata H, Watanabe M, Komatsu T, Mizushima-Sugano J, Satoh T, Shirai Y, Takahashi Y, Nakagawa K, Okumura K, Nagase T, Nomura N, Kikuchi H, Masuho Y, Yamashita R, Nakai K, Yada T, Nakamura Y, Ohara O, Isogai T and Sugano S

    Helix Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan.

    As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at approximately 58% compared with a peak at approximately 42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at approximately 42%, relatively low compared with that of protein-coding cDNAs.

    Nature genetics 2004;36;1;40-5

  • Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs.

    Wiemann S, Weil B, Wellenreuther R, Gassenhuber J, Glassl S, Ansorge W, Böcher M, Blöcker H, Bauersachs S, Blum H, Lauber J, Düsterhöft A, Beyer A, Köhrer K, Strack N, Mewes HW, Ottenwälder B, Obermaier B, Tampe J, Heubner D, Wambutt R, Korn B, Klein M and Poustka A

    Molecular Genome Analysis, German Cancer Research Center, 69120 Heidelberg, Germany. s.wiemann@dkfz.de

    With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2%-5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies.

    Genome research 2001;11;3;422-35

  • DNA cloning using in vitro site-specific recombination.

    Hartley JL, Temple GF and Brasch MA

    Life Technologies, Inc., Rockville, Maryland 20850, USA. jhartley@lifetech.com

    As a result of numerous genome sequencing projects, large numbers of candidate open reading frames are being identified, many of which have no known function. Analysis of these genes typically involves the transfer of DNA segments into a variety of vector backgrounds for protein expression and functional analysis. We describe a method called recombinational cloning that uses in vitro site-specific recombination to accomplish the directional cloning of PCR products and the subsequent automatic subcloning of the DNA segment into new vector backbones at high efficiency. Numerous DNA segments can be transferred in parallel into many different vector backgrounds, providing an approach to high-throughput, in-depth functional analysis of genes and rapid optimization of protein expression. The resulting subclones maintain orientation and reading frame register, allowing amino- and carboxy-terminal translation fusions to be generated. In this paper, we outline the concepts of this approach and provide several examples that highlight some of its potential.

    Genome research 2000;10;11;1788-95

  • Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing.

    Simpson JC, Wellenreuther R, Poustka A, Pepperkok R and Wiemann S

    Department of Cell Biology and Biophysics, EMBL Heidelberg, Germany.

    As a first step towards a more comprehensive functional characterization of cDNAs than bioinformatic analysis, which can only make functional predictions for about half of the cDNAs sequenced, we have developed and tested a strategy that allows their systematic and fast subcellular localization. We have used a novel cloning technology to rapidly generate N- and C-terminal green fluorescent protein fusions of cDNAs to examine the intracellular localizations of >100 expressed fusion proteins in living cells. The entire analysis is suitable for automation, which will be important for scaling up throughput. For >80% of these new proteins a clear intracellular localization to known structures or organelles could be determined. For the cDNAs where bioinformatic analyses were able to predict possible identities, the localization was able to support these predictions in 75% of cases. For those cDNAs where no homologies could be predicted, the localization data represent the first information.

    EMBO reports 2000;1;3;287-92

Gene lists (6)

Gene List | Source | Species | Name | Description | Gene count
L00000009 | G2C | Homo sapiens | Human PSD | Human orthologues of mouse PSD adapted from Collins et al (2006) | 1080
L00000016 | G2C | Homo sapiens | Human PSP | Human orthologues of mouse PSP adapted from Collins et al (2006) | 1121
L00000059 | G2C | Homo sapiens | BAYES-COLLINS-HUMAN-PSD-CONSENSUS | Human cortex PSD consensus | 748
L00000061 | G2C | Homo sapiens | BAYES-COLLINS-MOUSE-PSD-CONSENSUS | Mouse cortex PSD consensus (ortho) | 984
L00000069 | G2C | Homo sapiens | BAYES-COLLINS-HUMAN-PSD-FULL | Human cortex biopsy PSD full list | 1461
L00000071 | G2C | Homo sapiens | BAYES-COLLINS-MOUSE-PSD-FULL | Mouse cortex PSD full list (ortho) | 1556
© G2C 2014. The Genes to Cognition Programme received funding from The Wellcome Trust and the EU FP7 Framework Programmes:
EUROSPIN (FP7-HEALTH-241498), SynSys (FP7-HEALTH-242167) and GENCODYS (FP7-HEALTH-241995).

This site is hosted by Edinburgh University and the Genes to Cognition Programme.