A bioinformatics data-mining demo


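  It started when my senior labmate Haiyang asked me to reproduce a bioinformatics analysis that a company had done. Reportedly, a package like this costs over 10,000 yuan, so I set out to see how it is done.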

Analyzing miRNA expression in GEO data

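  GSE106452 is a microarray dataset of exosomal miRNAs from four liver cancer cell lines. Since we only have HepG2 cells, we take only that cell line's data and pick the 30 most highly expressed miRNAs. You don't even need Python or R for this; sorting in Excel is enough.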

Comment: at first I assumed I would have to run a differential expression analysis with GEO2R, but I was overcomplicating things.
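For anyone who prefers a script to Excel, the sort-and-keep-the-top-30 step can be sketched in R. This is a minimal sketch on made-up data: the data frame `expr`, its column names and the helper `top_mirnas` are hypothetical and do not reflect the actual layout of the GSE106452 files.

```r
# Hypothetical toy data: one row per miRNA, one column per cell line.
# Names and values are made up, not the real GSE106452 headers.
set.seed(42)
expr = data.frame(
  miRNA = paste0("miRNA_", 1:60),
  HepG2 = round(runif(60, 0, 1000), 1),
  Huh7  = round(runif(60, 0, 1000), 1),
  stringsAsFactors = FALSE
)

# Return the n most highly expressed miRNAs for one cell line,
# i.e. "sort descending in Excel and keep the top rows"
top_mirnas = function(df, cell_line, n = 30) {
  ord = order(df[[cell_line]], decreasing = TRUE)
  head(df[ord, c("miRNA", cell_line)], n)
}

top30 = top_mirnas(expr, "HepG2")
print(top30)
```

With the real data, only the column holding the HepG2 values needs to be passed as `cell_line`.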

TCGA differential analysis

Downloading TCGA data

Selecting the data

  Go to the GDC portal. Under Cases, set Project to TCGA-LIHC; under Files, set Data Category to transcriptome profiling. If you also need the clinical data, set Data Category to Clinical in the same way.

The basic method

  If the total size is no more than 50 MB, click Add All Files to Cart on the right, then open the Cart in the top-right corner and choose Download.

Using the GDC Data Transfer Tool

  Download the GDC Data Transfer Tool and add it to the PATH environment variable.

  Back on the same page, set Data Category to transcriptome profiling and Experimental Strategy to RNA-Seq, then download the Manifest on the right and name it gdc_manifest.2020-01-17-LIHC-RNA-Seq.txt. Next, set Experimental Strategy to miRNA-Seq and select Data Type miRNA Expression Quantification and Isoform Expression Quantification in turn, which gives two more files: gdc_manifest.2020-01-17-LIHC-miRNA-Seq.txt and gdc_manifest.2020-01-17-LIHC-miRNA-Isoform.txt. Likewise, setting Data Category to Clinical and Data Format to bcr xml gives gdc_manifest.2020-01-17-LIHC-Clinical.txt.

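  Create a folder named LIHC and copy the manifests into it. Running gdc-client download -m gdc_manifest.2020-01-17-LIHC-Clinical.txt -d Clinical/ --log-file Clinical.log in CMD downloads the clinical data, and the other datasets work the same way. Frankly, though, the network connection was so unreliable that downloading felt all but impossible.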
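Since every manifest is downloaded with the same command pattern, the commands can be generated in a loop. A minimal sketch, assuming it is run inside the LIHC folder: the manifest names come from the text, `Clinical/` and `miRNA-Seq/` match the folders used in this post, while the `RNA-Seq` and `miRNA-Isoform` folder names and the helper `gdc_download_args` are hypothetical. Only the `-m`, `-d` and `--log-file` options shown above are used.

```r
# Manifest files from the text, named by the (partly hypothetical) output folder
manifests = c(
  "Clinical"      = "gdc_manifest.2020-01-17-LIHC-Clinical.txt",
  "RNA-Seq"       = "gdc_manifest.2020-01-17-LIHC-RNA-Seq.txt",
  "miRNA-Seq"     = "gdc_manifest.2020-01-17-LIHC-miRNA-Seq.txt",
  "miRNA-Isoform" = "gdc_manifest.2020-01-17-LIHC-miRNA-Isoform.txt"
)

# Hypothetical helper: build the gdc-client arguments for one manifest
gdc_download_args = function(manifest, outdir) {
  c("download", "-m", manifest, "-d", paste0(outdir, "/"),
    "--log-file", paste0(outdir, ".log"))
}

for (name in names(manifests)) {
  args = gdc_download_args(manifests[[name]], name)
  cat("gdc-client", args, "\n")        # print the command for inspection
  # system2("gdc-client", args)        # uncomment to run (needs gdc-client on PATH)
}
```

Printing first and running second makes it easy to paste a single command into CMD when one download keeps failing.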

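In the end, I only managed to download all of the RNA data by splitting the manifest into smaller files and going through a proxy.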
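Splitting a manifest can itself be scripted. This is a sketch under the assumption that a GDC manifest is a tab-separated table with a header line (columns id, filename, md5, size, state); the helper `split_manifest`, its chunk size and the output file names are hypothetical choices, not something gdc-client provides.

```r
# Hypothetical helper: split one GDC manifest into several smaller manifests,
# each keeping the original header line.
split_manifest = function(manifest, chunk_size = 100, prefix = "part") {
  # Read every column as text so md5 sums and sizes are written back unchanged
  m = read.delim(manifest, colClasses = "character")
  n = nrow(m)
  # Rows 1..chunk_size go to chunk 1, the next chunk_size rows to chunk 2, etc.
  groups = split(seq_len(n), ceiling(seq_len(n) / chunk_size))
  out = character(0)
  for (i in seq_along(groups)) {
    f = sprintf("%s_%02d.txt", prefix, i)
    write.table(m[groups[[i]], , drop = FALSE], f,
                sep = "\t", quote = FALSE, row.names = FALSE)
    out = c(out, f)
  }
  out  # paths of the chunk manifests
}

# Example (run inside the LIHC folder):
# parts = split_manifest("gdc_manifest.2020-01-17-LIHC-RNA-Seq.txt",
#                        chunk_size = 50, prefix = "RNA-Seq_part")
```

Each chunk is then passed to `gdc-client download -m` on its own, so a dropped connection only costs one chunk rather than the whole run.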

Setting the proxy in the console (CMD):

set http_proxy=http://127.0.0.1:10809

set https_proxy=https://127.0.0.1:10809

  Besides the manifest files above, it is also worth downloading the corresponding JSON metadata files for later use.

Data processing

Merging the clinical data

  The clinical data are mostly XML files, which we parse with the R package XML. Taking a single file as an example:

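library("XML")
library("methods")
# Parse one patient's clinical XML file
result = xmlParse(file="D:/Bioinformatics/TCGA/LIHC/Clinical/00a9a7f4-06eb-40fb-8d3a-66f5f5d315f7/nationwidechildrens.org_clinical.TCGA-KR-A7K0.xml")
rootnode = xmlRoot(result)
rootsize = xmlSize(rootnode)  # number of top-level child nodes
# The second child of the root holds the patient record; turn its fields into a one-row data frame
xmldataframe = xmlToDataFrame(rootnode[2])
# Write it out transposed (one field per line) for a quick look
write.table(t(xmldataframe),'tmp')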

  Of course, there are hundreds of these files, so the work should be done in a loop:

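library("XML")
library("methods")
dir = "D:/Bioinformatics/TCGA/LIHC/Clinical/"
# Find every clinical XML file in the per-file subfolders created by gdc-client
xml_files = list.files(path=dir, pattern="\\.xml$", recursive=TRUE, full.names=TRUE)
cl = lapply(xml_files, function(x){
  result = xmlParse(file=x)
  rootnode = xmlRoot(result)
  # One row per patient; transpose so that fields become rows
  xmldataframe = xmlToDataFrame(rootnode[2])
  return(t(xmldataframe))
})
# Put the patients side by side, then transpose back: one row per patient
# (this assumes every file yields the same fields in the same order)
cl_df = t(do.call(cbind, cl))
save(cl_df, file="D:/Bioinformatics/TCGA/LIHC/Process/GDC_TCGA_LIHC_clinical_df.Rdata")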
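Stacking the per-patient matrices with `cbind` only works if every XML file yields exactly the same fields in the same order; if some patients have extra or missing fields, the rows either fail to bind or end up misaligned. A safer alternative is to align records by field name. This is a minimal base-R sketch; `bind_by_name` is a hypothetical helper, and it assumes each record has first been turned into a named character vector (for a one-row data frame, something like `sapply(xmldataframe, as.character)`).

```r
# Hypothetical helper: bind records that may have different field sets,
# matching columns by name and filling missing fields with NA.
bind_by_name = function(records) {
  all_fields = unique(unlist(lapply(records, names)))
  rows = lapply(records, function(r) {
    v = setNames(rep(NA_character_, length(all_fields)), all_fields)
    v[names(r)] = as.character(r)  # place each value under its own field name
    v
  })
  do.call(rbind, rows)  # character matrix: one row per record
}

# Toy example with two made-up records
demo = bind_by_name(list(c(gender = "MALE", vital_status = "Alive"),
                         c(vital_status = "Dead", days_to_death = "420")))
print(demo)
```

Fields a patient lacks simply show up as `NA`, which is easier to spot than a silently shifted column.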

Merging the miRNA-Seq data

  The miRNA-Seq quantification files are handled the same way, with the code adjusted as follows:

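dir = "D:/Bioinformatics/TCGA/LIHC/miRNA-Seq/"
mi_files = list.files(path=dir, pattern="\\.mirnas\\.quantification\\.txt$", recursive=TRUE)
# Read the first two columns (miRNA_ID, read_count) of each sample's file
mi = lapply(mi_files, function(x){
  result = read.table(file=file.path(dir,x), sep="\t", header=TRUE, stringsAsFactors=FALSE)[,1:2]
  return(result)
})
# One row per sample, one column per miRNA (numeric read counts)
mi_df = t(sapply(mi, function(d) d$read_count))
colnames(mi_df) = mi[[1]]$miRNA_ID
# Label the rows by file name so they can be matched to case IDs from the JSON metadata
rownames(mi_df) = basename(mi_files)
save(mi_df, file="D:/Bioinformatics/TCGA/LIHC/Process/GDC_TCGA_LIHC_miRNA-Seq_df.Rdata")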

  This yields an expression matrix with one row per sample and one column per miRNA.
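Combining the per-sample tables column by column assumes that every file lists the same miRNAs in the same order, so it is cheap insurance to check that before building the matrix. A small sketch; `same_mirna_order` is a hypothetical helper and the two-row tables below are made up. With the real data it would be called as `same_mirna_order(mi)`.

```r
# Hypothetical helper: TRUE if all tables share the same miRNA IDs in the same order
same_mirna_order = function(tables, id_col = "miRNA_ID") {
  ref = tables[[1]][[id_col]]
  all(vapply(tables, function(d) identical(d[[id_col]], ref), logical(1)))
}

# Made-up tables with the same column layout as the quantification files
t1 = data.frame(miRNA_ID = c("hsa-let-7a-1", "hsa-mir-21"), read_count = c(5, 9),
                stringsAsFactors = FALSE)
t2 = data.frame(miRNA_ID = c("hsa-let-7a-1", "hsa-mir-21"), read_count = c(1, 2),
                stringsAsFactors = FALSE)
t3 = data.frame(miRNA_ID = c("hsa-mir-21", "hsa-let-7a-1"), read_count = c(1, 2),
                stringsAsFactors = FALSE)

print(same_mirna_order(list(t1, t2)))  # same order
print(same_mirna_order(list(t1, t3)))  # order differs
```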

Parsing the JSON data

  The JSON metadata is parsed in a similar fashion:

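library("rjson")
# Metadata downloaded alongside the manifest: one entry per file
result = fromJSON(file = "D:/Bioinformatics/TCGA/LIHC/TCGA-LIHC-miRNA-Seq.json")
# File names and the case (patient) each file belongs to
fls = unlist(lapply(result,function(x){x$file_name}))
cid = unlist(lapply(result,function(x){x$cases[[1]]$case_id}))
id2fls = data.frame(cid=cid, fls=fls, stringsAsFactors=FALSE)
# mi_df and cl_df must still be in the workspace (or be loaded from the Process/ .Rdata files)
save(id2fls,mi_df,fls,cl_df,file='D:/Bioinformatics/TCGA/LIHC/Rdata/GDC_TCGA_LIHC_miRNA-clinical.Rdata')
# Clear the workspace, then reload the saved objects
rm(list=ls())
load(file='D:/Bioinformatics/TCGA/LIHC/Rdata/GDC_TCGA_LIHC_miRNA-clinical.Rdata')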
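The point of `id2fls` is to connect expression data to patients. Assuming the rows of the expression matrix are labeled by file name, they can be relabeled with case IDs via `match()`. A minimal sketch; `label_by_case` is a hypothetical helper, and the toy objects are deliberately named differently so they do not overwrite the real `mi_df` and `id2fls`.

```r
# Hypothetical helper: replace file-name row labels with GDC case IDs
label_by_case = function(expr, id_map) {
  idx = match(rownames(expr), id_map$fls)
  if (anyNA(idx)) stop("some files are not listed in the JSON metadata")
  rownames(expr) = id_map$cid[idx]
  expr
}

# Toy stand-ins (made-up names and values)
toy_expr = matrix(1:6, nrow = 3,
                  dimnames = list(c("f2.txt", "f1.txt", "f3.txt"),
                                  c("hsa-let-7a-1", "hsa-mir-21")))
toy_map = data.frame(cid = c("case-A", "case-B", "case-C"),
                     fls = c("f1.txt", "f2.txt", "f3.txt"),
                     stringsAsFactors = FALSE)

labeled = label_by_case(toy_expr, toy_map)
print(rownames(labeled))
```

A single case can contribute more than one file (for example tumor and adjacent normal tissue), so it is worth checking the result with `duplicated(rownames(...))` before treating rows as patients.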

Using the TCGAbiolinks package

  This turns out to be really convenient. Install it through BiocManager:

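# Install (BiocManager itself comes from CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")

# Download
library(TCGAbiolinks)
library(SummarizedExperiment)  # provides assay()
query = GDCquery(project = 'TCGA-LIHC',
                 data.category = "Transcriptome Profiling",
                 data.type = "Gene Expression Quantification",
                 # HTSeq counts were retired around GDC data release 32 (2022);
                 # newer GDC/TCGAbiolinks versions use workflow.type = "STAR - Counts"
                 workflow.type = "HTSeq - Counts")
GDCdownload(query, method = "api", files.per.chunk = 100)
# Assemble the downloaded files into a SummarizedExperiment object
expdat = GDCprepare(query = query)
# Genes x samples matrix of raw read counts
count_matrix = assay(expdat)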
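For the differential analysis this section is heading toward, the samples in `count_matrix` have to be split into tumor and normal groups. A minimal sketch, assuming the column names are full TCGA barcodes: characters 14 and 15 of a barcode encode the sample type (01 to 09 tumor, 10 to 19 normal, 20 to 29 control). The helper `barcode_group` and the barcodes below are hypothetical; with real data it would be applied to `colnames(count_matrix)`.

```r
# Hypothetical helper: classify TCGA barcodes by their sample-type code
barcode_group = function(barcodes) {
  code = as.integer(substr(barcodes, 14, 15))  # e.g. "01" in TCGA-XX-XXXX-01A-...
  ifelse(code < 10, "Tumor", ifelse(code < 20, "Normal", "Control"))
}

# Made-up barcodes in the TCGA format
toy_barcodes = c("TCGA-AA-0001-01A-11R-A000-07",
                 "TCGA-AA-0001-11A-11R-A000-07",
                 "TCGA-AA-0002-01A-11R-A000-07")
groups = barcode_group(toy_barcodes)
print(table(groups))
```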

PS1: I originally planned to use RTCGA, but that package hasn't been updated in ages, which puts me off.

PS2: Setting a proxy in R:

Sys.setenv("http_proxy"="http://127.0.0.1:10809")
Sys.setenv("https_proxy"="https://127.0.0.1:10809")
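The proxy variables stay set for the rest of the R session, so it can help to check them and remove them again once the downloads are done. A small sketch using the addresses from the text; adjust the port to your own proxy.

```r
# Proxy addresses as used in the text
proxy_http  = "http://127.0.0.1:10809"
proxy_https = "https://127.0.0.1:10809"

Sys.setenv(http_proxy = proxy_http, https_proxy = proxy_https)
print(Sys.getenv(c("http_proxy", "https_proxy")))  # confirm what is set

# ... run the downloads here ...

# Remove the variables so later connections in this session go direct
Sys.unsetenv(c("http_proxy", "https_proxy"))
```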
