Couldn't automatically detect the sequence identifier field in the fastq id string. #701

Sebastian-Mynott · 2019-03-06T14:33:12Z

Hi,

I'm looking at sequence data downloaded from the NCBI SRA database. When running filterAndTrim I get the following error:

Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0,  : 
  Couldn't automatically detect the sequence identifier field in the fastq id string.

After looking at the source code, I tried inserting a dummy identifier, so that instead of @SRR9876543.1 1/1 the id line would read @M012345:SRR9876543.1 1/1, but this didn't work.
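The difference between these header styles is easier to see field by field. The Python sketch below (not DADA2 code) splits a header on whitespace into a read name and a comment, then counts the colon-separated fields in the name. The Illumina-style header is a made-up example of the Illumina 1.8+ layout. The idea that DADA2's auto-detection keys on that layout is an assumption here, not something established in this thread.

```python
def header_fields(header: str) -> dict:
    """Split a FASTQ header into read name and comment, and count ':' fields in the name."""
    line = header.rstrip("\n")
    if not line.startswith("@"):
        raise ValueError("a FASTQ header line must start with '@'")
    parts = line[1:].split(maxsplit=1)  # name = text before the first whitespace
    if not parts:
        raise ValueError("empty FASTQ header")
    name = parts[0]
    comment = parts[1] if len(parts) > 1 else ""
    return {"name": name, "comment": comment, "colon_fields": len(name.split(":"))}


if __name__ == "__main__":
    examples = [
        "@SRR9876543.1 1/1",                                     # SRA-style header from the post
        "@M012345:SRR9876543.1 1/1",                             # same header with the dummy prefix
        "@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:1",  # hypothetical Illumina 1.8+ header
    ]
    for h in examples:
        print(header_fields(h))
```

The dummy prefix adds only one extra colon-separated field. An Illumina 1.8+ read name has seven, which may be why the modified header was still not recognized.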

Could you suggest how I can get around this?

Many thanks.


benjjneb · 2019-03-06T16:20:47Z

What is the output of head -n4 mysrr_file.fastq (in the shell)?

What command did you use to convert from sra format to fastq? i.e. the fastq-dump arguments.

Sebastian-Mynott · 2019-03-06T16:48:03Z

Aha! I downloaded the files using the SRAdb package's getSRAfile(SRAccessions, sra_con, fileType = 'fastq'), which gave me a list of .fastq.gz files, so I didn't think I'd need fastq-dump.

The output of head -n4 mysrr_file.fastq gives me this:
@SRR7758019.1 1/1
GCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCGGTTAAAAAGCTCGTAGTTGGATTTCTGCTGAGGACGACCGGTCCGCCCTCTNNNNNNNNNTNNNNCTCGGCNTTGGCATCTTCTTGGGGAACGTNANTGCACTTGACTGTGTGGTGCGGTATCCAGGACTTTTACTTTGAGGNNNNNNNNGTGNNNCAANCNGGCTTACGCCTTGAATACATTAGCATGGAATAATAAGATAGGACCTTGGTTCTATTTNNTTGGNNNNNNNNGCTGAGGTNATGATTACTAGGGATAG
+
CCCCCGGGGGGGGGEGGFGGGGGGGGGGGGGGFGFGGGGFGGGGGGGGGGGGFGDFFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG#########:####::DFGG#:BFGGGGGGGGGGGGGGGFGGG#:#:BFFGGGGGGFGGGGFGGGGGGGGGGGGGG7FGGGGGGGGGGFGG########56>###66=#6#6*;CFCGFGGGGGGFFGGGGGGGGDFG0776CAF7FF?7+??FGG6CC?C5D?GGGG##228*########0--1<CG4#--(4;A>4-5=FF**9*
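A FASTQ record is four lines: the `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence. The Python sketch below reads records from a plain or gzip-compressed file, such as the .fastq.gz files getSRAfile returned, and checks that layout. `open_fastq` and `read_fastq` are hypothetical helper names. The sketch assumes one line per sequence and one line per quality string.

```python
import gzip
from itertools import islice
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str
    seq: str
    plus: str
    qual: str


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records, checking the basic layout of each one."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        rec = FastqRecord(*lines)
        if not rec.header.startswith("@") or not rec.plus.startswith("+"):
            raise ValueError(f"malformed record: {rec.header!r}")
        if len(rec.seq) != len(rec.qual):
            raise ValueError(f"sequence/quality length mismatch: {rec.header!r}")
        yield rec
```

For example, `with open_fastq("mysrr_file.fastq.gz") as fh: print(next(read_fastq(fh)))` prints the same first record as `head -n4`, split into named fields.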

Do I need to download the files again as SRA then convert to fastq?

benjjneb · 2019-03-06T16:56:52Z

> Do I need to download the files again as SRA then convert to fastq?

I would at least try that on one file to see if that fixes this issue.

kelseysumner · 2019-09-03T13:50:11Z

Hi, I wanted to re-open this because I am having a similar issue. I'm using paired-end data sequenced on an Illumina MiSeq and also downloaded from the NCBI SRA database. I originally downloaded the files as SRA files and then converted them to gzipped fastq files (fastq.gz) using fastq-dump, with a flag to make sure each sample had separate files for the forward and reverse reads.

I'm getting the same error when I run DADA2 on these files:
Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
Couldn't automatically detect the sequence identifier field in the fastq id string.
Calls: filterAndTrim ... mclapply -> lapply -> FUN -> .mapply ->
Execution halted

The head of one of the fastq files I'm reading into DADA2 looks like this:
@SRR1191781.12854 12854 length=250
TTATTAATCCTATTGAACTATTTACGACATTAAACACACTGGAACATTTTTCCATTTTACAAATTTTTTTTTCAATATCATTTGCATAATCTAATTGGTCTTTAGGTTTATTAGCAGAGCCAGGTTTTATTCTAACTTGAATACCATTTCCACAAGTTACACTACATGGGGACCATTCAGTTGAAAGAGAATTTTGTATTGTCTTTAAATATTTTTCTATGTGCT
+
HHHHHHHHHHHFHHHHHHHHHHHHHGGFEFFHHHHHHHGHHHHHHHHHHHHHHHHHHHHHH5FGHHHGG>EGHHHHHHHHHHHGHBHHFHHHGDGHHHHHGGHHHHGHHFHHHGHFBFEGHFHH2BFGHGGHHHHHHHGGHHHHHHHHHHHGHHHHG1GHFHHGHHHHHEGGGGHHHHHGGHFHHBGGBCGHHHFHGGHGHFFHHHHHHGHHHHFGGGGGGFFGF

Do you know what might be going on and how I could fix this issue?

benjjneb · 2019-09-03T14:02:59Z

This error is because the original fastq id lines have been replaced by these SRA id lines, which filterAndTrim(..., matchIDs=TRUE) doesn't recognize.

Do you need to use the matchIDs=TRUE flag? If you don't, just remove it and everything should work fine.

kelseysumner · 2019-09-05T13:27:09Z

Thank you for the quick reply. It looks like that solved the issue!

d-callan · 2019-09-20T15:19:35Z

I'm having a similar problem with the SRA id lines, except I do require the matchIDs = TRUE flag. What then?

benjjneb · 2019-09-20T17:15:52Z

@d-callan Unfortunately I'm not sure there is a solution in that case. The original IDs are required to match the paired reads together if they are now in different orders.
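Whether the forward and reverse files really are in different orders can be checked by comparing read names position by position. This Python sketch assumes both files keep the SRA spot name (e.g. `SRR1191781.12854`) as the first whitespace-delimited token of each header. `spot_name` and `first_order_mismatch` are hypothetical helpers. The `/1` and `/2` suffix stripping is an extra assumption for older Illumina-style names.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def spot_name(header: str) -> str:
    """Read name before the first whitespace, without the leading '@'.

    '@SRR1191781.12854 12854 length=250' -> 'SRR1191781.12854'.
    A trailing '/1' or '/2' mate suffix is removed if present.
    """
    name = header.lstrip("@").split(maxsplit=1)[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_order_mismatch(
    fwd_headers: Iterable[str], rev_headers: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, forward_name, reverse_name) at the first position where the
    two files disagree, with None for a file that has run out of reads.
    Return None if every position matches and both files have the same length."""
    for i, (f, r) in enumerate(zip_longest(fwd_headers, rev_headers)):
        f_name = spot_name(f) if f is not None else None
        r_name = spot_name(r) if r is not None else None
        if f_name != r_name:
            return i, f_name, r_name
    return None
```

The header lists are every fourth line of each file, starting from the first. A result of `None` means the files are already in lockstep. A mismatch at an early index usually means reads were dropped from one file rather than reshuffled.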

d-callan · 2019-09-20T18:12:08Z

Thanks anyhow. I'm not convinced they are truly ordered differently, but I'm finding there are definitely differing read counts for forward and reverse. Perhaps I can put together a quick script to remove the reads which don't have a partner before passing them to dada2, and see where that gets me. I was mostly just hoping I might not have to.
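Dropping reads without a partner could look roughly like the Python sketch below. It assumes the forward and reverse reads of a spot share the same name before the first whitespace, as in `@SRR1191781.12854 12854 length=250`. `keep_paired` and the other helpers are hypothetical, not part of DADA2 or the SRA Toolkit. The outputs are written in matching order, which is what filterAndTrim assumes when matchIDs is left at its default of FALSE.

```python
import gzip
from itertools import islice


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode ('r' or 'w')."""
    return gzip.open(path, mode + "t") if path.endswith(".gz") else open(path, mode)


def records(handle):
    """Yield FASTQ records as lists of 4 lines, each ending in a newline."""
    while True:
        rec = list(islice(handle, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield [line if line.endswith("\n") else line + "\n" for line in rec]


def spot_name(header_line):
    """Read name before the first whitespace, e.g. 'SRR1191781.12854'."""
    return header_line[1:].split(maxsplit=1)[0]


def keep_paired(fwd_in, rev_in, fwd_out, rev_out):
    """Write only reads whose spot name occurs in both files, in forward-file order.

    The reverse file is held in memory (a dict keyed by spot name), so memory use
    grows with the size of that file. Returns the number of pairs written.
    """
    with open_text(rev_in, "r") as rh:
        rev = {spot_name(rec[0]): rec for rec in records(rh)}
    kept = 0
    with open_text(fwd_in, "r") as fh, open_text(fwd_out, "w") as fo, open_text(rev_out, "w") as ro:
        for rec in records(fh):
            mate = rev.get(spot_name(rec[0]))
            if mate is not None:  # skip forward reads with no reverse partner
                fo.writelines(rec)
                ro.writelines(mate)
                kept += 1
    return kept
```

Holding one file in a dictionary is the simplest approach that also tolerates reordering. For very large files, sorting both files by spot name and merging them would use less memory.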

dbro970 · 2023-02-28T02:45:36Z

> thanks anyhow. I'm not convinced they are truly ordered differently. but I'm finding there are definitely differing number of read counts for forward and reverse. perhaps I can put together a script quickly to remove those reads which don't have a partner before passing to dada2 and see where that gets me. was mostly just hoping I might not have to..

Hi, apologies for resurrecting an old thread. I was just wondering if you managed to find a solution to this, as I've found myself in the same situation.

wygbio · 2023-12-07T01:28:56Z

I am also running into a similar issue with the headers of my fastq files. They were generated on an Illumina MiSeq, not downloaded from the NCBI SRA database. Is there something wrong with this header that prevents it from being detected?
@HWI-D00433:728:HHHKHBCX2:2:1101:8032:2352.1:N:0--D13a_C25.
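In the Illumina 1.8+ header layout, the read name has seven colon-separated fields (instrument, run, flowcell, lane, tile, x, y). It is followed by a space and then `read:is_filtered:control:index`. In the header above, the read name runs straight into `.1:N:0--D13a_C25.` with no whitespace. That suggests, as an assumption, that the space was replaced during some renaming step. The Python sketch below is a rough check of the standard layout. The regex and the `looks_like_illumina18` name are illustrative, not DADA2's detection logic.

```python
import re

# Illumina 1.8+ header:
# @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is_filtered>:<control>[:<index>]
ILLUMINA_18 = re.compile(
    r"^@([^\s:]+:\d+:[^\s:]+:\d+:\d+:\d+:\d+)"  # read name: 7 colon-separated fields
    r"\s+([12]):([YN]):(\d+)(?::(\S*))?"        # comment after the whitespace
)


def looks_like_illumina18(header: str) -> bool:
    """True if the header starts with the standard Illumina 1.8+ name and comment."""
    return ILLUMINA_18.match(header.strip()) is not None
```

The header from this comment fails the check because the character after the `y` coordinate (`2352`) is a `.` rather than whitespace.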

benjjneb closed this as completed Apr 11, 2019
