Over-representation of Chi sequences caused by di-codon increase in Escherichia coli K-12

Reina Uno, Yoichi Nakayama, Masaru Tomita

Chi sequences (5′-GCTGGTGG-3′) are cis-acting 8 bp elements that stimulate homologous recombination mediated by the RecBCD pathway in Escherichia coli. The genome of E. coli K-12 MG1655 contains 1009 Chi sequences, far more than would be expected for an 8 bp sequence in a genome of this size. This over-representation is generally thought to indicate that Chi sequences have been selected during evolution for their function in recombination. The genes from three E. coli strains (K-12, O157 and CFT) were classified into three categories (island, match to other E. coli, and backbone). Island genes differ from backbone genes in base composition and codon usage, which suggests that they were acquired relatively recently and have not yet adapted to the base composition and codon usage typical of the recipient genome. The over-representation of Chi sequences was examined by comparing Chi frequencies and codon frequencies between island and backbone genes. In the K-12 strain, the difference in CTGGTG di-codon frequency between backbone and island genes correlated with the frequency of Chi sequences translated in the Leu-Val (-G|CTG|GTG|G-) reading frame. These results suggest that Chi sequences in this main reading frame increased as a consequence of an increase in the CTG-GTG di-codon, driven by genome-wide pressure to adapt to the codon usage and base composition of E. coli K-12, and that the RecBCD recombinase might have adjusted its recognition sequence to a frequently occurring oligomer such as G-CTG-GTG-G.
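The quantities compared above can be illustrated with a minimal sketch. It assumes equiprobable, independent bases for the expected 8-mer count, a genome length of 4,641,652 bp for MG1655 (not stated in the abstract), and counts Chi on both strands because GCTGGTGG is not palindromic. It also classifies Chi sites on the coding strand of a CDS by the codon position of their first base and counts in-frame CTG-GTG di-codons. All function names are hypothetical and this is not the authors' pipeline.

```python
# Hypothetical sketch: Chi counting, expected 8-mer frequency, Chi reading frames,
# and in-frame di-codon counts. Not the method used in the paper.

CHI = "GCTGGTGG"

# Reading frame of a Chi site, keyed by (start position within the CDS) % 3:
#   0 -> GCT|GGT|GG    (Ala-Gly)
#   1 -> GC|TGG|TGG    (Trp-Trp)
#   2 -> G|CTG|GTG|G   (Leu-Val)
FRAME_LABELS = {0: "Ala-Gly", 1: "Trp-Trp", 2: "Leu-Val"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of motif in seq, allowing overlapping hits."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def count_chi_both_strands(genome: str) -> int:
    # Chi is not palindromic, so reverse-strand sites appear as CCACCAGC
    return len(find_all(genome, CHI)) + len(find_all(genome, revcomp(CHI)))


def expected_kmer_count(genome_len: int, k: int = 8, both_strands: bool = True) -> float:
    # Assumes each base occurs with probability 1/4, independently of its neighbours
    per_strand = (genome_len - k + 1) / 4 ** k
    return 2 * per_strand if both_strands else per_strand


def chi_frames_in_cds(cds: str) -> dict[str, int]:
    """Count sense-orientation Chi sites in a CDS (starting at its first codon) by frame."""
    counts = {label: 0 for label in FRAME_LABELS.values()}
    for p in find_all(cds, CHI):
        counts[FRAME_LABELS[p % 3]] += 1
    return counts


def dicodon_count(cds: str, dicodon: str = "CTGGTG") -> int:
    # Only in-frame occurrences: the di-codon must start on a codon boundary
    return sum(1 for i in range(0, len(cds) - 5, 3) if cds[i:i + 6] == dicodon)


if __name__ == "__main__":
    # Assumed MG1655 genome length; uniform base composition is a simplification
    print(round(expected_kmer_count(4_641_652), 1))  # 141.7
    toy_cds = "ATGCTGGTGGAATAA"  # hypothetical CDS: ATG CTG GTG GAA TAA
    print(chi_frames_in_cds(toy_cds))  # {'Ala-Gly': 0, 'Trp-Trp': 0, 'Leu-Val': 1}
    print(dicodon_count(toy_cds))  # 1
```

In the toy CDS, the single Chi site starts at the last base of the ATG codon, so it is read as G|CTG|GTG|G and coincides with one in-frame CTG-GTG di-codon. This is the overlap between di-codon usage and Chi frequency that the abstract describes. Under the uniform-base assumption, roughly 142 Chi sites would be expected genome-wide, compared with the 1009 observed.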

