Tilera – The Many Core Solution

Brilliantly Deliver and Enable High Performance, Energy Efficiency and Flexibility in Network Security, Multimedia Streaming and Cloud Computing Applications.


As data and services migrate from traditional applications to cloud and mobile applications, they require more bandwidth. Designers of equipment for next-generation network infrastructure must achieve higher network performance despite demanding power consumption constraints. The computing demands of intelligent networking and security applications are also increasing dramatically, driven by growing bandwidth requirements, the movement of services from the periphery of enterprises into the core of the network, and the increasing sophistication and deployment of security and service functions. To meet these challenges, networking equipment must provide best-in-class energy efficiency (performance per watt) using a hardware and software architecture that can scale to support growth in user traffic. Currently, the performance delivered by traditional processors and dedicated processors has not kept up with the increasing computing demand.

Interconnection networks can be classified into shared-medium networks and switched-medium networks. A shared-medium network transfers data over a network medium (i.e., a link) shared by all connected nodes. A shared bus network falls into this category. On the other hand, a switched-medium network consists of switch fabrics (routers) and point-to-point links. A router has several input ports and output ports, and it dynamically establishes a connection between an input port and an output port. A switched-medium network is formed by connecting routers using point-to-point links.

A typical example of an on-chip shared-medium network is an on-chip bus that connects IP cores via a shared wire link on a single chip. On-chip buses have been widely used as traditional on-chip interconnects. Buses are efficient at transferring short packets such as signals, and various techniques to improve their performance have been studied. In contrast, buses are considered inadequate for handling large communication bandwidth demands when they connect a large number of IP cores, and they are also inefficient in terms of area.
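The scaling difference between the two network classes described above can be illustrated with a small model. The sketch below is not taken from any Tilera or CASwell material; the function names are hypothetical, the switched-medium network is assumed to be a plain 2D mesh with one router per node, and routes are assumed to follow the shortest (Manhattan) path. It counts the point-to-point links available in a mesh and the average hop count between nodes, and contrasts that with a single shared bus, which carries one transfer at a time no matter how many nodes are attached.

```python
from itertools import product


def mesh_links(rows: int, cols: int) -> int:
    """Count bidirectional point-to-point links in a rows x cols 2D mesh."""
    horizontal = rows * (cols - 1)  # links between neighbours in the same row
    vertical = cols * (rows - 1)    # links between neighbours in the same column
    return horizontal + vertical


def mesh_average_hops(rows: int, cols: int) -> float:
    """Mean shortest-path hop count between distinct nodes of a 2D mesh."""
    nodes = list(product(range(rows), range(cols)))
    total_hops = 0
    pair_count = 0
    for r1, c1 in nodes:
        for r2, c2 in nodes:
            if (r1, c1) == (r2, c2):
                continue  # skip a node talking to itself
            total_hops += abs(r1 - r2) + abs(c1 - c2)
            pair_count += 1
    return total_hops / pair_count


def bus_concurrent_transfers(num_nodes: int) -> int:
    """A single shared bus carries one transfer at a time (simplifying assumption)."""
    return 1


if __name__ == "__main__":
    for side in (3, 4, 6):
        nodes = side * side
        print(
            f"{nodes:2d} nodes: bus transfers at once = {bus_concurrent_transfers(nodes)}, "
            f"mesh links = {mesh_links(side, side)}, "
            f"mesh average hops = {mesh_average_hops(side, side):.2f}"
        )
```

In this model the number of mesh links grows with the node count while the bus stays at one shared medium, which is the intuition behind the bandwidth argument above. The price is a hop count that also grows with mesh size.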

CASwell TILE-Gx family products feature 9, 16, 36, or 72 identical processor cores (tiles) interconnected by Tilera’s iMesh on-chip network. Each tile consists of a full-featured 64-bit processor core, L1 and L2 caches, and a non-blocking switch that connects the tile into the Tilera intelligent mesh (iMesh). Up to 23 MBytes of coherent cache is available, and the high-end TILE-Gx devices can address up to 1 TB of DDR3 memory. The highly scalable Tile architecture provides a broad range of performance and price points to meet customers’ requirements, all with an open-source, easy-to-program software environment. Because all TILE-Gx products are software compatible, customers can easily integrate their application software and scale to the corresponding performance level by leveraging these processing cores.

High Performance PCI Express Technology

Ethernet, the pervasive communications technology, has migrated from 1Gbps through 10Gbps to 40Gbps and is now moving to 100Gbps. Modern servers and processors spend more time processing packets flowing from Ethernet ports to the PCI interface than on any other task. The Network Interface Card (NIC) is the logical bridge from the network world (Ethernet) to the compute world (PCIe). In modern server designs, processing, memory and I/O are interconnected with data, address and control buses. Current computing architecture is designed for local-memory workloads in which the processor mostly crunches algorithms, while I/O throughput is secondary and dimensioned for peripherals. The PCIe interface quickly established itself as the leading I/O interface, creating massive industry backing for it.

Power-Efficient Design

The CASwell intelligent NIP module uses fast, low-power processors to handle vast amounts of data. The massive compute capability of the Tilera TILE-Gx family processor is complemented by 40 Gbps of Ethernet I/O to enable both network endpoint and “bump in the wire” configurations. This powerful and highly efficient processor consumes no more than 25W. It therefore helps optimize PUE (Power Usage Effectiveness) and is well suited to network and cloud applications.
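As a rough illustration of the performance-per-watt metric, the two figures quoted above (40 Gbps of Ethernet I/O and a processor power ceiling of 25W) can be combined into a single ratio. Pairing these two numbers is an assumption made here for illustration only. It is not a measured result, and the function name is hypothetical.

```python
def gbps_per_watt(throughput_gbps: float, power_watts: float) -> float:
    """Return throughput per watt; a simple ratio, not a benchmark."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput_gbps / power_watts


if __name__ == "__main__":
    # Assumption: full 40 Gbps I/O rate sustained at the 25 W power ceiling.
    print(f"{gbps_per_watt(40.0, 25.0):.2f} Gbps per watt")
```

The same ratio can be applied to any other platform for which throughput and power figures are available, provided both are measured under the same workload.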

Flexible Modular Design

CASwell’s flexible modular design helps customers simplify product development and integration. As shown in Figure 1, the modular design offers several different network interface configurations, which provide scalability and flexibility. All of these parts are easily interchangeable and replaceable depending on customers’ requirements.

The conceptual framework shows a heterogeneous communication model between the x86 host and the Tilera compute module, in which the x86 host platform integrates seamlessly with the Tilera compute module. As can be seen, packets received from the network link are forwarded to the x86 host. The x86 host can process them accordingly and loop packets back to the Tilera module over the PCIe communication channel by leveraging packet queue and DMA technologies. Depending on customers’ host applications, an appropriate driver implementation can be chosen to maximize performance. CASwell TILE-Gx products enable true application offloading through the unique combination of high-throughput compute, low power consumption, and a standard C/Linux-based programming model. CASwell TILE-Gx based products satisfy our customers’ differing needs with a complete and scalable solution.
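The packet queue mentioned above is commonly built as a fixed-size ring shared by a producer and a consumer. The sketch below models that idea in plain Python. It is not the CASwell or Tilera driver interface: the class name `DescriptorRing` and its methods are hypothetical, and a real PCIe implementation would place the ring in DMA-capable memory and use hardware doorbells rather than Python lists.

```python
from typing import List, Optional


class DescriptorRing:
    """Fixed-size single-producer, single-consumer packet ring (illustrative only)."""

    def __init__(self, size: int) -> None:
        if size < 2:
            raise ValueError("size must be at least 2")
        self._size = size
        self._slots: List[Optional[bytes]] = [None] * size
        self._head = 0  # next slot the producer will fill
        self._tail = 0  # next slot the consumer will drain

    def is_empty(self) -> bool:
        return self._head == self._tail

    def is_full(self) -> bool:
        # One slot is left unused so that full and empty are distinguishable.
        return (self._head + 1) % self._size == self._tail

    def post(self, packet: bytes) -> bool:
        """Producer side: enqueue a packet, or return False if the ring is full."""
        if self.is_full():
            return False
        self._slots[self._head] = packet
        self._head = (self._head + 1) % self._size
        return True

    def poll(self) -> Optional[bytes]:
        """Consumer side: dequeue the oldest packet, or return None if empty."""
        if self.is_empty():
            return None
        packet = self._slots[self._tail]
        self._slots[self._tail] = None
        self._tail = (self._tail + 1) % self._size
        return packet


if __name__ == "__main__":
    ring = DescriptorRing(4)  # holds at most 3 packets
    for payload in (b"pkt-a", b"pkt-b", b"pkt-c", b"pkt-d"):
        print(payload, "posted" if ring.post(payload) else "dropped (ring full)")
    print("consumer received:", ring.poll())
```

Returning `False` from `post` when the ring is full leaves the back-pressure policy (drop, retry, or block) to the caller, which is one reason a driver choice can matter for the host application.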

CASwell Tilera series products deliver a flexible software infrastructure that allows customers to connect different applications together to form the solution they need. The following applications are available and can run seamlessly and scale across different processor cores:

– Software-Defined NIC – Complete infrastructure framework for a highly agile and programmable N x 10G NIC function, with extensions for 40G and 100G, supporting Intel DPDK drivers, SR-IOV and virtualized servers
– Open vSwitch – L4 open source user-space implementation at speeds up to 40Gbps, tightly coupled with SR-IOV, designed to support SDN and NFV server adapters
– Deep Packet Inspection (DPI) – Real-time, Layer 2-7 classification of network application traffic at up to 50Gbps on a single TILE-Gx processor
– Security Protocol Offload – Complete IPsec and SSL datapath and handshake offload at 40Gbps
– Intrusion Detection/Prevention – Highest performance Suricata multi-threaded IDS/IPS, fully integrated on TILE-Gx with support for the ‘Emerging Threats’ rule base
– Network Analytics – Wire-speed capture of packets to host processor or to disc with precision time-stamping, and with optional programmable flow filtering at up to 80Gbps
– TCP/IP Stack – High performance user-space TCP/IP implementation scales linearly with number of cores. 80Gbps throughput and 1.3M connections/second using a fraction of cores
