Tag Archives: processor

Overview Of New Intel Core i7 (Nehalem) Processor – Part 2

Before going into the details of the architectural features of the Nehalem CPUs, let’s summarize the base elements that are common to the different versions: server, desktop and notebook. It’s worth noting that Intel’s new architecture engineering process aims at CPUs that can be used in all three sectors, with only slight changes to the architecture and CPU characteristics to adapt to each one.

To make things clear, a few examples: the notebook CPUs have lower energy consumption, while the server solutions could have larger caches. In general, comparing the technical characteristics of the Nehalem CPUs with the Core 2 family, it’s clear that the development team aimed at implementing features that bring the greatest performance benefits at the server level, an approach somewhat similar to what AMD did in the past with its first K8 family CPUs, Opteron and Athlon 64.

These are the base elements of the Nehalem family CPUs.

- Native Quad Core architecture: Intel has abandoned the Multi Chip Package approach, choosing instead what is defined as a “monolithic design” for the Nehalem CPUs. The four cores, as in AMD’s Phenom solutions, are integrated in the same piece of silicon instead of pairing two dual-core dice in the same package.

- DDR3 memory controller integrated on the processor: this is a new feature for Intel processors, although the integrated memory controller has been on the market for a while in AMD CPUs, since the K8 family, whose first Opteron CPU was presented in April 2003.

- Integrated on-die L3 cache in all processors, up to 8MB; in addition, the size of the L2 cache, specific to each core, has been noticeably reduced compared to the previous Core 2 processors. In future versions, the Nehalem processors will feature different L3 cache versions, according to the market sector they are aimed at;

- Return of the Simultaneous Multi-Threading technology, better known by its marketing name “Hyper-Threading”, thanks to which the operating system sees the processor as having twice as many logical cores as are physically integrated. This technology was introduced by Intel with some Pentium 4 models, but it wasn’t implemented in the Core 2 Duo and Core 2 Quad solutions;

- A new set of SSE4.2 instructions, which extend the SSE4.1 instructions introduced for the first time with the Core 2 CPUs based on Penryn cores;

- Debut of QPI (QuickPath Interconnect) technology: it replaces the front side bus for the connection between processor, memory modules and, in some CPU models, the chipset. For the first Core i7 family models, based on the LGA 1366 socket, the connection between the processor and the chipset will be made through a QPI link.
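
The Hyper-Threading point in the list above comes down to simple arithmetic: with two hardware threads per core, a quad-core Nehalem shows up in the operating system as eight logical processors. The short Python sketch below illustrates this. The helper name `logical_cores` is our own invention for the example, not an Intel or operating system API, and `os.cpu_count()` simply reports the logical processor count of whatever machine runs the script.

```python
import os

THREADS_PER_CORE_HT = 2  # Hyper-Threading exposes two hardware threads per core


def logical_cores(physical_cores: int, hyper_threading: bool = True) -> int:
    """Return the number of logical processors the OS would see (illustrative helper)."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return physical_cores * (THREADS_PER_CORE_HT if hyper_threading else 1)


if __name__ == "__main__":
    print("Quad-core Nehalem with Hyper-Threading:", logical_cores(4))
    print("Quad-core without Hyper-Threading:", logical_cores(4, hyper_threading=False))
    # os.cpu_count() counts logical processors on the machine running this script
    # (it may return None if the count cannot be determined).
    print("Logical processors on this machine:", os.cpu_count())
```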

The first Nehalem processors with a quad-core architecture, the Core i7 family, will integrate 731 million transistors and are built using a 45nm fabrication process. Subsequent Nehalem processors will use the same fabrication process and rely on the modular architecture that was built into the Nehalem project from its design stages.

These processors can, in fact, be easily modified to implement a different number of cores or to integrate different internal features compared to the first versions presented before the launch.

Two examples make this flexibility clearer. The first is the Nehalem-EX CPUs, solutions with eight physical cores intended specifically for server systems, which will be launched sometime this year and were first announced at IDF Fall 2008. The second is the integration of a GPU in future Nehalem versions aimed at the entry-level market: with this product, Intel aims to present its own alternative to AMD’s Fusion family CPUs, which feature both CPU and GPU components.

Intel’s forthcoming many core processor codenamed ‘Larrabee’

Intel Corporation is presenting a paper at the SIGGRAPH 2008 industry conference in Los Angeles on Aug. 12 that describes features and capabilities of its first-ever forthcoming “many-core” blueprint or architecture codenamed “Larrabee.”

Details unveiled in the SIGGRAPH paper include a new approach to the software rendering 3-D pipeline, a many-core (many processor engines in a product) programming model and performance analysis for several applications.

http://www.pcper.com/images/reviews/453/slides01.jpg

Larrabee

The first product based on Larrabee will target the personal computer graphics market and is expected in 2009 or 2010. Larrabee will be the industry’s first many-core x86 Intel architecture, meaning it will be based on an array of many processors. The individual processors are similar to the Intel processors that power the Internet and the laptops, PCs and servers that access and network to it.

Larrabee is expected to kick start an industry-wide effort to create and optimize software for the dozens, hundreds and thousands of cores expected to power future computers. Intel has a number of internal teams, projects and software-related efforts underway to speed the transition, but the tera-scale research program has been the single largest investment in Intel’s technology research and has partnered with more than 400 universities, DARPA and companies such as Microsoft and HP to move the industry in this direction.

Over time, the consistency of Intel architecture and thus developer freedom afforded by the Larrabee architecture will bring about massive innovation in many areas and market segments. For example, while current games keep getting more and more realistic, they do so within a rigid and limited framework. Developed in direct collaboration with some of the world’s top 3-D graphics experts, Larrabee will give developers of games and APIs (Application Programming Interfaces) a blank canvas onto which they can innovate like never before.

Initial product implementations of the Larrabee architecture will target discrete graphics applications, support DirectX and OpenGL, and run existing games and programs. Additionally, a broad potential range of highly parallel applications including scientific and engineering software will benefit from the Larrabee native C/C++ programming model.

Additional details of the Larrabee architecture discussed in this paper include:

  • The Larrabee architecture has a pipeline derived from the dual-issue Intel Pentium® processor, which uses a short execution pipeline with a fully coherent cache structure. The Larrabee architecture provides significant modern enhancements such as a wide vector processing unit (VPU), multi-threading, 64-bit extensions and sophisticated pre-fetching. This will enable a massive increase in available computational power combined with the familiarity and ease of programming of the Intel architecture.
  • Larrabee also includes a select few fixed function logic blocks to support graphics and other applications. These units are carefully chosen to balance strong performance per watt, yet contribute to the flexibility and programmability of the architecture.
  • A coherent on-die 2nd level cache allows efficient inter-processor communication and high-bandwidth access to local data by the CPU cores, making software simpler to write.
  • The Larrabee native programming model supports a variety of highly parallel applications, including those that use irregular data structures. This enables development of graphics APIs, rapid innovation of new graphics algorithms, and true general purpose computation on the graphics processor with established PC software development tools.
  • Larrabee features task scheduling that is performed entirely in software rather than in fixed function logic. Rendering pipelines and other complex software systems can therefore adjust their resource scheduling based on each workload’s unique computing demand.
  • The Larrabee architecture supports four execution threads per core with separate register sets per thread. This allows the use of a simple efficient in-order pipeline, but retains many of the latency-hiding benefits of more complex out-of-order pipelines when running highly parallel applications.
  • The Larrabee architecture uses a 1024-bit-wide, bi-directional ring network (i.e., 512 bits in each direction) that lets agents communicate with each other with low latency, resulting in very fast communication between cores.
  • The Larrabee architecture fully supports IEEE standards for single and double precision floating-point arithmetic. Support for these standards is a pre-requisite for many types of tasks including financial applications.
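
Two of the figures in the list above lend themselves to a quick back-of-the-envelope calculation: the 512-bit-per-direction ring and the four threads per core. The Python sketch below is only an illustration. The paper summary gives neither a clock speed nor a core count, so the values passed in the example are hypothetical, and the bandwidth helper assumes one full-width transfer per clock in each direction, which is our simplification.

```python
RING_BITS_PER_DIRECTION = 512  # from the list above: 1024 bits total, 512 each way
THREADS_PER_CORE = 4           # from the list above


def ring_bytes_per_clock(bits_per_direction: int = RING_BITS_PER_DIRECTION) -> int:
    """Bytes the ring can carry per clock in one direction."""
    return bits_per_direction // 8


def ring_peak_gb_per_s(clock_ghz: float, directions: int = 2) -> float:
    """Peak ring bandwidth in GB/s, assuming one full-width transfer per clock
    per direction (a simplifying assumption, not a published figure)."""
    return ring_bytes_per_clock() * directions * clock_ghz


def hardware_threads(cores: int) -> int:
    """Total hardware threads for a given (hypothetical) core count."""
    return cores * THREADS_PER_CORE


if __name__ == "__main__":
    print("Bytes per clock, one direction:", ring_bytes_per_clock())
    # 1.0 GHz and 16 cores are made-up inputs, chosen only to show the arithmetic.
    print("Peak GB/s at a hypothetical 1.0 GHz:", ring_peak_gb_per_s(1.0))
    print("Threads on a hypothetical 16-core part:", hardware_threads(16))
```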

Review: Intel Core i7 Processor Extreme Edition

Product information

  • 3.20 GHz core speed
  • 8 processing threads with Intel® HT technology
  • 8 MB of Intel® Smart Cache
  • 3 Channels of DDR3 1066 MHz memory

The good: Fastest high-end desktop CPU; supporting motherboard supports both graphics card vendors’ multicard technologies.

The bad: Requires an expensive new motherboard; chipset needs three memory sticks for maximum efficiency.

The bottom line: Thanks to an expensive new motherboard requirement, Intel’s new Core i7 desktop processors will remain enthusiast and professional-level parts until more affordable complementary hardware comes out later next year. Speed never comes cheap, however, and if you’re willing to spend for it now, you’ll find yourself in possession of the fastest CPU on the market.

The Core i7 965 Extreme Edition runs at 3.20GHz and features a QPI (QuickPath Interconnect) throughput of 6.4GT/s, which is the key difference here. The mainstream versions of the processor include the Core i7 920 and 940, clocked at 2.66GHz and 2.93GHz, respectively. These more affordable processors feature a QPI throughput of just 4.8GT/s, so it will be interesting to discover what kind of impact this has on performance.
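
To put the GT/s figures into perspective, here is a rough Python sketch that converts them into bandwidth. It rests on two assumptions that are not stated in this review: that each QPI transfer carries 16 bits (2 bytes) of data per direction, and that “DDR3 1066 MHz” means 1066 million transfers per second on a 64-bit (8-byte) channel. The function names are ours, chosen for illustration only.

```python
QPI_BYTES_PER_TRANSFER = 2   # assumption: 16 data bits per transfer, per direction
DDR3_BYTES_PER_TRANSFER = 8  # assumption: 64-bit wide memory channel


def qpi_bandwidth_gb_s(gt_per_s: float, bidirectional: bool = False) -> float:
    """Peak QPI link bandwidth in GB/s for a given transfer rate in GT/s."""
    per_direction = gt_per_s * QPI_BYTES_PER_TRANSFER
    return per_direction * 2 if bidirectional else per_direction


def ddr3_bandwidth_gb_s(mt_per_s: float, channels: int) -> float:
    """Peak memory bandwidth in GB/s for a transfer rate in MT/s and a channel count."""
    return mt_per_s * DDR3_BYTES_PER_TRANSFER * channels / 1000.0


if __name__ == "__main__":
    for name, rate in (("Core i7 965 Extreme Edition", 6.4), ("Core i7 920/940", 4.8)):
        print(f"{name}: {qpi_bandwidth_gb_s(rate):.1f} GB/s per direction")
    print(f"Three DDR3-1066 channels: {ddr3_bandwidth_gb_s(1066, 3):.1f} GB/s peak")
```

These are theoretical peaks; they say nothing about how much of the difference shows up in real applications.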

Intel Xeon® ‘Nehalem-EX’ Processor – Preview

Intel previewed the new Intel Xeon Processor code named ‘Nehalem-EX’. The Nehalem-EX processor will feature up to eight cores inside a single chip supporting 16 threads and 24MB of cache. Nehalem-EX will also double the memory capacity with up to 16 memory slots per processor socket, and offer four high-bandwidth QuickPath Interconnect links. Nehalem-EX will provide tremendous scalability, from large-memory two-socket systems through eight-socket systems capable of processing 128 threads simultaneously without the need for third-party chips to “glue” the platform together.
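
The scaling figures quoted above can be checked with a few lines of Python. This is a minimal sketch: the `NehalemExPlatform` class is a hypothetical helper written for this example, and its default values are simply the per-socket numbers from the preview (eight cores, two threads per core, 16 memory slots).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NehalemExPlatform:
    """Illustrative model of a multi-socket Nehalem-EX system (not an Intel API)."""
    sockets: int
    cores_per_socket: int = 8          # "up to eight cores inside a single chip"
    threads_per_core: int = 2          # Hyper-Threading
    memory_slots_per_socket: int = 16  # "up to 16 memory slots per processor socket"

    @property
    def threads(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core

    @property
    def memory_slots(self) -> int:
        return self.sockets * self.memory_slots_per_socket


if __name__ == "__main__":
    for sockets in (2, 4, 8):
        platform = NehalemExPlatform(sockets)
        print(f"{sockets} sockets: {platform.threads} threads, "
              f"{platform.memory_slots} memory slots")
```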

The Nehalem-EX Advantage

  • Intel Nehalem Architecture built on Intel’s unique 45nm high-k metal gate technology process
  • Up to 8 cores per processor
  • Up to 16 threads per processor with Intel® Hyper-Threading
  • Scalability up to eight sockets via QuickPath Interconnects and greater with third-party node controllers
  • QuickPath Architecture with four high-bandwidth links
  • 24MB of shared cache
  • Integrated memory controllers
  • Intel Turbo Boost Technology
  • Intel scalable memory buffer and scalable memory interconnects
  • Up to 9x the memory bandwidth of previous generation
  • Support for up to 16 memory slots per processor socket
  • Advanced RAS capabilities including MCA Recovery
  • 2.3 billion transistors

AMD Technology on Ferrari Formula 1 Cars

Not everyone knows how the processor inside a Formula 1 car works. Here we take a close look at the processor, tyres and steering used in a Formula 1 car.

If you pay close attention to the Ferrari F1 cars driven by Michael Schumacher and Felipe Massa (and, before him, Rubens Barrichello) you will notice an AMD logo on the tail. For most people this simply means that AMD is paying to run an ad on Ferrari cars, but that isn’t the whole story. AMD also provides the technology infrastructure for the car’s telemetry system, which collects data in real time and sends it to the Ferrari team during races, so they can check in real time whether something is going wrong and also tell the driver what he should change in his driving to achieve higher performance during the race. The collected data is also stored for later analysis.

We were invited by AMD to check out this impressive electronics system in person by visiting Ferrari’s boxes during the preparation of the race of the 2006 F1 World Championship. Let’s talk more about this thrilling experience.

FIA 2006 F1 Championship Paddock Badge
Figure 1: Badge clearing access to the paddock area.

The paddock is the entrance area to the boxes, so the boxes have two doors, one to the paddock and another to the pit lane. Each team has two boxes, one for each driver.

Paddock
Figure 2: Some guys from Honda preparing some tires.

During our tour we were guided by Dieter Gundel, head of racetrack electronics for the Ferrari team, and Felipe Massa, a Ferrari team driver. Both explained in detail how the electronics of the Ferrari F1 cars work.

So, how does telemetry work? Under FIA rules, it is not possible to send electronic information to the cars, so this is a one-way system that sends data from the cars to the boxes. The engineers can then analyze the data in real time and, as we said, see if something is wrong or tell the driver how he can improve his driving. The data is also sent to Ferrari’s HQ in Maranello, where a whole team is dedicated to analyzing the collected data.

Each Ferrari car has from 100 to 150 sensors. The number isn’t exact because from track to track they add and remove sensors. Also, from the training sessions to the official race they can remove some sensors they found they won’t need for that particular track and thus can save some weight.

Data is sent from the car to the boxes over 1,000 to 2,000 telemetry channels, transmitted wirelessly (obviously) on the 1.5 GHz frequency. These channels are encrypted, of course. The typical delay between data being collected and being received at the boxes is 2 ms. For each race the amount of collected data is in the range of 1.5 billion samples. Since they also collect the same amount on each training day, the total amount of collected data is in the range of 5 billion samples.

Because the data is compressed, they talk in terms of samples rather than megabytes or gigabytes, and the actual transfer rate used by the telemetry system is smaller than the sample count suggests.

Each car is independent, so since Ferrari has two cars, the amount of collected data is actually twice as high.

Each car also has an on-board storage system (they didn’t disclose whether it is a hard disk drive or flash memory) that buffers the most recent data, so if the transmission fails, the car keeps retrying until the transmission is completed. No data is lost when the car enters a tunnel, for example: when communication is lost, the car keeps collecting data and storing it in its on-board memory, and as soon as it exits the tunnel all the data collected while the car was inside is sent at once to the boxes.
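
The buffering behaviour described above is a classic store-and-forward pattern, and it can be sketched in a few lines of Python. To be clear, this is not Ferrari’s or AMD’s implementation, which was not disclosed to us. The `TelemetryBuffer` class, its `send` callback and the `capacity` value are all hypothetical names and numbers made up for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


class TelemetryBuffer:
    """Hypothetical store-and-forward buffer: samples stay on board until sent."""

    def __init__(self, send: Callable[[List[float]], bool], capacity: int = 10_000):
        self._send = send  # must return True only if the boxes received the batch
        # Bounded storage: if an outage outlasts the capacity, the oldest samples drop.
        self._pending: Deque[float] = deque(maxlen=capacity)

    def record(self, sample: float) -> None:
        """Store a new sample, then try to transmit everything still pending."""
        self._pending.append(sample)
        self.flush()

    def flush(self) -> bool:
        """Send the whole backlog at once; keep it if the link is down."""
        if not self._pending:
            return True
        if self._send(list(self._pending)):
            self._pending.clear()
            return True
        return False  # link down: retry when the next sample arrives

    @property
    def backlog(self) -> int:
        return len(self._pending)


if __name__ == "__main__":
    received: List[float] = []
    link = {"up": True}  # simulated radio link state

    def send_to_boxes(batch: List[float]) -> bool:
        if link["up"]:
            received.extend(batch)
        return link["up"]

    car = TelemetryBuffer(send_to_boxes)
    car.record(1.0)
    link["up"] = False              # car enters the tunnel
    car.record(2.0)
    car.record(3.0)
    print("Backlog inside tunnel:", car.backlog)
    link["up"] = True               # car exits the tunnel
    car.record(4.0)
    print("Received at the boxes:", received)
```

Retrying on every new sample is the simplest possible policy; a real system would more likely retry on a timer and acknowledge batches individually.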

We were able to take a look at this system running; however, they didn’t allow us to take pictures of their telemetry system (they also didn’t allow us to take pictures of the cars’ engines). In summary: a lot of computers with several LCD displays plotting charts and showing data, with lots of Ferrari engineers analyzing the data.

Driving a Formula 1 car nowadays is completely different from what it was years ago. If you take a look at the steering wheel shown in Figure 3 you will understand why. Pay attention to the number of buttons!

Ferrari F1 Car Steering Wheel
Figure 3: Steering wheel from a Ferrari F1 car.

The steering wheel also has an LCD display where the driver can set several parameters of the car. For example, he can adjust on the fly several settings of the car, including brakes, suspension, differential, etc. The plus and minus buttons are used to navigate the car’s electronic system (and not to change gears; the gear levers are on the other side of the steering wheel).

New Intel PC Processors To Be Called Core i7

Intel Core i7

Intel has announced that its latest range of desktop processors will officially be called Intel Core processors. These new Intel processors will be based on the company’s upcoming microarchitecture formerly codenamed “Nehalem”.

The first products by Intel in this new family of processors, which also includes an “Extreme Edition” version, will carry an “i7” identifier and will be formally branded as “Intel Core i7 processor.” This is the first of many new identifiers to come as different products launch over the next year.

Intel claims that products based on the new microarchitecture will deliver high performance and energy efficiency. They also feature Intel Hyper-Threading Technology, also known as simultaneous multi-threading, which is capable of handling eight software “threads” on four processor cores.

“The Core name is and will be our flagship PC processor brand going forward,” said Sean Maloney, Intel Corporation executive vice president and general manager, Sales and Marketing Group. “Expect Intel to focus even more marketing resources around that name and the Core i7 products starting now.”