In recent years, demand for AI computing power has grown explosively. As AI chip performance keeps improving, the power consumption and heat generation of the large-scale data centers behind it have also risen, making electricity use and heat dissipation increasingly severe problems.
In the United States, for example, data centers, which house computers, storage systems, and computing infrastructure, account for about 2% of the country's total electricity consumption, and cooling accounts for as much as 40% of a data center's overall energy use. Beyond the sheer energy consumption, the heat that data centers generate is itself a serious problem. Poor heat dissipation not only affects the stable operation and service life of equipment but also limits further gains in computing power.
Traditional air cooling can no longer meet the heat dissipation needs of future large-scale data centers. To develop new cooling technologies for them, the Advanced Research Projects Agency-Energy (ARPA-E) of the U.S. Department of Energy launched the "COOLERCHIPS" program in 2022, aiming to optimize energy use and reliability and to achieve ultra-high carbon efficiency in data center information processing systems. The program invested $40 million in advanced research projects on data center cooling systems, with the goal of cutting the cooling energy of data center IT workloads from the current 30%-40% to 5% of a data center's total energy consumption.
ARPA-E selected 15 teams, including companies (Intel, NVIDIA, Raytheon Technologies Research Center, and HP) and universities, for the COOLERCHIPS program. As one of the 15 selected projects, the research team led by Professor Wei Tiwei of the School of Mechanical Engineering at Purdue University will develop an innovative "chip-level direct two-phase impinging jet cooling" scheme, which could greatly improve the overall thermal performance of data centers while reducing the pumping power needed to move the fluid, offering a new strategy for data center heat dissipation.
The design includes new algorithms for topology optimization of the cooling structure, a new method for printing porous wetting layers directly on the chip by laser powder bed fusion, and an additively manufactured multi-input/multi-output integrated fluid distribution manifold.
Recently, "Ask the Core" interviewed Professor Wei Tiwei of Purdue University. In the interview, he explained the principles of chip-level two-phase impinging jet liquid cooling and discussed the development of chip-level packaging heat dissipation technology, its future directions, and three-dimensional integrated packaging of chips, among other topics.
Dr. Wei Tiwei graduated from imec, the European microelectronics research center, where his research centered on chip-level three-dimensional system integration and heat dissipation technology. He then joined the Nanoscale Heat Transfer Research Group in the Department of Mechanical Engineering at Stanford University as a postdoctoral researcher. In 2022, he joined Purdue University in the United States as an assistant professor in the Department of Mechanical Engineering. His laboratory (the Semiconductor Packaging Laboratory) currently works on chip-level three-dimensional system integration, semiconductor interconnection and packaging, and chip-level heat dissipation, among other fields.
The two-phase impinging jet liquid cooling technology can reduce the chip thermal resistance by two orders of magnitude.
"Data centers, which carry computers, storage systems, and computing infrastructure, account for about 2% of the total power consumption in the United States. The power consumption of data centers is largely due to the low heat dissipation efficiency of the data centers, so solving the heat dissipation problem can bring better energy-saving effects and reduce power consumption. This is also the reason why the energy department launched the project tender." Wei Tiwei explained.
Traditional cooling applies a layer of thermal interface material, such as thermally conductive adhesive, to the surface of the finished chip package and then removes the heat through an external air-cooled or water-cooled heat sink. However, the thermal resistance of this approach is limited by the low thermal conductivity of the thermal interface material between the chip surface and the heat sink, which is the main limitation of current cooling technology.
"The solution we propose is 'chip-level two-phase impinging jet direct cooling' technology aimed specifically at next-generation data centers, and it has already received a $2 million grant from the U.S. Department of Energy," he stated. "Of course, the requirements set by ARPA-E are also very strict. One of the core targets is to achieve an ultra-low chip thermal resistance together with low pumping power for the system's fluid delivery."
Thermal resistance is the ratio of the temperature difference across an object to the heat power flowing through it. It is one of the key indicators for evaluating the performance of a cooling technology.
Reducing thermal resistance is currently the most challenging core problem in chip cooling. "At present, the thermal resistance of traditional chip cooling technology can go as low as about 0.3 K/W, while with two-phase impinging jet cooling it can be reduced to 0.0035 K/W, two orders of magnitude lower. Such cooling keeps the chip temperature very low, and the cooling efficiency is 50 to 100 times higher than that of traditional cooling technology," he pointed out.
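To make the two thermal resistance figures quoted above concrete, the short Python sketch below applies the definition of thermal resistance (temperature difference divided by heat power) to both values. The 500 W chip power is an assumed figure chosen purely for illustration and does not come from the interview; the function name is likewise hypothetical.

```python
def temperature_rise(thermal_resistance_k_per_w: float, power_w: float) -> float:
    """Temperature difference (K) across a path: delta_T = R_th * P."""
    return thermal_resistance_k_per_w * power_w


R_TRADITIONAL = 0.3    # K/W, traditional cooling, as quoted in the interview
R_TWO_PHASE = 0.0035   # K/W, two-phase impinging jet, as quoted in the interview
CHIP_POWER_W = 500.0   # ASSUMPTION: hypothetical chip power, not from the article

rise_traditional = temperature_rise(R_TRADITIONAL, CHIP_POWER_W)
rise_two_phase = temperature_rise(R_TWO_PHASE, CHIP_POWER_W)
improvement = R_TRADITIONAL / R_TWO_PHASE  # how many times lower the resistance is

print(f"traditional cooling: {rise_traditional:.2f} K rise")
print(f"two-phase jet:       {rise_two_phase:.2f} K rise")
print(f"resistance ratio:    {improvement:.1f}x")
```

Under this assumed power, the same heat load produces a far smaller temperature rise with the lower resistance, and the ratio of the two quoted resistances falls inside the "50 to 100 times" range mentioned in the interview.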
In terms of technical principles, "two-phase impinging jet cooling" builds liquid-filled microchannels directly inside the chip package. When the chip generates heat, the liquid is heated to boiling and the resulting vapor carries the heat away. The vapor then condenses and recirculates, and the cooling cycle begins again.
"It should be noted that the cooling technology we have developed is not just a simple hole punching, but includes a multi-layer micro-nano processing design of tiny structures, forming a very complex multi-layer gas-liquid transport distribution system. Such a design not only has high cooling efficiency but also reduces the resistance of liquid flow. In fact, this is a very complex interdisciplinary engineering project, involving the collaborative design of chips, electricity, heat, and mechanical structures," Wei Tiwei pointed out.Under normal circumstances, the CPU's encapsulation is covered by a metal lid, which is coated with thermal interface material and then connected to a heat sink. Thermal interface material is also filled between the metal lid and the chip. However, due to the multi-layer thermal interface material and complex thermal interface contact, the overall thermal resistance of the chip is very high, and the heat dissipation effect cannot meet the heat dissipation requirements of future high power density data centers.
"The closer the liquid cooling scheme is to the chip, the lower the overall thermal resistance from the chip junction temperature to the fluid, and the higher the heat dissipation efficiency will be," Wei Tiwei pointed out, "Our cooling scheme directly skips two layers of thermal interface materials, exposing the entire back of the chip, allowing the liquid jet to directly impact the back of the chip, truly achieving chip-level encapsulation cooling heat dissipation. At the same time, through the optimization of the system flow resistance design, we have also reduced the energy consumption of the cooling system. In other words, we let the coolant flow directly inside the chip encapsulation for heat dissipation." He said.
"In addition to this, the uniqueness of this research project lies in the cross-scale and multi-level cooling optimization. It is not only necessary to focus on the heat dissipation design at the semiconductor micro-chip and chip encapsulation levels, but also to consider the heat dissipation components, racks, system levels, and the layout of the data center itself. From micro to macro, all these aspects need to be closely connected to jointly achieve efficient cooling and energy saving." He pointed out.
"Previously, we had developed a single-phase impact jet cooling technology, which removed all two layers of thermal interface materials between the chip encapsulation and the radiator, achieving more efficient chip-level direct cooling heat dissipation. In fact, the 'single-phase impact jet' technology we originally developed has achieved good cooling effects. This technology can achieve a cooling capacity of 350 W per square centimeter of the chip, or about 3.5 W per square millimeter, which is 3.5 times higher than common coolers." Wei Tiwei said, "But considering the higher cooling requirements of future data centers, we have developed this 'two-phase flow' technology to further improve the cooling efficiency, with the goal of achieving a cooling capacity of 500 W to 800 W per square centimeter of the chip."
"At this stage, in addition to the 'chip-level two-phase impact jet cooling' technology, we are also simultaneously advancing the research and development of several chip cooling technologies. Among them, we are developing an anisotropic thermal interface material with ultra-high thermal conductivity. In simple terms, it is to integrate our new thermal interface material on the metal lid of the chip's outer packaging, and by combining it with an efficient liquid cooling plate, better cooling effects can also be achieved. This design can eliminate the reliability risks that may be brought by the direct contact of the cooling liquid with the back of the chip silicon." He introduced."At the same time, our team is currently in talks with companies such as Intel and Meta to discuss and explore a more flexible, detachable packaging-level liquid cooling integration solution," he said.
"The cooling technology for future data centers will be packaging-level, chip-level liquid cooling."
When discussing the ideal cooling solution for large data centers in the future, Wei Tiwei believes that liquid cooling will definitely be the future trend. "Specifically, it is necessary to distinguish between different time periods. In the short term, for example, in the next 3-5 years, 'high thermal conductivity thermal interface materials + high-performance heat sinks' can meet certain cooling requirements, but this cooling method still requires a layer of thermal interface material," he said.
"In the medium to long term, for example, after 5 or even 10 years, cooling technology will no longer rely on thermal interface materials. This is also the problem that the COOLERCHIPS project plans to solve, that is, to develop new cooling technology for the next generation of data centers. Therefore, the future will definitely be directed towards packaging-level, chip-level direct liquid cooling technology," said Wei Tiwei.Unlike common cooling technologies on the market, such as the heat pipe cooling technology commonly used in graphics cards, and the Vapor Chamber cooling technology used in smartphones, "the single-phase and two-phase liquid cooling technologies we are developing now belong to chip-level cooling technologies. These technologies are aimed at data centers, communication base stations, and the automotive industry (such as cooling for intelligent driving chips), and are currently not suitable for the consumer electronics and micro-device fields. The main reason is the consideration of volume issues, active cooling requires a water pump, and micro-electron devices such as mobile phones are difficult to integrate this, relying more on passive cooling technologies," he said.
"At present, including heat pipes, Vapor Chamber, and other products are usually produced in the factory, and then integrated on the chip with thermal conductive glue. In general, these devices all belong to the category of packaging external components for cooling, and what our team has developed is to package the cooling inside the chip, integrated into the chip," Wei Tiwei pointed out, "The development of chip-level cooling technology involves micro-nano processing, chip packaging integration and other cutting-edge technologies, there are many technical barriers, but it can bring better cooling effects."
"In addition to developing chip-level cooling technology, our research group is also conducting research around the three-dimensional system integration and packaging technology of the chip," he introduced.
With growing demand for miniaturized electronic devices and multi-functional system integration, three-dimensional integrated packaging of chips has broad prospects and is becoming increasingly important. "Three-dimensional integration raises integration density by stacking multiple layers of chips vertically, which shortens the metal interconnects and reduces interconnect delay. It is the main driving force for the development of semiconductor process technology in today's 'post-Moore era'," Wei Tiwei said.
In his view, three-dimensional integrated packaging of chips rests on two key interconnect technologies: through-silicon vias and chip bonding. "Our laboratory is currently conducting research around these two core technologies. Specifically, we are focusing on sub-micrometer through-silicon vias and chip bonding," he said. "At present, TSMC's commercialized through-silicon via technology has a via diameter of about 10 microns. What we are developing is 500 nanometers: smaller vias allow denser integration, but they are harder to make."
It is worth mentioning that, not long ago, a new copper micro-via assisted bonding method for three-dimensional interconnection with fine-pitch copper/tin micro-solder balls, developed by Wei Tiwei's team, won an award at the 2024 International Electronics Packaging Conference.
According to reports, in November this year, Purdue University will hold a Reliability of Electronics and Photonics Packaging (REPP) symposium, with Wei Tiwei serving as the general chair of the symposium. "Reliability is often overlooked in semiconductor packaging design and manufacturing. We hope that this symposium will bring together electrical, material, mechanical, and computer engineers and scientists to explore the latest technologies in the field of electronic and photonic packaging," he said.
In terms of industrialization, Wei Tiwei has already obtained more than 10 patents related to chip packaging in China; in the United States, he has been granted 6 patents in the field of chip-level packaging and heat dissipation.
"We have received funding from ARPA-E for the chip-level heat dissipation technology project, and another important mission of ARPA-E is to fund young scholars and encourage them to bring the developed advanced technologies to the market. Therefore, based on these patents, I plan to establish a company in the near future to develop chip-level heat dissipation technology and new types of high thermal conductivity thermal interface materials for packaging inside the chip," said Wei Tiwei.