Bio


Dally investigates methods for applying VLSI technology to solve information processing problems. His current projects include network architecture, multicomputer architecture, media-processor architecture, and high-speed (4 Gb/s) CMOS signaling. His research involves demonstrating novel concepts with working systems. Previous systems include the MARS Hardware Accelerator, the Torus Routing Chip, the J-Machine, the M-Machine, and the Reliable Router. His group has pioneered techniques including fast capability-based addressing, processor coupling, virtual channel flow control, wormhole routing, link-level retry, message-driven processing, and deadlock-free routing.

Boards, Advisory Committees, Professional Organizations


  • Member, National Academy of Engineering (2013 - Present)
  • Member, American Academy of Arts and Sciences (2013 - Present)

Professional Education


  • PhD, Caltech (1986)

All Publications


  • Logic Simulation Algorithms for Pipelined Hardware Architectures Hardware Accelerators for Electrical CAD Agrawal, P., Dally, W. J., Tutundjian, R. edited by Ambler, T., Agrawal, P. 1988.
  • Program Chair’s Message Dally, W. J.
  • The Reconfigurable Arithmetic Processor Fiske, S., Dally, W. J.
  • Message-Driven Processor Architecture: Version 11 Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J., Nuth, P.
  • Stanford University Concurrent VLSI Architecture Memo 124 Elastic Buffer Networks-on-Chip Michelogiannakis, G., Balfour, J., Dally, W. J.
  • Spills, Fills, and Kills Erez, M., Towles, B. P., Dally, W. J.
  • Conference Author/Panelist Index Dally, W. J., Aoki, N., Bai, X., Banerjee, K., Benini, L., Bergamaschi, R.
  • SSCS Members Honored as 2002 IEEE Fellows Banu, M., Burghartz, J. N., Dally, W. J., Dean, M. E., Gielen, G. G., Griffin, E. L.
  • IEEE MICRO 1998 ANNUAL INDEX, VOL. 18 Burns Dally, W. J., Adams, J., Alt, P. M., Arai, T., Arakawa, F., Avresky, D. R. ; 66: 79
  • CIMI FÍITIIt Dally, W. J., Balfour, J., Black-Shaffer, D., Chen, J., Harting, R. C., Parikh, V.
  • AI Memo No. 1272 April 26, 1994 Spertus, E., Dally, W. J.
  • ISSCC 2004/SESSION 7/TD: SCALING TRENDS/7.1 Horowitz, M., Dally, W.
  • ARVLSI’97 Committees Dally, W. J., Brown, R. B., Ishii, A. T., Papaefthymiou, M. C., Mudge, T. N., June, C. S.
  • ISSCC 2007/SESSION 24/MULTI-GB/s TRANSCEIVERS/24.3 Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T.
  • Globally Adaptive Load-Balanced Routing on k-ary n-cubes Singh, A., Dally, W. J., Towles, B., Gupta, A. K.
  • IEEE Fellows Lead the Engineering Profession Dally, W. J., Agha, G. A., Babic, H. I., Basu, S., Beausoleil, W. F., Bertino, E.
  • 1987 INDEX, VOLUME 4 Dally, W. J., Agrawal, P.
  • 6 Guest Editors’ Introduction: Top Picks from the 2008 Computer Architecture Conferences Joel Emer and Dean Tullsen 10 Larrabee: A Many-Core x86 Architecture Dally, W. J., Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M.
  • 2010 Reviewers List Dally, W. J., Acacio, M. E., Agrawal, N., Altman, E., Alur, R., Baas, B.
  • 5 Guest Editors’ Introduction: Hot Chips 21 Krste Asanovic and Ralph Wittig 7 Power7: IBM’s Next-Generation Server Processor Dally, W. J., Kalla, R., Sinharoy, B., Starke, W. J., Floyd, M., Conway, P.
  • 31st Annual International Symposium on Computer Architecture ISCA 2004 Dally, W. J., Agerwala, T., Taylor, M., Lee, W., Miller, J., Wentzlaff, D.
  • 21st century digital design tools Dally, W. J., Malachowsky, C., Keckler, S. W. 2013
  • A 0.54 pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications Solid-State Circuits Conference Digest of Technical Papers (ISSCC) Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • A detailed and flexible cycle-accurate network-on-chip simulator Performance Analysis of Systems and Software (ISPASS) Jiang, N., Becker, D. U., Michelogiannakis, G., Balfour, J., Towles, B., Shaw, D. E. 2013
  • A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications IEEE Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013
  • Composition and reuse with compiled domain-specific languages Dally, W. J., Sujeeth, A. K., Rompf, T., Brown, K. J., Lee, H., Chafi, H. 2013
  • Optimizing data structures in high-level programs: new directions for extensible compilers based on staging Rompf, T., Sujeeth, A. K., Amin, N., Brown, K. J., Jovanovic, V., Lee, H. 2013
  • Channel reservation protocol for over-subscribed channels and destinations Michelogiannakis, G., Jiang, N., Becker, D., Dally, W. J. 2013
  • Article 8-A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM Transactions on Computer Systems-TOCS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2012; 2 (30): 38
  • A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware Dally, W. J., Hong, S., Oguntebi, T., Casper, J., Bronson, N., Kozyrakis, C. 2012
  • Digital Design: A Systems Approach Dally, W. J., Harting, R. C. Cambridge University Press. 2012
  • Green-Marl: A DSL for Easy and Efficient Graph Analysis ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS Hong, S., Chafi, H., Sedlar, E., Olukotun, K. 2012: 349-362
  • Unifying primary cache, scratch, and register file memories in a throughput processor Gebhart, M., Keckler, S. W., Khailany, B., Krashinsky, R., Dally, W. J. 2012
  • 4 Guest Editor’s Introduction: CPUs, GPUs, and Hybrid Computing David Brooks 7 GPUs and the Future of Parallel Computing Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D., Rohr, D. 2011
  • Guaranteeing forward progress of unified register allocation and instruction scheduling Technical Report Concurrent VLSI Architecture Group Memo 127, Stanford Park, J., Dally, W. J. 2011
  • Gpus and the future of parallel computing Micro, IEEE Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D. 2011; 5 (31): 7-17
  • Energy-efficient mechanisms for managing thread context in throughput processors ACM SIGARCH Computer Architecture News Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2011; 3 (39): 235-246
  • 2011 Index IEEE Computer Architecture Letters Vol. 10 Computer Architecture Letters Becker, D., Choi, I., Cooper-Balis, E., Dally, W. J., Devadas, S., Duato, J. 2011; 53: 56
  • Circuit challenges for future computing systems Dally, W. J. 2011
  • Liszt: a domain specific language for building portable mesh-based PDE solvers DeVito, Z., Joubert, N., Palacios, F., Oakley, S., Medina, M., Barrientos, M. 2011
  • A compile-time managed multi-level register file hierarchy Gebhart, M., Keckler, S. W., Dally, W. J. 2011
  • 2010 IEEE Symposium on Asynchronous Circuits and Systems Dally, W. J., Tell, S. G. 2010
  • Throughput computing Dally, W. J. 2010
  • Evaluating bufferless flow control for on-chip networks Michelogiannakis, G., Sanchez, D., Dally, W. J., Kozyrakis, C. 2010
  • The even/odd synchronizer: A fast, all-digital, periodic synchronizer Asynchronous Circuits and Systems (ASYNC), 2010 IEEE Symposium on Dally, W. J., Tell, S. G. 2010: 75-84
  • Moving the needle, computer architecture research in academe and industry ACM SIGARCH Computer Architecture News Dally, W. J. 2010; 3 (38): 1-1
  • Booksim 2.0 User’s Guide Stanford University Jiang, N., Michelogiannakis, G., Becker, D., Towles, B., Dally, W. J. 2010
  • Fine-grain dynamic instruction placement for L0 scratch-pad memory Park, J., Balfour, J., Dally, W. J. 2010
  • Block-Parallel Programming for Real-time Embedded Applications WJ 2010

  • Apparatus and method for packet scheduling US Patent Dally, W. J., Carvey, P. P., Beliveau, P. A., Mann, W. F., Dennison, L. R. 2010; 760 (7): 747
  • The GPU Computing Era (HTML) Nickolls, J., Dally, W. J. 2010
  • The GPU computing era Micro, IEEE Nickolls, J., Dally, W. J. 2010; 2 (30): 56-69
  • The end of denial architecture and the rise of throughput computing Keynote speech at Design Automation Conference Dally, W. J. 2010
  • The end of denial architecture and the rise of throughput computing Dally, W. J. 2010
  • Exascale software study: Software challenges in extreme scale systems DARPA IPTO, Air Force Research Labs Amarasinghe, S., Campbell, D., Carlson, W., Chien, A., Dally, W., Elnozahy, E. 2009
  • Indirect adaptive routing on large scale interconnection networks ACM SIGARCH Computer Architecture News Jiang, N., Kim, J., Dally, W. J. 2009; 3 (37): 220-231
  • Router designs for elastic buffer on-chip networks Michelogiannakis, G., Dally, W. J. 2009
  • Power efficient supercomputing Accelerator-based Computing and Manycore Workshop (presentation) Dally, W. J. 2009; 1
  • Allocator implementations for network-on-chip routers Becker, D. U., Dally, W. J. 2009
  • Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning Technical Report 126, Concurrent VLSI Architecture Group, Stanford University Park, J., Balfour, J., Dally, W. J. 2009
  • Stream Processors Multicore Processors and Systems Erez, M., Dally, W. J. 2009: 231-270
  • Load-balanced routing US Patent Singh, A., Dally, W. J. 2009; 633 (7): 940
  • Embracing heterogeneity–parallel programming for changing hardware Linderman, M. D., Balfour, J., Meng, T. H., Dally, W. J. 2009
  • Elastic-buffer flow control for on-chip networks High Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. 2009
  • Hierarchical instruction register organization Computer Architecture Letters Black-Schaffer, D., Balfour, J., Dally, W., Parikh, V., Park, J. S. 2008; 2 (7): 41-44
  • A tuning framework for software-managed memory hierarchies Ren, M., Park, J. Y., Houston, M., Aiken, A., Dally, W. J. 2008
  • An energy-efficient processor architecture for embedded systems Computer Architecture Letters Balfour, J., Dally, W. J., Black-Schaffer, D., Parikh, V., Park, J. S. 2008; 1 (7): 29-32
  • Exascale computing study: Technology challenges in achieving exascale systems Kogge, P., Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W. 2008
  • A programmable 512 GOPS stream processor for signal, image, and video processing Solid-State Circuits, IEEE Journal Khailany, B. K., Williams, T., Lin, J., Long, E. P., Rygh, M., Tovey, D. F., Dally, B. 2008; 1 (43): 202-213
  • Structured Application-Specific Integrated Circuit (ASIC) Study STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W., Balfour, J., Black-Schaffer, D., Hartke, P. 2008
  • Exascale computing study: Technology challenges in achieving exascale systems Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W., Denneau, M. 2008
  • Flattened butterfly: a cost-efficient topology for high-radix networks ACM SIGARCH Computer Architecture News Kim, J., Dally, W. J., Abts, D. 2007; 2 (35): 126-137
  • Research Challenges for On-Chip Interconnection Networks (HTML) Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. S. 2007
  • Executing irregular scientific applications on stream architectures Erez, M., Ahn, J. H., Gummaraju, J., Rosenblum, M., Dally, W. J. 2007
  • A 14mW 6.25 Gb/s transceiver in 90nm CMOS for serial chip-to-chip communications Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T. 2007
  • Architectural support for the stream execution model on general-purpose processors Gummaraju, J., Erez, M., Coburn, J., Rosenblum, M., Dally, W. J. 2007
  • Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy Parallel Architecture and Compilation Techniques Das, A., Dally, W. J. 2007
  • Interconnect-Centric Computing. HPCA Dally, W. J., Keynote, H. 2007; 1
  • Tradeoff between data-, instruction-, and thread-level parallelism in stream processors Ahn, J., Erez, M., Dally, W. J. 2007
  • Future directions for on-chip interconnection networks OCIN Workshop Dally, W. J. 2006
  • Sequoia: programming the memory hierarchy Fatahalian, K., Horn, D., Knight, T., Leem, L., Houston, M., Park, J., Dally, B. 2006
  • Multi-Core for HPC: Breakthrough or Breakdown? Sterling, T., Kogge, P., Dally, W., Scott, S., Gropp, W., Keyes, D. 2006
  • Topology optimization of interconnection networks Computer Architecture Letters Gupta, A. K., Dally, W. J. 2006; 1 (5): 10-13
  • Prefix search method US Patent Waters, G. M., Dennison, L. R., Carvey, P. P., Dally, W. J., Mann, W. F. 2006; 130 (7): 847
  • DRAFT Final Report: Workshop on On-and Off-Chip Networks for Multi-Core Systems Retrieved from: http://www.ece.ucdavis.edu/~ocin06 Dally, W. 2006
  • Compiling for stream processing Das, A., Dally, W. J., Mattson, P. 2006
  • Data parallel address architecture Computer Architecture Letters Ahn, J. H., Dally, W. J. 2006; 1 (5): 30-33
  • Adaptive routing in high-radix clos network Kim, J., Dally, W. J., Dally, J., Abts, D. 2006
  • Pulsenet-A Parallel Flash Sampler and Digital Processor IC for Optical SETI Custom Integrated Circuits Conference, 2006. CICC'06. IEEE Howard, A. W., Wei, G. Y., Dally, W. J., Horowitz, P. 2006: 261-264
  • Design tradeoffs for tiled CMP on-chip networks Balfour, J., Dally, W. J. 2006
  • The design space of data-parallel memory systems Ahn, J. H., Erez, M., Dally, W. J. 2006
  • Fault tolerance techniques for the merrimac streaming supercomputer Erez, M., Jayasena, N., Knight, T. J., Dally, W. J. 2005
  • 11th International Symposium on High-Performance Computer Architecture (HPCA'05) Ahn, J. H., Erez, M., Dally, W. J. 2005
  • Globally adaptive load-balanced routing on tori Computer Architecture Letters Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2004; 1 (3): 2-2
  • Streams and vectors: A memory system perspective 6th WorkShop on Media and Streaming Processors Jayasena, N., Dally, W. J. 2004
  • High-Speed Logic, Circuits, Libraries and Layout Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J., Chinnery, D., Keutzer, K., Zlatanovici, R. 2004: 101-144
  • The case for broader computer architecture education: keynote address Dally, W. J. 2004
  • Buffer and delay bounds in high radix interconnection networks Computer Architecture Letters Singh, A., Dally, W. J. 2004; 1 (3): 8-8
  • Adaptive channel queue routing on k-ary n-cubes Singh, A., Dally, W. J., Gupta, A., Towles, B. 2004
  • Stream processors: Programmability and efficiency Queue Dally, W. J., Kapasi, U. J., Khailany, B., Ahn, J. H., Das, A. 2004; 1 (2): 52
  • Principles and practices of interconnection networks Access Online via Elsevier Dally, W. J., Towles, B. P. 2004
  • How scaling will change processor architecture Solid-State Circuits Conference, 2004. Digest of Technical Papers. Horowitz, M., Dally, W. 2004
  • Exploiting Structure and Managing Wires to Increase Density and Performance Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J. 2004: 269-287
  • Analysis and performance results of a molecular modeling application on Merrimac Erez, M., Ahn, J. H., Garg, A., Dally, W. J., Darve, E. 2004
  • Space-efficient source routing Carvey, P., Dally, W., Dennison, L., King, P., Mann, W. 2004
  • The Ninth International Symposium on High-Performance Computer Architecture (HPCA'03) Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. 2003
  • Merrimac: Supercomputing with streams Dally, W. J., Labonte, F., Das, A., Hanrahan, P., Ahn, J. H., Gummaraju, J. 2003
  • Prefix search method Carvey, P., Carvey, P., Dennison, L., Mann, W., Waters, G. 2003
  • A second-order semi-digital clock recovery circuit based on injection locking Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC Lee, M. J., Dally, W. J., Poulton, J., Greer, T., Edmondson, J., Farjad-Rad, R. 2003
  • A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os Ng, H. T., Lee, M. J., Farjad-Rad, R., Senthinathan, R., Dally, W. J., Nguyen, A. 2003
  • Methods and apparatus for event-driven routing Carvey, P., Dally, W., Dennison, L., King, P. 2003
  • 0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization VLSI Circuits, 2003. Digest of Technical Papers. 2003 Symposium on Farjad-Rad, R., Ng, H. T., Lee, M. J., Senthinathan, R., Dally, W. J., Nguyen, A. 2003: 63-66
  • Throughput-centric routing algorithm design Towles, B., Dally, W. J., Boyd, S. 2003
  • CMOS high-speed I/Os-present and future Lee, M. J., Dally, W. J., Farjad-Rad, R., Ng, H. T., Senthinathan, R., Edmondson, J. 2003
  • Migration in single chip multiprocessors Computer Architecture Letters Shaw, K. A., Dally, W. J. 2002; 1 (1): 12-12
  • Locality-preserving randomized oblivious routing on torus networks Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2002
  • Comparing Reyes and OpenGL on a stream architecture Owens, J. D., Khailany, B., Towles, B., Dally, W. J. 2002
  • Prefix search circuitry and method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2002
  • Internet switch router Carvey, P., Carvey, P., Dennison, L., King, P. 2002
  • Computer architecture is all about interconnect High-Perf. Comp. Architecture Dally, W. J. 2002
  • Worst-case traffic for oblivious routing functions Towles, B., Dally, W. J. 2002
  • Stream Processing for High-Performance Embedded Systems Defense Technical Information Center Dally, W. J. 2002
  • Method and system for guaranteeing quality of service in large capacity input output buffered cell switch based on minimum bandwidth guarantees and weighted fair share of unused bandwidth Dally, W., Meempat, G., Ramamurthy, G. 2002
  • Worst-case Traffic for Oblivious Routing Functions (PDF) Towles, B., Dally, W. J. 2002
  • A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips Farjad-Rad, R., Dally, W., Ng, H. T., Poulton, J., Stone, T., Rathi, R. 2002
  • Guest Editors' Introduction: Hot Chips 12 (HTML) Dally, W. J., Tremblay, M., Baum, A. J. 2001
  • Elastic interconnects: Repeater-inserted long wiring capable of compressing and decompressing data Mizuno, M., Dally, W., Onishi, H. 2001
  • Monolithic chaotic communications system Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Chiang, P., Dally, W., Lee, E. 2001
  • Guest Editors' Introduction: Hot Chips 12 IEEE MICRO Baum, A. J., Dally, W. J., Tremblay, M. 2001; 2 (21): 0013-15
  • Scalable switching fabrics for Internet routers White paper, Avici Systems Inc Dally, W. J. 2001
  • A streaming supercomputer Whitepaper Dally, W. J., Hanrahan, P., Fedkiw, R. 2001
  • A single-chip terabit switch Hot Chips Dally, W. J., Dettloff, W., Eyles, J., Greer, T., Poulton, J., Stone, T. 2001; 13
  • A Delay Model for Router Microarchitectures (HTML) Peh, L. S., Dally, W. J. 2001
  • Smart memories: A modular reconfigurable architecture ACM SIGARCH Computer Architecture News Mai, K., Paaske, T., Jayasena, N., Ho, R., Dally, W. J., Horowitz, M. 2000; 2 (28): 161-171
  • Flit-reservation flow control Peh, L. S., Dally, W. J. 2000
  • Stream Scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W. J., Mattson, P., Kapasi, U. J., Owens, J. D., Towles, B. 2000
  • 10 Subspace Optimizations Knobe, K., Dally, W. J. edited by Kessler, Christoph, W. 2000
  • Stream scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., Towles, B. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. 2000
  • Memory access scheduling isca Owens, J. D., Mattson, P., Kapasi, U. J., Dally, W. J., Rixner, S. 2000; 128
  • Register organization for media processing Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. 2000
  • Polygon rendering on a stream architecture Owens, J. D., Dally, W. J., Kapasi, U. J., Rixner, S., Mattson, P., Mowery, B. 2000
  • A 90 mW 4 Gb/s equalized I/O circuit with input offset cancellation Lee, M. J., Dally, W., Chiang, P. 2000
  • Sixth International Symposium on High-Performance Computer Architecture Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. D. 2000
  • Computer Architecture for the Next Millenium Dally, W. J. 1999
  • GAD: A 12-GS/s CMOS 4-bit A/D converter for an equalized multi-level link Ellersick, W., Yang, C. K., Horowitz, M., Dally, W. J. 1999
  • Interconnect-limited VLSI architecture Interconnect Technology, 1999. IEEE International Conference Dally, W. J. 1999: 15-17
  • 20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. 1999
  • Point sample rendering Massachusetts Institute of Technology Dally, W. J., Grossman, J. P. 1998
  • VLSI datapath choices: Cell-based versus full-custom Massachusetts Institute of Technology Chang, A. L. 1998
  • Tomorrow’s Computing Engines keynote speech, Fourth Int’l Symp. High-Performance Computer Architecture Dally, W. 1998
  • The j-machine: A retrospective Retrospective in Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998: 54-58
  • An efficient, protected message interface Computer Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998; 11 (31): 69-75
  • Digital systems engineering Cambridge university press Dally, W. J., Poulton, J. W. 1998
  • Architecture of a message-driven processor 25 years of the international symposia on Computer architecture (selected Dally, W. J., Chao, L., Chien, A., Hassoun, S., Horwat, W., Kaplan, J. 1998
  • Architecture of the Avici terabit switch/router Dally, W., Carvey, P., Dennison, L. 1998
  • Digital Systems Engineering Poulton, J. W., Dally, J., John, W. Cambridge University Press. 1998
  • Efficient, protected message interface in the MIT M-Machine IEEE Computer Special Issue on Design Challenges for High-Performance Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998
  • An instruction scheduling algorithm for communication-constrained microprocessors Massachusetts Institute of Technology Dally, W. J., Buehler, C. J. 1998
  • The Fifth International Conference on Massively Parallel Processing Using Optical Interconnections Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. 1998
  • Point sample rendering Rendering Techniques Grossman, J. P., Dally, W. J. 1998; 98: 181-192
  • Media Processors 1999 (Proceedings Volume) Dally, W. J., Fritts, J. E., Wolf, W. H., Liu, B., Bove Jr, V. M., Lee, M. 1998
  • Media processing using streams Electronic Imaging Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R. 1998: 122-134
  • The J-Machine ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998; 25: 54-58
  • Retrospective: the J-machine Dally, W. J., Chien, A., Fiske, S., Horwat, W., Lethin, R., Noakes, M. 1998
  • Invited Talks Coldren, L. A., Dally, W. J. 1998
  • Message-driven dynamics Massachusetts Institute of Technology Dally, W. J., Lethin, R. A. 1997
  • Transmitter equalization for 4-Gbps signaling Micro, IEEE Dally, W. J., Poulton, J. 1997; 1 (17): 48-56
  • The m-machine multicomputer International Journal of Parallel Programming Fillo, M., Keckler, S. W., Dally, W. J., Carter, N. P., Chang, A., Gurevich, Y. 1997; 3 (25): 183-212
  • The delta tree: An object-centered approach to image-based rendering Dally, W. J., McMillan, L., Bishop, G., Fuchs, H. 1997
  • Extended ephemeral logging: log storage management for applications with long lived transactions ACM Transactions on Database Systems (TODS) Keen, J. S., Dally, W. J. 1997; 1 (22): 1-42
  • Design of the Configuration and Diagnostic Units of the MAP Chip Massachusetts Institute of Technology Dally, W. J., Klayman, K. 1997
  • An I/O port controller for the MAP chip Massachusetts Institute of Technology, Dept. of Electrical Engineering and Dally, W. J., Ma, A. 1997
  • Asynchronous event handling Massachusetts Institute of Technology Dally, W. J., Chatterjee, S. 1997
  • Advances in the M-machine runtime system Massachusetts Institute of Technology Dally, W. J., Shultz, A. 1997
  • TPDS Now Online! z Special Issue Editors Old and New IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS Dally, W. J., Fortes, J. A. 1997; 3 (8): 225
  • Circuit designs for the MAP chip Massachusetts Institute of Technology Dally, W. J., Chen, A. R. 1997
  • 1997 Annual Index, Vol. 17 development [single chip microprocessors] Dally, W. J., Adams, L., Anderson, T., Bilas, A., Biswas, B. B., Burger, D. 1997; 2000: 28-36
  • Flexible Memory Systems.(AASERT Fellowship). MASSACHUSETTS INST OF TECH CAMBRIDGE Carter, N., Dally, W. J. 1996
  • The subspace model: Shape-based compilation for parallel systems Massachusetts Institute of Technology Dally, W. J., Knobe, K. B. 1996
  • Architects Look to Processors of Future MICROPROCESSOR REPORT, MICRODESIGN RESOURCES Bell, G., Sites, R., Dally, W., Ditzel, D., Patt, Y. 1996; 10 (10)
  • Multiprocessor coupling system with integrated compile and run time scheduling for parallelism US Patent Keckler, S. W., Dally, W. J. 1996; 574 (5): 939
  • Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers Dally, W. J. 1996
  • Flexible Memory Systems.(AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Dally, W. J., Carter, N. 1996
  • A data-driven IDCT architecture for low power video applications Xanthopoulos, T., Chandrakasan, A. P., Sodini, C. G., Dally, W. J. 1996
  • Evaluating the locality benefits of active messages ACM SIGPLAN Notices Spertus, E., Dally, W. J. 1995; 8 (30): 189-198
  • Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors Future Generation Computer Systems Fiske, S., Dally, W. J. 1995; 6 (11): 503-518
  • The M-Machine Multicomputer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Keckler, S. W., Fillo, M., Carter, N. P., Chang, A. 1995
  • 1st IEEE Symposium on High-Performance Computer Architecture Nuth, P. R., Dally, W. J. 1995
  • Low-latency plesiochronous data retiming Dennison, L. R., Dally, W. J., Xanthopoulos, D. 1995
  • Implementation of atomic primitives on distributed shared memory multiprocessors Dally, W. J., Michael, M. M., Scott, M. L. 1995
  • The M-Machine operating system Massachusetts Institute of Technology Dally, W. J., Gurevich, Y. 1995
  • The subspace model: A theory of shapes for parallel systems Knobe, K., Dally, W. J. 1995
  • Fault tolerant adaptive routing in multicomputer networks Massachusetts Institute of Technology Xanthopoulos, T. 1995
  • The named-state register file: Implementation and performance Nuth, P. R., Dally, W. J. 1995
  • Proceedings Dally, W. J., Poulton, J. W., Ishii, A. T. 1995
  • 1st IEEE Symposium on High-Performance Computer Architecture Fiske, S., Dally, W. J. 1995
  • Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement) Multithreaded Computer Architecture Dally, W. J. 1994: 79-82
  • The implementation of a reliable router chip Massachusetts Institute of Technology Dally, W. J., Kan, K. H. 1994
  • The design of a high performance SPARC bus interface Massachusetts Institute of Technology Dally, W. J., Wong, D. F. 1994
  • Efficient message subsystem design Massachusetts Institute of Technology Dally, W. J., Lee, W. S. 1994
  • VLSI design for freshmen and sophomores Massachusetts Institute of Technology Dally, W. J., Harris, D. 1994
  • Subspace optimizations Automatic Parallelization Knobe, K., Dally, W. J. 1994: 153-176
  • M-Machine Microarchitecture v1. 11 Dally, W. J., Keckler, S. W., Carter, N., Chang, A., Fillo, M., Lee, W. S. 1994
  • Logging and recovery in a highly concurrent database Dally, W. J., Keen, J. S. 1994
  • The reliable router: A reliable and high-performance communication substrate for parallel computers Parallel Computer Routing and Communication Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994: 241-255
  • Named state and efficient context switching Multithreaded Computer Architecture Nuth, P. R., Dally, W. J. 1994: 201-212
  • Multithreaded computer architecture Boston: Kluwer Academic Publishers Dennis, J. B., Gao, G. R., Iannucci, R. A., Dally, W. J. 1994
  • Architecture and implementation of the Reliable Router Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994
  • A subspace optimizing data parallel compiler Massachusetts Institute of Technology Dally, W. J., Dampier, T. O. 1994
  • A numerical engine for distributed sparse matrices Massachusetts Institute of Technology Dally, W. J., Telichevesky, R. 1994
  • The design and implementation of an actor language based on linear logic Massachusetts Institute of Technology Dally, W. J., Tse, C. S. 1994
  • How to Choose the Grain Size of a Parallel Computer MIT/LCS Technical Report Yeung, D., Dally, W. J., Agarwal, A. 1994: MIT-LCS-TR-739
  • XEL: extended ephemeral logging for log storage management Keen, J. S., Dally, W. J. 1994
  • Hardware support for fast capability-based addressing ACM SIGPLAN Notices Carter, N. P., Keckler, S. W., Dally, W. J. 1994; 11 (29): 319-327
  • Deadlock-free adaptive routing in multicomputer networks using virtual channels Parallel and Distributed Systems, IEEE Transactions Dally, W. J., Aoki, H. 1993; 4 (4): 466-475
  • The J-machine multicomputer: an architectural evaluation ACM SIGARCH Computer Architecture News Noakes, M. D., Wallach, D. A., Dally, W. J. 1993; 2 (21): 224-235
  • Performance evaluation of ephemeral logging ACM SIGMOD Record Keen, J. S., Dally, W. J. 1993; 2 (22): 187-196
  • Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5 ACM SIGARCH Computer Architecture News Spertus, E., Goldstein, S. C., Schauser, K. E., Eicken, T. V., Culler, D. E., Dally, W. J. 1993; 3 (21): 302-313
  • COSMOS: An operating system for a fine-grain concurrent computer Research directions in concurrent object-oriented programming Horwat, W., Totty, B., Dally, W. J. 1993: 452-476
  • The J-Machine architecture and evaluation Compcon Spring'93, Digest of Papers. Dally, W. J., Keen, J. S., Noakes, M. D. 1993: 183-188
  • Message-driven processor in a concurrent computer US Patent Dally, W. J., Chien, A. A., Horwat, W. P., Fiske, S. 1993; 212 (5): 778
  • A Video Controller and Distributed Frame Buffer for the J-Machine Dally, W. J., McDonald, E. 1993
  • A universal parallel computer architecture New Generation Computing Dally, W. J. 1993; 3-4 (11): 227-249
  • High-performance bidirectional signalling in VLSI systems Dennison, L. R., Lee, W. S., Dally, W. J. 1993
  • Mechanisms for parallel computers Parallel Computing on Distributed Memory Multiprocessors Dally, W. J., Wills, D. S., Lethin, R. 1993: 3-25
  • The Future of Computing is Parallel Computer Science Department Dally, W. J. 1993
  • The J-machine: a fine-grain parallel computer Computing Systems in Engineering Dally, W. J., Chien, A., Davison, R., Fiske, J. A., Furman, S., Fyler, G. 1992; 1 (3): 7-15
  • Design and implementation of the Message-Driven Processor Dally, W. J., Ahmed, S., Carrick, P., Chien, A., Davison, R., Fiske, J. 1992
  • The message-driven processor: A multicomputer processing node with efficient mechanisms Micro, IEEE Dally, W. J., Fiske, J. A., Keen, J. S., Lethin, R. A., Noakes, M. D., Nuth, P. R. 1992; 2 (12): 23-39
  • The message driven processor: An integrated multicomputer processing element Computer Design: VLSI in Computers and Processor Dally, W. J., Chien, A., Fiske, J. A., Fyler, G., Horwat, W., Keen, J. S. 1992
  • Processor coupling: Integrating compile time and runtime scheduling for parallelism ACM SIGARCH Computer Architecture News Keckler, S. W., Dally, W. J. 1992; 2 (20): 202-213
  • INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 1992 Scientific information bulletin Keckler, S. W., Dally, W. J. 1992; 4 (17): 35
  • Custom integrated circuits Custom Integrated Circuits Dally, W. J., Allen, J., Wyatt Jr, J. L., White, J. K., Devadas, S., Armstrong, R. C. 1992
  • A fast translation method for paging on top of segmentation Computers, IEEE Transactions Dally, W. J. 1992; 2 (41): 247-250
  • MDP design tools and methods Computer Design: VLSI in Computers and Processors Lethin, R. A., Dally, W. J. 1992: ICCD'92
  • Virtual-Channel Flow Control Dally, W. J. 1992
  • The J-machine network Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1992
  • Pi: a parallel architecture interface Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the… Wills, D. S., Dally, W. J. 1992
  • Virtual-channel flow control Parallel and Distributed Systems, IEEE Transactions Dally, W. J. 1992; 2 (3): 194-205
  • Experiences Implementing Dataflow on a General-Purpose Parallel Computer. ICPP Spertus, E., Dally, W. J. 1991; 2: 231-235
  • A mechanism for efficient context switching Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1991: ICCD'91
  • Express cubes: improving the performance of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1991; 9 (40): 1016-1023
  • Experiments with Dataflow on a General-Purpose Parallel Computer. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Spertus, E., Dally, W. J. 1991
  • Experiments with data flow on a general-purpose parallel computer. Memorandum report Massachusetts Inst. of Tech., Cambridge, MA (United States). Artificial Spertus, E., Dally, W. J. 1991
  • Experiments with Dataflow on a General-Purpose Parallel Computer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Spertus, E. 1991
  • System design of the J-Machine Noakes, M., Dally, W. J. 1990
  • Experience with concurrent aggregates (CA): Implementation and programming Chien, A. A., Dally, W. J. 1990
  • Advanced Research in VLSI: Proceedings of the Sixth MIT Conference [Papers Presented at the Sixth MIT Conference on Advanced Research in VLSI, Held in Cambridge, Mass., in 1990] Dally, W. J. 1990
  • The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms Dally, W. J., Davison, R., Fiske, J. A., Fyler, G., Keen, J. S., Lethin, R. A. 1990
  • Performance analysis of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1990; 6 (39): 775-785
  • Network and processor architecture for message-driven computers VLSI and Parallel Computation Dally, W. 1990: 140-222
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE Agarwal, A., Dally, W. J., Devadas, S., Knight Jr, T. F., Leighton, F. T., Nabors, K. 1990
  • Concurrent aggregates (CA) ACM Sigplan Notices Chien, A. A., Dally, W. J. 1990; 3 (25): 187-196
  • Virtual-channel flow control Dally, W. J. 1990
  • Proceedings of the sixth MIT conference on Advanced research in VLSI Dally, W. J. 1990
  • Simultaneous bidirectional signalling for IC systems Computer Design: VLSI in Computers and Processors Lam, K., Dennison, L. R., Dally, W. J. 1990: ICCD'90
  • Critical Problems in Very Large Scale Computer Systems KURTZ LABS YELLOW SPRINGS OH Leighton, F. T., Knight, T. F., Agarwal, A., Dally, W. J., Devadas, S. 1990
  • A hardware logic simulation system Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Agrawal, P., Dally, W. J. 1990
  • Express cubes: Improving the performance of k-ary n-cube interconnection networks MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE Dally, W. J. 1989
  • Algorithms for accuracy enhancement in a hardware logic simulator Agrawal, P., Tutundjian, R., Dally, W. 1989
  • Universal mechanisms for concurrency PARLE'89 Parallel Architectures and Languages Europe Dally, W. J., Wills, D. S. 1989: 19-33
  • Experience with CST: Programming and Implementation MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J., Horwat, W. 1989
  • A fine-grain, message-passing processing node Concurrent Computations Dally, W. J. 1989: 375-389
  • The J-machine: a fine grain concurrent computer MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J. 1989
  • Micro-optimization of floating-point operations ACM SIGARCH Computer Architecture News Dally, W. J. 1989; 2 (17): 283-289
  • Experience with CST: Programming and implementation ACM SIGPLAN Notices Horwat, W., Chien, A. A., Dally, W. J. 1989; 7 (24): 101-109
  • A network element based fault tolerant processor Massachusetts Institute of Technology Abler, T. A. 1988
  • Fine-grain message passing concurrent computers Dally, W. 1988
  • The J-machine: System support for Actors MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J. 1988
  • ON FIFTH GENERATION COMPUTER SYSTEMS 1988, edited by ICOT.© ICOT, 1988 Dally, W. J. 1988; 3 (FGCS'88): 154
  • Object-Oriented Concurrent Programming in CST MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J. 1988
  • Message-Driven Processor architecture, Version 11. Artificial intelligence memo Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Message-Driven Processor Architecture MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988
  • Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Knight, T. F., Penfield, P., Glasser, L. A., Agarwal, A., Dally, W. J. 1988
  • Critical problems in very-large-scale computer systems. Semiannual technical report, 1 April-30 September 1988 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Agarwal, A., Dally, W. J., Devadas, S., Knight, T. F. 1988
  • Object-oriented concurrent programming in CST Dally, W. J., Chien, A. A. 1988
  • The reconfigurable arithmetic processor ACM SIGARCH Computer Architecture News Fiske, S., Dally, W. J. 1988; 2 (16): 30-36
  • Mechanisms for Concurrent Computing FGCS Dally, W. J. 1988: 154-156
  • The Reconfigurable Arithmetic Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Fiske, S. 1988
  • The Balanced Cube A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 27-73
  • Architecture and design of the MARS hardware accelerator Agrawal, P., Dally, W. J., Ezzat, A. K., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. 1987
  • Performance analysis of k-ary n-cube interconnection networks NASA STI/Recon Technical Report N Dally, W. J. 1987; 88: 30010
  • MARS: A multiprocessor-based programmable accelerator Design & Test of Computers, IEEE Agrawal, P., Dally, W. J., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. S., Tutundjian, R. 1987; 5 (4): 28-36
  • Graph Algorithms A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 75-132
  • Deadlock-free message routing in multiprocessor interconnection networks Computers, IEEE Transactions Dally, W. J., Seitz, C. L. 1987; 5 (100): 547-553
  • A coherent VLSI environment Massachusetts Inst. of Tech. Report Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T. 1987
  • Concurrent Smalltalk A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 13-25
  • A message passing system for a fault tolerant parallel processor Massachusetts Institute of Technology Dally, W. J., Heyda, R. L. 1987
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE Abelson, H., Penfield, P., Antoniadis, D. A., Dally, W. J., Fonstad, C. G. 1987
  • Design of a self-timed VLSI multicomputer communication controller NASA STI/Recon Technical Report Dally, W. J., Song, P. 1987; 88: 30014
  • Coherent VLSI environment. Semiannual technical report, 1 October 1986-31 March 1987 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Dally, W. J., Glasser, L. A., Knight, T. F., Leighton, F. T. 1987
  • A coherent VLSI design environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T., Wyatt Jr, J. L. 1987
  • Architecture of a Message-Driven Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chao, L., Dally, W. J., Chien, A., Hassoun, S., Horwat, W. 1987
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leighton, F. T., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1987
  • Concurrent computer architecture Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W. J. 1987
  • The torus routing chip Distributed computing Dally, W. J., Seitz, C. L. 1986; 4 (1): 187-196
  • On the Performance of k-ary n-cube Interconnection Networks California Institute of Technology Dally, W. J. 1986
  • 5208: TR: _86 Dally, W. J. 1986
  • The torus routing chip Dally, W. J., Seitz, C. L. 1986
  • A High-performance VLSI Quaternary Serial Multiplier Dally, W. J. 1986
  • Wire-efficient VLSI multiprocessor communication networks Massachusetts Institute of Technology, Microsystems Program Office Dally, W. J. 1986
  • Directions in concurrent computing Dally, W. J. 1986
  • A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leiserson, C. E., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1986
  • VLSI architecture for concurrent data structures California Inst. of Tech. Dally, W. J. 1986
  • Concurrent Algorithms for the Max-Flow Problem California Institute of Technology Dally, W. J. 1985
  • A hardware architecture for switch-level simulation Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Dally, W. J., Bryant, R. E. 1985
  • The balanced cube: a concurrent data structure California Institute of Technology Dally, W. J., Seitz, C. L. 1985
  • Fungicides for Crop Protection: Invited papers International Specialized Book Service Incorporated Dally, W. J., Smith, I. M. 1985
  • An object oriented architecture ACM SIGARCH Computer Architecture News Dally, W. J., Kajiya, J. T. 1985; 3 (13): 154-161
  • The MOSSIM Simulation Engine Architecture and Design California Institute of Technology Dally, W. J. 1984
  • A Special Purpose Processor for Switch-Level Simulation International Conference on Computer Aided Design Dally, W. J., Bryant, R. E. 1984