This list of projects are meant for student and other up-and-coming developers interested in contributing to participating Universiy of California-based open source projects. The list below notes the UC-based open source project and ideas associated to each that were submitted by a UC-affiliated mentor. If you have any questions, please visit our Gitter channel:
Table of Contents:
Projects for LiveHD. Lead Mentors: Jose Renau and Sheng-Hong Wang
Title | HIF tooling |
Description | Tools around Hardware Interchange Format (HIF) files |
Mentor(s) | Jose Renau |
Skills | C++17 |
Difficulty | Medium |
Size | Medium 175 hours |
Link |
HIF (https://github.com/masc-ucsc/hif) stands for Hardware Interchange Format. It is designed to be a efficient binary representation with simple API that allows to have generic graph and tree representations commonly used by hardware tools. It is not designer to be a universal format, but rather a storate and traversal format for hardware tools.
LiveHD has 2 HIF interfaces, the tree (LNAST) and the graph (Lgraph). Both can read/write HIF format. The idea of this project is to expand the hif repository to create some small but useful tools around hif. Some projects:
hif_diff + hif_patch: Create the equivalent of the diff/patch commands that exist for text but for HIF files. Since the HIF files have a more clear structure, some patches changes are more constrained or better understood (IOs and dependences are explicit).
hif_tree: Print the HIF hierarchy, somewhat similar to GNU tree but showing the HIF hieararchy.
hif_grep: capacity to grep for some tokens and outout a hif file only with those. Thena hif_tree/hif_cat can show the contents.
Title | Mockturtle |
Description | Perform synthesis for graph in LiveHD using Mockturtle |
Mentor(s) | Jose Renau |
Skills | C++17, synthesis |
Difficulty | Medium |
Size | Medium 175 hours |
Link |
There are some issues with Mockturtle integration (new cells) and it is not using the latest Mockturtle library versions. The goal is to use Mockturtle (https://github.com/lsils/mockturtle) with LiveHD. The main characteristics:
Title | Query Shell |
Description | Create a console app that interacts with LiveHD to query parameters about designs |
Mentor(s) | Jose Renau |
Skills | C++17 |
Difficulty | Medium |
Size | Medium 175 hours |
Link |
Wavedrom and duh allows to dump bitfield information for structures. It would be interesting to explore to dump tables and bit fields for Lgraph IOs, and structs/fields inside the module. It may be a way to integrate with the documentation generation.
Example of queries: show path, show driver/sink of, do topo traversal,….
As an interesting extension would be to have some simple embedded language (TCL or ChaiScript or ???) to control queries more easily and allow to build functions/libraries.
Title | Lgraph and LNAST check pass |
Description | Create a pass that check the integrity/correctness of Lgraph and LNAST |
Mentor(s) | Jose Renau |
Skills | C++17 |
Difficulty | Medium |
Size | Large 350 hours |
Link |
Create a pass that checks that the Lgraph (and/or LNAST) is semantically correct. The LNAST already has quite a few tests (pass.semantic), but it can be further expanded. Some checks:
Title | unbitwidth |
Description | Not all the variables need bitwidth information. Find the small subset |
Mentor(s) | Jose Renau |
Skills | C++17 |
Difficulty | Medium |
Size | Medium 175 hours |
Link |
This pass is needed to create less verbose CHISEL and Pyrope code generation.
The LGraph can have bitwidth information for each dpin. This is needed for Verilog code generation, but not needed for Pyrope or CHISEL. CHISEL can perform local bitwidth inference and Pyrope can perform global bitwidth inference.
A new pass should remove redundant bitwidth information. The information is redundant because the pass/bitwidth can regenerate it if there is enough details. The goal is to create a pass/unbitwidth that removes either local or global bitwidth. The information left should be enough for the bitwidth pass to regenerate it.
Local bitwidth: It is possible to leave the bitwidth information in many places and it will have the same results, but for CHISEL the inputs should be sized. The storage (memories/flops) should have bitwidth when can not be inferred from the inputs.
Global bitwidth: Pyrope bitwidth inference goes across the call hierarchy. This means that a module could have no bitwidth information at all. We start from the leave nodes. If all the bits can be inferred given the inputs, the module should have no bitwidth. In that case the bitwidth can be inferred from outside.
As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. The Eusocial project aims to create a new interface to storage devices that can leverage existing and new CPU and main memory resources to take over data management tasks like availability, recovery, and migrations. The project refers to these storage devices as “eusocial” because we are inspired by eusocial insects like ants, termites, and bees, which as individuals are primitive but collectively accomplish amazing things.
Recent research reveals that the compaction process in RocksDB can be altered to optimize future data access by changing the data layout in compaction levels. The benefit of this approach can be extended to different data layout optimization based on application access patterns and requirements. In this project, we want to create an interface that would allow users to dynamically inject layout optimization functions to RockDB, using containerization technologies such as Webassembly.
Since the last decade, the slowing down in the performance improvement of general-purpose processors is driving the system architecture to be increasingly heterogeneous. We have seen the kinds of domain-specific accelerator hardware (e.g., FPAG, SmartNIC, TPU, GPU) are growing to take over many different jobs from the general-purpose processors. On the other hand, the network and storage device performance have been tremendously improved with a trajectory much outweighed than that of processors. With this trend, a natural thought to continuously scale the storage system performance economically is to efficiently utilize and share different sources from different nodes over the network. There already exist different resource sharing protocols like CCIX, CXL, and GEN-Z. Among these GEN-Z is the most interesting because, unlike RDMA, it enables remote memory accessing without exposing details to applications (i.e., not application changes). Therefore, it would be interesting to see how/whether these technologies can help improve the performance of storage systems, and to what extent. This project would require building a demo system that uses some of these technologies (especially GEN-Z) and run selected applications/workloads to better understand the benefits.
Goal: Determine the benefits in particular market verticals such as genomics and health care to converge the storage stack in data center computer systems to the NVMe device interface, even when devices include rotational media (aka disk drives). The key question: “When do people abandon SATA and SAS and converge to NVMe?”
Background: NVMe is a widely used device interface for fast storage devices such as flash that behave much more like random access memory than the traditional rotational media. Rotational media is accessed mostly via SATA and SAS which has served the industry well for close to two decades. SATA in particular is much cheaper than NVMe. Now that NVMe is widely available and quickly advancing in functionality, an interesting question is whether there is a market for rotational media devices with NVMe interfaces, converging the storage stack to only one logical device interface, thereby enabling a common ecosystem and more efficient connectivity from multiple processes to storage devices.
The NVMe 2.0 specification, which came out last year, has been restructured to support the increasingly diverse NVMe device environment (including rotational media). The extensibility of 2.0 encourages enhancements of independent command sets such as Zoned Namespaces (ZNS) and Key Value (NVMe-KV) while supporting transport protocols for NVMe over Fabrics (NVMe-oF). A lot of creative energy is now focused on advancing NVMe while SATA has not changed in 16 years. Having all storage devices connect the same way not only frees up space on motherboards but also enables new ways to manage drives, for example via NVMe-oF that allows drives to be networked without additional abstraction layers.
Suggested Project Structure: This is really just a suggestion for a starting point. As research progresses, a better structure might emerge.
Project tasks:
Interesting links:
https://www.opencompute.org/wiki/Storage/NVMeHDD
https://2021ocpglobal.fnvirtual.app/a/event/1714 (video and slides, requires $0 registration)
https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality
https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface
https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/
https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives
SmartNICs have become one of important components in heterogeneous system architectures for offloading computing to accelerate particular host functions that mostly found in networking, storage, security, and management services. Generally speaking, a SmartNIC is programmable, and therefore it is “smart.” Though the programmability can have different definitions or implementations by different vendors, the hardware blocks of a SmartNIC that enables programmability are highly specialized (e.g., compression engine and encryption engine). Therefore, SmartNICs can be more efficient than host CPUs on particular jobs. The fundamental problem for SmartNICs for applications is how we can gain performance benefits by leveraging this new hardware.
Apache Arrow
, RPC
, networking
, RDMA
, dpdk
Apache Arrow provides a set of tools to allow efficient in-memory analytics/processing columnar data. Arrow Flight is a data transport framework for streaming Arrow data across different services in a cluster. It is built based on gRPC, Google’s popular HTTP/2-based RPC library. The most common deployment of Arrow Flight is to run on top of the kernel TCP/IP stack. This kernel stack is known to be much slower than kernel-bypass stacks such as DPDK and RDMA. Given that there are already solutions for RPC over these kernel-bypass stacks (e.g., eRPC, and Seastar), the goal of this project is to allow Arrow Flight to leverage the higher performance of these stacks instead of the traditional gRPC.
SmartNIC
, regex
, networking
, Apache Arrow
The current implementation of the row-based data filtering in Apache Arrow uses host CPUs. Offloading this filtering function to SmartNICs can save the host CPU cycles and return to user applications while potentially gaining better performance. [BlueField-2 DPU] (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/documents/datasheet-nvidia-bluefield-2-dpu.pdf) provides a built-in RegEx accelerator for network Ethernet packets. This project is to develop the connection to allow Apache Arrow to leverage this accelerator, and eventually characterize the performance benefits in throughput, latency, and energy consumption.
The OSAVC is a vehicle-agnostic open source hardware and software project. This project is designed to provide a real-time hardware controller adaptable to any vehicle type, suitable for aerial, terrestrial, marine, or extraterrestrial vehicles. It allows control researchers to develop state estimation algorithms, sensor calibration algorithms, and vehicle control models in a modular fashion such that once the hardware set has been developed switching algorithms requires only modifying one C function and recompiling.
Lead mentor: Aaron Hunter
Projects for the OSAVC:
Help develop a sensor library for use in autonomnous vehicles. Possible sensors include range finders, ping sensors, IMUs, GPS receivers, RC receivers, barometers, air speed sensors, etc. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microncontroller.
Use OpenCV to identify a track for an autonomous vehicle to follow. Build on previous work by developing a new model using EfficientDet and an existing training set of images. Port the model to TFlite and implement on the Coral USB Accelerator. Evaluate its performance against our previous efforts.
Implement an optimal state estimation algorithm from a model. This model can be derived from a Kalman filter or some other state estimation filter (e.g., Mahoney filter). THe model takes sensor readings as input and provides an estimate of the state of a vehicle. Finally, convert the model to standard C using the Simulink code generation or implement in Python (for use on a single board computer, e.g., Raspberry Pi)
The Skyhook Data Management project extends object storage with data management functionality for tabular data. SkyhookDM enables storing and query tabular data in the Ceph distributed object storage system. It thereby turns Ceph into an Apache Arrow-native storage system, utilizing the Arrow Dataset API to store and query data with server-side data processing, including selection and projection that can significantly reduce the data returned to the client.
SkyhookDM is now part of Apache Arrow (see blog post).
Arrow
, Dask/Ray
Problem: Dask and Ray are parallel-computing frameworks similar to Apache Spark but in a Python ecosystem. Each of these frameworks support reading tabular data from different data sources such as a local filesystem, cloud object stores, etc. These systems have recently added support for the Arrow Dataset API to read data from different sources. Since, the Arrow dataset API supports Skyhook, we can leverage this capability to offload compute-heavy Parquet file decoding and decompression into the Ceph storage layer. This can help us speed up the queries significantly as CPU will get freed up in the Dask/Ray workers for other processing tasks.
Arrow
, Gandiva
, SIMD
Problem: Gandiva allows efficient evaluation of query expressions using runtime code generation using LLVM. The generated code leverages SIMD instructions and is highly optimized for parallel processing in modern CPUs. It is natively supported by Arrow for compiling and executing expressions. SkyhookDM currently uses the Arrow Dataset API (which internally uses Arrow Compute APIs) to execute query expressions inside the Ceph OSDs. Since, the Arrow Dataset API particularly does not support Gandiva currently, the goal of this project is to add support for Gandiva in the Arrow Dataset API in order to accelerate query processing when offloaded to the storage layer. This will help Skyhook combat some of the peformance issues due to the inefficient serialization interface of Arrow.
References:
Arrow
, Database views
, virtual datasets
Problem - Workloads may repeat the same or similar queries over time. This causes repetition of IO and compute operations, wasting resources. Saving previous computation in the form of materialized views can provide benefit for future workload processing. Solution - Add a method to the Dataset API to create views from queries and save the view as an object in a separate pool with some object key that can be generated from the query that created it.
Reference: https://docs.dremio.com/working-with-datasets/virtual-datasets.html
data lakes
, lake house
, distributed query processing
Delta Lake is a new architecture for querying big data lakes through Spark, providing transactions. An important benefit of this integration will be to provide an SQL interface for SkyhookDM functionality, through Spark SQL. This project will further build upon our current work connecting Spark to SkyhookDM through the Arrow Dataset API. This would allow us to run some of the TPC-DS queries (popular set of SQL queries for benchmarking databases) on SkyhookDM easily.
Reference: [Delta Lake paper] (https://databricks.com/jp/wp-content/uploads/2020/08/p975-armbrust.pdf)
Proactive Data Containers (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning
Python
, object-centric data management
, PDC
Proactive Data Containers (PDC) is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. Currently PDC has a C interface. Providing a python interface would make it easier for more Python applications to utilize it.
HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.
The HDF5 technology suite includes:
Python
, Async I/O
, HDF5
HDF5 is a well-known library for storing and accessing (known as “Input and Output” or I/O) data on high-performance computing systems. Recently, new technologies, such as asynchronous I/O and caching, have been developed to utilize fast memory and storage devices and to hide the I/O latency. Applications can take advantage of an asynchronous interface by scheduling I/O as early as possible and overlapping computation with I/O operations to improve overall performance. The existing HDF5 asynchronous I/O feature supports the C/C++ interface. This project involves the development and performance evaluation of a Python interface that would allow more Python-based scientific codes to use and benefit from the asynchronous I/O.
CephFS is a distributed file system on top of Ceph. It is implemented as a distributed metadata service (MDS) that uses dynamic subtree balancing to trade parallelism for locality during a continually changing workloads. Clients that mount a CephFS file system connect to the MDS and acquire capabilities as they traverse the file namespace. Capabilities not only convey metadata but can also implement strong consistency semantics by granting and revoking the ability of clients to cache data locally.
Ceph
, filesystems
, metadata
, programmable storage
The frequency of metadata service (MDS) requests relative to the amount of data accessed can severely affect the performance of distributed file systems like CephFS, especially for workloads that randomly access a large number of small files as is commonly the case for machine learning workloads: they purposefully randomize access for training and evaluation to prevent overfitting. The datasets of these workloads are read-only and therefore do not require strong coherence mechanisms that metadata services provide by default.
The key idea of this project is to reduce the frequency of MDS requests by offloading namespace traversal, i.e. the need to open a directory, list its entries, open each subdirectory, etc. Each of these operations usually require a separate MDS request. Offloading namespace traversal refers to a client’s ability to request the metadata (and associated read-only capabilities) of an entire subtree with one request, thereby offloading the traversal work for tree discovery to the MDS.
Once the basic functionality is implemented, this project can be expanded to address optimization opportunities, e.g. describing regular tree structures as a closed form expression in the tree’s root, shortcutting tree discovery.
OpenROAD is a front-runner in open-source semiconductor design automation tools and know-how. OpenROAD reduces barriers of access and tool costs to democratize system and product innovation in silicon. The OpenROAD tool and flow provide an autonomous, no-human-in-the-loop, 24-hour RTL-GDSII capability to support low-overhead design exploration and implementation through tapeout. We welcome a diverse community of designers, researchers, enthusiasts and entrepreneurs who use and contribute to OpenROAD to make a far-reaching impact. Our mission is to democratize and advance design automation of semiconductor devices through leadership, innovation, and collaboration.
OpenROAD is the key enabler of successful Chip initiatives like the Google-sponsored Efabless that has made possible more than 150 successful tapeouts by a diverse and global user community. The OpenROAD project repository is https://github.com/The-OpenROAD-Project/OpenROAD.
Design of static RAMs in VLSI designs for good performance and area is generally time-consuming. Memory compilers significantly reduce design time for complex analog and mixed-signal designs by allowing designers to explore, verify and configure multiple variants and hence select a design that is optimal for area and performance. This project requires the support of memory compilers to OpenROAD-flow-scripts based on popular PDKS such as those provided by OpenRAM.
Memory Compilers
, OpenRAM
, Programmable RAM
Improve and verify OpenLane design planning with OpenRAM memories. Specifically, this project will utilize the macro placer/floorplanner and resolve any issues for memory placement. Issues that will need to be addressed may include power supply connectivity, ability to rotate memory macros, and solving pin-access issues.
Memory Compilers
, OpenRAM
, Programmable RAM
Improve and verify OpenLane Static Timing Analysis using OpenRAM generated library files. Specifically, this will include verifying setup/hold conditions as well as creating additional checks such as minimum period, minimum pulse width, etc. Also, the project will add timing information to Verilog behavioral model.
Memory Compilers
, OpenRAM
, Programmable RAM
Integrate and verify FreePDK45 OpenRAM memories with an OpenLane FreePDK45 design flow. OpenLane currently supports only Skywater 130nm PDK, but OpenROAD supports FreePDK45 (which is the same as Nangate45). This project will create a design using OpenRAM memories with the OpenLane flow using FreePDK45.
Power Planning for VLSI
, IR Drop Analysis
, Power grid Creation and Analysis
Take the existing power planning (pdngen.tcl) module of openroad and recode the functionality in C++ ensuring that all of the unit tests on the existing code pass correctly. Work with a senior member of the team at ARM. Ensure that designs created are of good quality for power routing and overall power consumption.
Demo Development
, Documentation
, VLSI design basics
For OpenLane, develop demos showing: The OpenLane flow and highight key features GUI visualizations Design Explorations and Experiments Different design styles and particular challenges
Testing
, Documentation
, VLSI design basics
Develop detailed test plans to test the OpenLane flow to expand coverage and advanced features. Add open source designs to the regression test suite to improve tool quality and robustness. This includes design specification, configuration and creation of all necessary files for regression testing. Suggested sources : ICCAS benchmarks, opencores, LSOracle for synthesis flow option.
GUI
, Visualization
, User Interfaces
For OpenROAD, develop and enhance visualizations for EDA data and algorithms in the OpenROAD GUI. Allow deeper understanding of the tool results for users and tool internals for developers.
Database
, EDA
For OpenROAD- Automatic code generation for the OpenDB database which allows improvements to the data model with much less hand coding. Allow the generation of storage, serialization, and callback code from a custom schema description format. r
AI
, ML
, Analytics
The OpenROAD project contains a storehouse of knowledge in it’s Github repositories within Issues and Pull requests. Additionally, project related slack channels also hold useful information in the form of questions and answers, problems and solutions in conversation threads. Implement an AI analytics bot that filters, selects relevant discussions and classifies/records them into useful documentation and actionable issues. This should also directly track, increase project usage and report outcome metrics.
Projects related to reproducibility and package management, especially as it relates to store type package managers (NixOS, Guix or Spack).
Lead Mentor: Farid Zakaria fmzakari@ucsc.edu
Operating Systems
Compilers
Linux
Package Management
NixOS
Dynamic linking as specified in the ELF file format has gone unchallenged since it’s invention. With many new package management models that eschew the filesystem hierarchy standard (i.e. Nix, Guix and Spack), many of the idiosyncrasies that define the way in which libraries are discovered are no longer useful and potentially harmful.
Specific tasks:
AsterixDB is an open source parallel big-data management system [http://asterixdb.apache.org/]. AsterixDB is a well-established Apache project that has been active in research for more than 10 years. It provides a flexible data model that supports modern NoSQL applications with a powerful query processor that can scale to billions of records and terabytes of data. Users can interact with AsterixDB through a power and easy to use declarative query language, SQL++, which provides a rich set of data types including timestamps, time intervals, text, and geospatial, in addition to traditional numerical and Boolean data types.
Build a data science project using AsterixDB that analyzes geospatial data among other dimensions. Use Chicago Crimes as the main dataset and combine with other datasets including points of interests ZIP Code boundaries. During this project, we will answer interesting questions about the data and visualize the results such as:
As a bonus task, and depending on the progress of the project, we can explore the integration of machine learning with AsterixDB through Python UDFs. We will utilize the AsterixDB Python integration through user-defined functions to connect AsterixDB backend with scikit-learn to build some unsupervised and supervised models for the data. For example, we can cluster the crimes based on their location and other attributes to find interesting patterns or hotspots.
FasTensor is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used for scientific computing and is effective for expressing a wide range of computations for data analyses, including common aggregation operations from database management systems and advanced machine learning pipelines. FasTensor execution engine exploits the structural-locality in the multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, parallel execution, and so on.
Data Management
, Analytics
Mentor: John Wu, Bin Dong, Suren Byna
Polyphorm is an agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. Rooted in astronomy and inspired by nature, we have used Polyphorm to reconstruct the Cosmic web structure, but also to discover network-like patterns in natural language data. You can find more details about our research here. Under the hood, Polyphorm uses a richer 3D scalar field representation of the reconstructed network, instead of a discrete representation like a graph or a mesh.
PolyPhy will be a Python-based redesigned version of Polyphorm, currently in the beginning of its development cycle. PolyPhy will be a multi-platform toolkit meant for a wide audience across different disciplines: astronomers, neuroscientists, data scientists and even artists and designers. All of the offered projects focus on PolyPhy, with a variety of topics including design, coding, and even research. Ultimately, PolyPhy will become a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.
Web Development
Dynamic Updates
UX
Develop a clean and welcoming website for the project. The organization needs to reflect the needs of PolyPhy users, but also provide a convenient entry point for interested project contributors. No excessive pop-ups or webjunk.
Specific tasks:
Design
Art
UX
Develop visual content for the project using its main themes: nature-inspired computation, biomimetics, interconnected structures. Aid in designing visual structure of the website as well as other public-facing artifacts.
Specific tasks:
Writing
Documentation
Storytelling
Integral to PolyPhy’s presentation is a story that the users and the project contributors can relate to. The objective is to develop the verbal part of that story, as well as major portions of technical documentation that matches it. The difficulty of the project is scalable.
Specific tasks:
Video Presentation
Tutorials
Didactics
Create a public face for PolyPhy that reflects its history, context, and teaches its functionality to users in different degrees of familiarity.
Specific tasks:
I/O Operations
File Conversion
Numerics
Testing
By default, PolyPhy operates with an unordered set of points as an input and scalar fields (float ndarrays) as an output, but others are applicable as well. Design and implement interfaces to load and export different data formats (CSV, OBJ, HDF5, FITS…) and modalities (points, meshes, density fields). The difficulty of the project can be scaled based on contributor’s interest.
Specific tasks:
Continuous Integration
Continuous Deployment
DevOps
The objective is to setup a CI/CD pipeline that automates the build testing and deployment of the software. The resulting process needs to be robust to contributor errors and work in the distributed conditions of a diverse contributor base.
Specific tasks:
UI/UX
Visual Experience
The key feature of PolyPhy is its interactivity. By interacting with the underlying simulation model, the user can adjust its parameters in real time and respond to its behavior. For instance, an astrophysics expert can load a dataset of 100k galaxies and reconstruct the large-scale structure of the intergalactic medium. A responsive UI combined with real-time visualization allows them to judge the fidelity of the reconstruction and make necessary changes.
Specific tasks:
Interactive Visualization
Data Analytics
3D Rendering
Data visualization is one of the core components of PolyPhy, as it provides a real-time overview of the underlying MCPM simulation. Through the feedback provided by the visualization, PolyPhy users can adjust the simulation model and make new findings about the dataset. Various operations over the reconstructed data (e.g. spatial searching) as well as important statistical summaries also benefit from clear visual presentation.
Specific tasks:
Graph Theory
Data Science
Develop a custom method for graph extraction from scalar field data produced by PolyPhy. Because PolyPhy typically produces network-like structures, representing these structures as weighted discrete graphs is very useful for efficiently navigating the data. The most important property of this abstracted representation is that it preserves the topology of the base scalar field by navigating the 1D ridges of the scalar field.
Specific tasks:
DirtViz is a project to visualize data collected from sensors deployed in sensor networks. We have deployed a number of sensors measuring qualities like soil moisture, temperature, current and voltage in outdoor settings. This project involves extending (or replacing) our existing plotting scripts to create a fully-feledged dataviz tool tailored to the types of data collected from embedded systems sensor networks.
Data Visualization
, Analytics
Mentor: Colleen Josephson
OpenRAM is an award winning open-source Python framework to create the layout, netlists, timing and power models, placement and routing models, and other views necessary to use SRAMs in ASIC design. OpenRAM supports integration in both commercial and open-source flows with both predictive and fabricable technologies. Most recently, it has created memories that are included on all of the eFabless/Google/Skywater MPW tape-outs.
User Interfaces
, Python APIs
Replace the custom logging framework in OpenRAM with Python logging module. New logging should allow levels of detail as well as tags to enable/disable logging of particular features to aid debugging.
VLSI Design Basics
, Memories
, Python
Use the OpenRAM API to generate a Read-Only Memory (ROM) file from an input hex file. Project will automatically generate a Spice netlist, layout, Verilog model and timing characterization.
VLSI Design Basics
, Memories
, Python
Use the OpenRAM API to generate a Register File from standard library cells. Project will automatically generate a Spice netlist, layout, Verilog model and timing characterization.
VLSI Design Basics
, Python
, Verilog
, Testing
Finish integration of parameterized Verilog modeule to support Built-In-Self-Test and Repair of OpenRAM memories using spare rows and columns in OpenRAM memories.
VLSI Design Basics
, Python
Create a visualization interface to debug layout verses schematic mismatches in Magic layout editor. Results will be parsed from a JSON output of Netgen.
This project aims at designing efficient, adaptive link level load balancers for networks that handle different kinds of traffic, in particular networks where flows are heterogeneous in terms of their round trip times. Geo distributed data centers are one such example. With the large-scale deployments of 5G in the near future, there will be even more applications, including more bulk transfers of videos and photos, augmented reality applications and virtual reality applications which take advantage of 5G’s low latency service. With the development and discussion about Web3.0 and Metaverse, the network workloads across data centers are only going to get more varied and challenging. All these add to heavy, bulk of data being sent to the data centers and over the backbone network. These traffic have varying quality of service requirements, like low latency, high throughput and high definition video streaming. Wide area network (WAN) flows are typically data heavy tasks that consist of backup data taken for a particular data center. The interaction of the data center and WAN traffic creates a very interesting scenario with its own challenges to be addressed. WAN and data center traffic are characterized by differences in the link utilizations and round trip times. Based on readings and literature review, there seems to be very little work on load balancers that address the interaction of data center and WAN traffic. This in turn motivates the need for designing load balancers that take into account both WAN and data center traffic in order to create high performance for more realistic scenarios. This work proposes a load balancer that is adaptive to the kind of traffic it encounters by learning from the network conditions and then predicting the optimal route for a given flow.
Through this research we seek to contribute the following :
Specific tasks: