Videoconferencing (VC) is now well established as a tool to foster wider communication, provide a richer meeting experience than teleconferencing, and help manage costs and OH&S by reducing travel. This document uses the term videoconferencing (VC) to cover traditional standards-based systems, web conferencing and cloud-based meeting solutions, which often form part of a unified communications suite.
From its beginnings utilising near-broadcast production equipment and dedicated links, VC has evolved and is now widely deployed in one of several forms:
Dedicated, high-performance telepresence suites
Appliance-based systems for learning, teaching and meeting spaces
Software client or web-based systems, available across many platforms including resident computers, user laptops and mobile devices
All of these approaches would ideally be closely integrated with an organisation’s unified communications platform.
The primary benefit of VC over teleconferencing is that those at the far-end of the call feel more involved in the meeting by being able to discern non-verbal cues and share presentations.
Traditional room-based VC is now a reasonably mature product and widely used for meeting and boardroom spaces as well as learning and teaching environments.
Failure to satisfy far-end participants led to the development of telepresence in the 1990s and its subsequent commercialisation and uptake in the mid-2000s. Whilst uncommon in higher education, telepresence is deployed in major government and executive corporate sites. Some systems use the same primary hardware as room-based systems, but with heavily customised software and interfaces.
At the other end of the scale, web and client-based applications (‘soft codecs’) have commoditised videoconferencing to the desktop and to mobile devices. These are primarily software applications and typically leverage consumer-grade cameras and accessories.
As software-based VC gains popularity in higher education and the corporate world, major manufacturers are now offering higher quality microphones and cameras to overcome the limitations of webcams and computer microphones. Care should be taken during design to select devices which support the acquisition of video and audio at sufficient quality for an acceptable far-end experience.
Whilst there are no national or international standards which clearly define VC as a whole, the standardisation work of several bodies is influential in the evolution and implementation of VC.
VC has traditionally been influenced by ITU-T, the standardisation sector of the International Telecommunications Union. Their recommendations form the basis of many room-based systems and include:
encoding of voice and video (G.711, H.265 etc)
directory services (X.500 series)
The Internet Engineering Task Force (IETF) is also prominent in VC: SIP is codified in RFC 3261, and a number of other IETF RFCs (Requests for Comments) underpin technologies such as WebRTC.
Conversely, the desktop/mobile VC sector is dominated by proprietary platforms (Microsoft Skype and Teams, Zoom, Cisco Jabber and WebEx, Apple FaceTime, etc), each intended to meet the needs of that manufacturer’s customer base. Interoperability between these platforms and with appliance-based systems cannot be guaranteed and is often dependent on commercial arrangements between manufacturers.
Whilst there are some third-party solutions offering transcoding between multiple formats, native interoperability is only just beginning to be offered on a wider scale. As a general rule, transcoding is often limited to delivery of audio and video. Sharing of presentation content can be problematic, whilst encryption and extended user features are often unavailable.
Always test the common features you require between solutions, and wherever possible encourage meeting participants to use the same platform.
The higher education community in Australia and New Zealand is well-served with high-quality networks, managed by people who now understand the challenges of real-time video and audio communications.
A detailed examination of networking for VC is outside the scope of this document. When working on VC deployments it is necessary to work closely with the network team to best understand the institution’s infrastructure and any configuration requirements.
Where software solutions are used it is also important to consider how the software environment will be deployed and managed to deliver a quality experience to the user.
Whilst video is important, it cannot be overstated how important audio is to a successful videoconference session. A lack of intelligible audio in a conference call is the most common impediment to communication between meeting participants.
Close attention to pickup and processing of audio is essential to a good far-end experience.
As a general rule, to maximise signal-to-noise ratio (SNR) any microphone should be placed in close proximity to the talker’s mouth, but this is impractical in most VC spaces. Instead, designers must evaluate the needs of users, the architectural limitations of each space and the characteristics of each microphone type when positioning microphones. The following guidelines need to be considered:
Microphone locations should be fixed. The average user will position a movable microphone to suit themselves, rather than considering the laws of physics.
Traditional ceiling-mounted microphones are often located in acoustically challenging ‘airspace’, with noise from mechanical services and projectors impacting intelligibility and SNR. Be aware of the services in the ceiling grid.
Microphone physical locations must be designed to ensure that the polar (pick-up) patterns have the appropriate coverage to capture the audience, including when they face the screen(s).
Well placed table-mounted microphones can improve the quality of audio capture and often negate the impacts of mechanical services noise but have more practical limitations:
they are more prone to table noise e.g. movement of papers and typing;
visually discreet microphones may be partially or completely obscured if papers or folders are laid on top;
fixed table microphones may reduce some of the table area for general use and other applications.
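As a rough illustration of the signal-to-noise trade-off above, the direct speech level at a microphone falls by about 6 dB for every doubling of the talker-to-microphone distance (inverse-square law), while the room noise floor stays roughly constant. The Python sketch below shows that arithmetic; the distances are illustrative only:

```python
import math

def snr_change_db(d_ref_m: float, d_m: float) -> float:
    """Relative change in direct-signal level (and hence SNR, assuming a
    constant room noise floor) when a microphone moves from d_ref_m to
    d_m from the talker, per the inverse-square law (~6 dB per doubling)."""
    return -20 * math.log10(d_m / d_ref_m)

# A close mic at ~0.15 m vs. a table mic at 0.6 m and a ceiling mic at 1.8 m:
for d in (0.15, 0.6, 1.8):
    print(f"{d} m: {snr_change_db(0.15, d):+6.1f} dB relative to 0.15 m")
```

Moving from a close microphone to a typical ceiling position costs on the order of 20 dB of SNR, which is why ceiling placement demands so much more care over background noise.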
When selecting microphones, the designer must take into account the limitations and operational requirements for each space, the institution’s preference for standardised platforms and their typical budget for each type of space. As with most design processes, a pragmatic approach is required and criteria may be weighted differently depending on the most common use (learning and teaching versus meetings).
A range of microphone types is available to a system designer:
Desk mounted boundary
- Visually discreet
- High gain at the boundary
- Placement often at the whim of the user
- Prone to table noise
Ceiling mounted boundary
- High gain at the intersecting boundaries
- Good performance
- Visually prominent
Beam-forming array (general characteristics)
- Ceiling and desk-mounted options
- Improved rejection of background noise
- Capable of achieving optimal SNR
- Generally low visual impact
- Types with onboard DSP require less configuration
- Types with external DSP require experienced personnel to configure/program the DSP environment
Beam-forming array – defined coverage zones
- Multiple elements and processing combine to define target area(s)
- Usually several mounting options, including visually discreet flush-mounting
- Some are proprietary and not interoperable with other manufacturers’ DSP products
- Can be difficult to coordinate with ceiling services
Beam-forming array – talker tracking
- Multiple elements and processing combine to target and track the loudest talker
- Usually several mounting options, including visually discreet flush-mounting
- Some are proprietary and not interoperable with other manufacturers’ DSP products
- Can be difficult to coordinate with ceiling services
Signal-to-noise ratio (SNR) decreases with every additional microphone opened, so designers using multiple traditional microphones should employ a well-tuned automixer for signal management. Though onboard or standalone automixer products are available, many organisations will process audio on their standard DSP platform.
To reduce background noise, VC systems typically use gating automixers; i.e. those microphones not receiving a predetermined audio level (threshold) are removed from the mix. It is usual to configure one microphone to always be open (usually the last active mic) so the room audio is continually sampled and more natural audio is provided to the far end.
Care must be taken to ensure that gating does not prevent the desired audio from being present in conferencing and recording feeds as well as hearing augmentation (where applicable).
When microphones are also providing in-room speech reinforcement, gating mixers usually allow nomination of the Number of Open Mics (NOM) to aid with both the noise floor and gain before feedback.
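The gating and NOM behaviour described above can be sketched in a few lines of Python. This is a toy illustration only: the threshold value, level readings and last-mic-hold policy are assumptions, and real automixers operate continuously on audio, not on single level snapshots:

```python
import math

def automix_gains(levels_db, threshold_db=-40.0, last_open=0):
    """Toy gating automixer: mics at or above threshold open; if none are
    open, hold the last active mic open so the far end still hears the
    room. Open-mic gains are reduced by 10*log10(NOM) so the summed noise
    floor stays roughly constant as more mics open."""
    open_mics = [i for i, lvl in enumerate(levels_db) if lvl >= threshold_db]
    if not open_mics:                       # last-mic hold
        open_mics = [last_open]
    nom_attenuation_db = 10 * math.log10(len(open_mics))
    return {i: -nom_attenuation_db for i in open_mics}

# Two talkers above threshold: both mics open, each attenuated ~3 dB.
print(automix_gains([-35.0, -60.0, -38.0, -70.0]))
```

Note how the NOM attenuation follows the earlier point: each doubling of open microphones raises the summed noise floor by about 3 dB unless the automixer compensates.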
Critical to any VC audio chain is the implementation of Acoustic Echo Cancellation (AEC). Put simply, this process samples the incoming audio (amplified in the room) and removes it from the audio picked up by the microphones. Without AEC, the far end is likely to hear their own audio returned to them, delayed sufficiently to cause great annoyance and obscure the desirable audio. This echo repeats until microphones are muted.
AEC may be implemented in the codec (hardware VC or software application), in an external DSP, or within an audio device itself. It is important that only one instance of AEC is engaged in each endpoint’s signal path, otherwise self-referencing may occur which destructively nullifies the audio signal.
Large rooms with multiple audio zones can benefit from the definition of multiple corresponding AEC zones, with microphones referenced to their local speakers. This approach typically requires the use of an external DSP and helps to clean up the audio signal, which can have the added benefit of improving a mix-minus system’s performance.
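For readers curious how AEC works internally, most implementations are built on an adaptive filter that learns the room’s echo path from the far-end reference signal and subtracts the predicted echo from the microphone feed. The sketch below is a minimal NLMS (normalised least mean squares) canceller with a simulated echo path; it is illustrative only, as production AEC adds double-talk detection, non-linear processing and far more besides:

```python
import numpy as np

def nlms_aec(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Minimal NLMS acoustic echo canceller sketch.
    far_end: reference signal sent to the room loudspeakers.
    mic:     microphone signal containing the loudspeaker echo.
    Returns the echo-reduced (error) signal."""
    w = np.zeros(taps)                  # adaptive estimate of the echo path
    x_buf = np.zeros(taps)              # recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_est = w @ x_buf            # predicted echo at the mic
        e = mic[n] - echo_est           # near-end audio + residual echo
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # NLMS weight update
        out[n] = e
    return out

# Simulate an echo-only mic feed: a delayed, attenuated copy of far-end audio.
rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
echo = 0.6 * np.concatenate([np.zeros(10), far[:-10]])  # 10-sample echo path
cleaned = nlms_aec(far, echo)
print(np.mean(echo[-500:]**2), np.mean(cleaned[-500:]**2))
```

After the filter converges, the residual echo power in `cleaned` is a small fraction of the original echo power, which is exactly the behaviour that prevents the far end hearing themselves back.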
Good audio is absolutely critical for successful videoconferencing. However, good quality video massively enhances the communication experience by enabling non-verbal and visual cues, often critical for the smooth running and etiquette of a meeting between participants.
Modern videoconferencing solutions allow the sharing of video from cameras, content, or a simultaneous combination of both.
There is rarely a one-size-fits-all camera for all VC implementations, and typical approaches are listed below.
A telepresence system typically uses proprietary camera modules which frame a preset field of view (typically two persons), often matched with a large display for maintaining eye contact with far-end participants. Multiple instances of these camera/display couplings are often included as a larger room-based package and their use is most common in corporate settings where high-quality face-to-face communications are required.
Telepresence is uncommon in the education sector because to gain any appreciable value from this technology, all endpoints must be set up as a telepresence suite.
PTZ cameras in higher education applications have traditionally been selected from a manufacturer’s professional AV range. Nowadays the use of semi-professional or prosumer models is becoming more prevalent, as is the use of appropriate-quality security cameras. Furthermore, the availability of room-based soft codec options has led to a rise in the use of USB cameras connected directly into computers. Regardless of the type, designers should select cameras for room systems based on the required functionality:
Suitable performance in expected lighting conditions (some applications require an improved ability to handle wider contrast ranges or automatic colour balancing)
Signal interfaces supported by the preferred routing, processing systems and endpoints
The API provided for control
Optical zoom range
Site-specific factors including mounting height/orientation and required pan/tilt/zoom range
Proprietary pan/tilt/zoom (PTZ) cameras can offer improved integration with a hardware codec of the same brand. Designers should be aware that there is sometimes a requirement to use proprietary cabling, however third-party compatible extension products are now widely available.
Professional AV cameras are typically selected for larger spaces (e.g. auditoria and lecture theatres) or where the video is shared between a variety of functions (e.g. VC, capture, overflow etc). Typically these provide HDMI or SDI video – easily extended and routed via an institution’s preferred hardware, with serial or network control.
Network cameras using traditional streaming (e.g. RTSP, H.264), or specialist protocols such as Newtek NDI are gaining popularity but require signal conversion in order to be used with VC codecs (TCP/IP network to video) or for connection to a computer (TCP/IP network to USB).
USB PTZ cameras are a popular choice in soft codec-based room systems. The image quality available from these cameras has improved greatly.
USB 3.0 and USB-C devices provide the opportunity for higher bandwidth and video quality
Control of the camera PTZ and presets can often be supported by the soft codec.
Specialist PTZ camera systems can augment room system functionality and may offer:
automatic framing of presenter(s) image via voice triangulation;
individual presenter tracking via video or wearable device for larger spaces;
control using video analytics and manipulation via computer processing.
Cameras built into all-in-one appliances, personal technology devices or laptops. These may not be well positioned for a well-framed image but offer great flexibility for users to join calls from their desk or anywhere there is a network connection. These solutions are typically suited for individuals.
Commodity USB webcams, often selected and positioned by individual users, can improve the far-end experience by lifting the camera to near eye level. These are designed for personal use and typically struggle to perform under poor lighting conditions.
Specialist external USB cameras are targeted at improving the far-end experience in environments such as huddle rooms by software processing of multiple individual camera elements to deliver better image quality, via:
software-based PTZ – a single smaller image cut from the entire field of view (e.g. a 4K fixed USB camera can present 1080P video when zoomed in on an area equating to a quarter of the image size);
ultra-wide field of view – often required in huddle rooms where participants are gathered tightly around the screen;
analytic technology that frames the shot to include only the area containing people, rather than the entire field of view.
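The software-based PTZ approach above is essentially a crop: a window of output-resolution pixels is cut from the larger sensor image, so zooming into a quarter of a UHD frame still yields native 1080p. A minimal NumPy sketch (the centre coordinates are hypothetical):

```python
import numpy as np

def digital_ptz(frame, cx, cy, out_w=1920, out_h=1080):
    """Software-based PTZ sketch: cut an out_w x out_h window, centred as
    close as possible to (cx, cy), out of a larger frame. There is no
    optical zoom; the 'zoom' is simply a smaller region of the sensor."""
    h, w = frame.shape[:2]
    x0 = min(max(cx - out_w // 2, 0), w - out_w)   # clamp window inside frame
    y0 = min(max(cy - out_h // 2, 0), h - out_h)
    return frame[y0:y0 + out_h, x0:x0 + out_w]

uhd = np.zeros((2160, 3840, 3), dtype=np.uint8)    # a blank 4K UHD frame
crop = digital_ptz(uhd, cx=1000, cy=500)
print(crop.shape)   # (1080, 1920, 3)
```

The clamping keeps the window inside the sensor area, which mirrors how these cameras behave when a tracked subject moves toward the edge of the field of view.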
When locating any video camera for production, capture or VC, the viewer’s perspective should be prioritised over installation convenience. Designers should remember the video is only useful where it adds to the experience; if the non-verbal cues and gestures are not communicated, then why have a camera?
Designers should consider:
The placement of cameras can foster a sense of visual connection, so be mindful to ensure that the design supports direct eye contact between local and far-end meeting participants.
Nobody likes being spoken down to, but that’s how it feels when the camera is below the presenter’s eyeline. Similarly, if the camera is too high the viewing audience can feel disconnected.
AVIXA standards recommend camera lens placement be no further than 15° from the eyeline of the presenter in the vertical plane.
Position the presenter camera for a reasonable head-on shot without putting the subject too far in profile.
Consider whether you need to mount a camera above, below, or between the displays for a more natural eye-line. Designers need to find the best compromise between optimal eyeline and display mounting height, and this may change depending on the primary functionality of the room (e.g. face-to-face meetings vs. digital whiteboarding):
Cameras mounted high or above the display may acquire unflattering images of nearest participants, often resulting in shots that look at the top of participant’s heads.
Cameras mounted too low may provide a view of under a meeting room table, or result in an unflattering shot when used for digital whiteboarding.
Dual displays often allow the camera to be mounted optimally in the vertical plane by creating a small gap between the two displays, sufficient for a PTZ camera to be mounted.
Cameras should be positioned to avoid capturing a display system as people will appear in silhouette due to under-exposure caused by the brightness of the screen.
Ensure cameras are mounted to avoid all obstructions.
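The 15° eyeline guideline translates into simple trigonometry when choosing a mounting height: the allowable vertical offset between the lens and the subject’s eyeline is distance × tan(15°). A small worked example (the 3 m subject distance is illustrative):

```python
import math

def max_camera_offset(distance_m: float, max_angle_deg: float = 15.0) -> float:
    """Maximum vertical offset between the camera lens and the subject's
    eyeline for a given subject-to-camera distance, per the 15-degree
    eyeline guideline."""
    return distance_m * math.tan(math.radians(max_angle_deg))

# At 3 m from the nearest participant, the lens may sit up to ~0.8 m above
# or below their eyeline and remain within the 15-degree guideline.
print(round(max_camera_offset(3.0), 2))   # 0.8
```

The closer the nearest participant sits to the camera, the tighter this tolerance becomes, which is why cameras mounted above large displays so often fail the guideline for the nearest seats.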
Most people don’t like seeing themselves on screen, so correct framing of their video image is often problematic. Poorly framed shots – usually the camera zoomed to its widest – offer little hope of really discerning those important non-verbal cues and gestures.
It’s not realistic to expect the average person to want to learn what makes a good shot, or to want to set up presets for each meeting and continually switch between them. So how do we improve things?
Education helps. Most institutions will have resources available for new and continuing staff; these should help communicate the ‘effective VC how-to’ message in easily digestible language without talking down to the user.
Instead of talking about only having two or three people in shot, framing mid-shots and medium close-ups or where the eyeline should be in the frame, it makes more sense to show an example of a TV news desk.
A photo of a presenter in silhouette because the camera is exposing on the room’s projection screen or an external window may be more effective than trying to describe the issue.
For those using room systems for small meeting rooms, consider setting a few presets based on the number of attendees. For example, in a six person space:
preset 1 may be the two seats at the head of the table;
preset 2 would be slightly wider to include the next two seats;
preset 3 would capture all six seats.
Soft codecs and some fixed-lens cameras suitable for smaller spaces have options for automatic framing or automatic shot selection to improve the user experience with minimal interaction.
A larger space can be more complicated, and there are a variety of approaches available to automate the framing process. Automatic camera tracking can be achieved by:
camera systems with face and motion recognition;
wearable technology for camera tracking;
active microphone triangulation.
Control and DSP systems can be utilised for triggering presets using:
audio input thresholds;
mechanical push-buttons in predefined locations;
pressure mats under floor coverings.
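Trigger-based preset recall of this kind reduces to straightforward selection logic. The sketch below illustrates one possible approach, loudest-mic-wins with a gate threshold and a wide-shot fallback; the mic-to-preset mapping, threshold and preset numbers are all hypothetical:

```python
def select_preset(mic_levels_db, zone_presets, threshold_db=-45.0, default=1):
    """Toy trigger logic: recall the camera preset for the zone whose mic
    is loudest, provided it exceeds the gate threshold; otherwise fall
    back to a default wide shot. zone_presets maps mic index -> preset."""
    loudest = max(range(len(mic_levels_db)), key=lambda i: mic_levels_db[i])
    if mic_levels_db[loudest] < threshold_db:
        return default                  # nobody talking: recall wide shot
    return zone_presets[loudest]

presets = {0: 2, 1: 3, 2: 4}            # hypothetical zone-to-preset map
print(select_preset([-60.0, -38.0, -55.0], presets))   # mic 1 loudest -> 3
```

Real implementations add hold timers and hysteresis so the shot does not flick between presets every time a different person interjects.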
The best outcomes are achieved when architects, services engineers and technology designers collaborate, with a clear understanding of the desired functionality of a space.
Certain features of the architect’s vision and designs will have impressed the client and won them the job, so will need to form the basis for the final design. It is necessary for all designers to cooperate and achieve pragmatic outcomes that deliver what the client needs, but the aim is to also achieve what the customer wants.
Large organisations generally develop standards for new facilities and these must meet both building code regulations and the organisation’s branding, whilst having a long service life and being affordable to operate.
Technical managers should ensure their published AV and VC requirements are comprehensive, and regularly updated to allow sufficient freedom to both architectural and AV designers to innovate and accommodate changes in technology.
Key considerations are acoustics and lighting, covered in more general terms elsewhere in these design guidelines. In addition, videoconferencing applications typically require a better controlled environment to achieve optimal outcomes, as outlined below.
Good audio design for VC begins with the physical space itself – highly reverberant spaces and those which are unnaturally ‘dry’ are not good candidates for VC. AETM recommends that designers and project managers work closely with their facilities management teams to formulate acoustic standards for videoconferencing spaces within the organisation.
The acoustics section of these guidelines provides an excerpt of some common room types and their associated performance criteria from the relevant standard (AS/NZS 2107). The Standard recommends that the inclusion of “audio to support video calls” be optimised by a reduction in background noise level as well as a decrease in the reverberation time of the physical space.
Include excerpt from AS2107 here - small table with typical seminar room or meeting room requirements vs. those with AV conferencing.
For this reason, it is important to communicate the presence and associated requirements of VC spaces to a project team as early as possible and prior to tendering. This is to ensure that sound isolation between spaces, HVAC system design and internal acoustic treatment are all considered from the very outset by an acoustic engineer and included within the acoustic report of the project.
In addition to lighting requirements, it is important to identify the challenges to quality video conferencing within the environment, mitigate them, and specify the appropriate technology. Some important considerations include:
Uniform colour temperature appropriate to the preferred cameras (typically 4000K) achieved via:
blockout or heavy shade blinds (preferably automatic) on windows.
Limiting the following from the camera field of view:
high visual contrast;
excess movement to allow video compression to work more efficiently;
very bright or highly reflective materials;
geometric patterns in visible elements.
Using blinds or applied films to interior and exterior glass to dramatically reduce the visibility of people or objects moving outside.
Avoiding heavily backlit or highly emissive backgrounds including windows, projection screens and other displays. Most cameras and systems we use for VC cannot resolve the contrast and presenters may end up entirely in silhouette.
Architects expect to brief the lighting engineers regarding luminaires, and to coordinate with all other services for the location, style and colour of other ceiling and wall-mounted devices - and audiovisual designers should too. Other building services disciplines are competing for the same ceiling and wall space that you want, so it is imperative that AV system designers are engaged early in the concept-design phase of a room, to ensure an optimal and fully integrated outcome.
In steeply raked and other challenging spaces, camera locations should be considered in the architectural design, not just by technology designers. Aim for eyelines that are comfortable for the far-end viewer whilst avoiding or mitigating inappropriate background choices for camera shots (both described earlier in this section).
Architectural ceilings and acoustic wall panels will pose a problem if your equipment hasn’t been factored into the initial design. This is often particularly the case in feature rooms such as council/senate chambers and boardrooms.
The design team might understand why ceiling-mounted speakers and microphones are necessary, but architects generally won’t be expecting large or prominent items of equipment (such as beam-forming mics, or beam-steering speakers) to disturb the flow of their visual design. Nor might they have any awareness of the requirement to add overhead cameras, relay screens or access panels.
If the planned ceiling for a feature space includes a large area of backlit, taut fabric (a popular design choice for high-end rooms) the entire ceiling may be off limits to AV, or at least to cameras. Perhaps a different choice of acoustically neutral fabric could be used, and the equipment actually hidden from sight. This is an example where early collaboration can achieve improved results, but where the system design (and subsequent integration of the components) needs to be considered first, before the architectural design is finalised and signed off.
Always clearly explain project requirements so they describe the expected technology impacts in different space types, and ensure close coordination between teams to develop mutually beneficial outcomes.
It is important to acknowledge that users of videoconferencing venues may not always be aware of the best way to use the technology or conduct themselves in a group meeting environment. From a user experience perspective, it can be helpful to provide a bullet-point list of simple tips and tricks somewhere in the room, such as a laminated one-page document that can be left on the table and passed around.
Whilst there are many guides available for individual users joining virtual meetings, often these are not dedicated to the group meeting scenario, where multiple people are in the same physical meeting space (in addition to the online attendees, who may similarly either be individuals or groups, or a combination of both).
The following points are a short list of ‘Videoconferencing Etiquette’ recommendations to get the best results out of your meeting room environments, for use in virtual meetings:
Reduce the effects of daylight by closing the blinds in the space. This will ensure that your camera has the best chance of getting a good image.
Ensure the room’s lights are turned on and bright enough to clearly see the meeting participants’ faces.
Sit in seats that face toward the camera and can be clearly seen. Traditionally you would aim to have people sit together in the same area, without big gaps between them. However, this may no longer always be good practice in light of Covid-19.
Ensure that camera framing is optimised for the number of participants in the meeting by using the available camera controls.
Aim to have a tight enough shot in order to capture facial expressions and nuances (remember - the whole aim of the video content is to reproduce the non-verbal communication for the remote participants).
Avoid making unnecessary noise that may be picked up by microphones and be distracting for other meeting participants.
Maintain eye contact wherever possible and look for physical cues to avoid speaking over other participants.
It’s easy to get distracted and important to stay focussed - treat everyone in the meeting equally, as if they were physically present.